A skeptic’s guide to analyst reports: ESG and storage QoS

The “Skeptic’s guide to storage analyst reports” is a series of commentaries on a 2015 report by ESG titled “Quantifying the Economic Value of a SolidFire Deployment.” If you haven’t yet, give it a quick read for better context. Catch the other posts in the series here and here. Photo credit: Johari Window and the Blind Spot / Deb Nystrom / CC

There is a tried and true cycle that never fails to repeat itself in technology. It starts with a company, sometimes an established player, but more often a start-up, that sees a challenge, does a gap analysis to look at the possible ways to meet that challenge, and then builds something new and innovative. This innovation, by the nature of how it came about, exists in the blind spot of the industry in general and the vendors in particular.

For examples of this, look at how AWS created the Infrastructure as a Service (IaaS) category when everyone else was still figuring out whether fibre channel mattered; how VMware democratized x86 resource pooling and workload mobility to let operations people sleep at night; and how Apple reinvented the end-user experience and set the bar for what customers should expect from a device interface and ecosystem.

When this innovation happens, there’s an inevitable response from the existing players in the market, and it all centers around the philosophy of dismiss, deceive, delay, and dilute.

Dismiss

The first response is always that the market doesn’t need the feature or technology that has been introduced. In the example of AWS, enterprise hardware vendors immediately started telling customers that they just weren’t seeing the value, that there weren’t clear use cases or ROI. At this point, chances are there’s been no real investigation into the tech, and the vendor is just trying to figure out how much they need to care.

Deceive

Next, the existing vendors will start to FUD the new innovation in order to deceive customers about its potential and the value it could bring. “It’s too expensive.” “It’s unproven.” “It’s too hard to manage.” “It’s too insecure.” All of those have been official responses to AWS at some point, and the idea is to push customers away from the tech by abusing the trust of customers and by leveraging the power and dollars of a marketing department.

Delay

Once the new innovation is mainstream enough, vendors will delay its adoption, usually by putting it on a formal roadmap or by releasing a slick tech preview at an industry conference that may or may not be complete smoke and mirrors. The goal is to keep that vendor’s customers from investigating the competition: “No big deal, we’re releasing it soon, ours will be better, and you won’t have to change vendors to get it!” Of course, the timeline is always longer than promised, and this delaying tactic may go on for years.

Dilute

In an industry where marketing catch phrases are thrown around constantly, one of the easiest tactics vendors employ is simply using the same name to mean something completely different, or to represent something far less functional as the equivalent of the new innovation. Again, the example of AWS is useful. Its core product was a completely API-driven, pay-as-you-go, scale-up/scale-down IaaS cloud. The market responded by calling damn near everything “cloud,” no matter how far away from the full AWS offering it was. Confusion reigned, and the market for enterprise “cloud” became a morass of disappointment. Even now, many still compare every “cloud” offering to AWS.

The Claim

Quality of Service (QoS) is one of those innovations that has gotten the full gamut of this treatment from storage vendors who looked at their existing architecture and realized they could not easily match the functionality of what SolidFire had brought to market.

Because of this, there’s an incredible number of features called “QoS” that are a good distance away from what was originally brought to market, and the resulting confusion could make a storage buyer look at the ESG report data around QoS benefits with some skepticism.

Specifically, ESG found the following:

“ESG Lab’s cost/benefit analysis indicates that by virtualizing and automating performance, SolidFire could eliminate the need to deal with up to 93% of traditional storage-related problems. These problems might include issues inherent in a traditional architecture that are caused by workload imbalance, monopolization of a fixed set of resources, insufficient resources in a pool, requirements to move VMs, inefficient tiering, and controller bottlenecks.”

Let’s be honest: 93% is an impressive number. It’s understandable that storage buyers reading it might raise an eyebrow. After all, the QoS they are used to, and that may be included in their array or virtualization platform, doesn’t do any of the things that ESG calls out! This is because there are two very different features that are deliberately both called QoS!

Most QoS is designed to protect the array from the applications. It’s designed to make sure an entire array doesn’t get overrun because of a small number of I/O-bound volumes taking up all of the IOPS and starving out the others, creating an array-wide issue. The problem being solved is usually called “noisy neighbor,” and it uses some combination of maximum limits on IOPS per volume, prioritization, and performance-level tagging to keep the array running even when a volume is misbehaving.

The catch with this type of QoS is that you still need lots and lots of additional capacity, because while the array may be able to stop one aberrant volume’s impact from cascading across the array, it can’t offer any other volume more than best-effort performance.
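To make that distinction concrete, here’s a minimal sketch (in Python, and not any vendor’s actual implementation) of what max-only QoS amounts to: a per-volume token bucket sized to the volume’s IOPS cap. A runaway volume gets throttled, but notice what’s missing: nothing reserves IOPS for the well-behaved volumes, so they still share whatever performance is left on a best-effort basis.

```python
# A sketch of "max-only" QoS: one token bucket per volume, sized to its cap.
# A noisy neighbor is throttled, but no volume is promised a minimum.
import time


class MaxOnlyLimiter:
    def __init__(self, max_iops: int):
        self.max_iops = max_iops          # hard cap for this volume
        self.tokens = float(max_iops)     # refilled continuously, up to the cap
        self.last_refill = time.monotonic()

    def allow_io(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, never above the cap.
        self.tokens = min(self.max_iops,
                          self.tokens + (now - self.last_refill) * self.max_iops)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                      # throttled: caller must retry later


# One limiter per volume. Note what is absent: there is no "minimum IOPS,"
# so a busy array can still starve any of these volumes.
limiters = {"vol-noisy": MaxOnlyLimiter(max_iops=5000),
            "vol-db": MaxOnlyLimiter(max_iops=2000)}
```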

That’s why we wouldn’t consider it to be genuine QoS. Genuine QoS equates to actual guaranteed performance, brought about by three primary components:

  1. Ability to set a maximum IOPS value per volume/object
  2. Ability to guarantee a minimum number of IOPS per volume/object
  3. Ability to set QoS dynamically and programmatically

To these base requirements, SolidFire adds the concept of “bursting,” so that volumes that are normally well behaved have some room to push above their maximums, so long as that doesn’t impair the ability of every other volume on the cluster to get its guaranteed minimum.
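Here’s a hedged sketch of what “set QoS dynamically and programmatically” looks like in practice. The method and field names below are illustrative placeholders rather than any specific vendor’s API; the point is that minimum, maximum, and burst IOPS are first-class, per-volume settings that an orchestration workflow can change at any time.

```python
# Illustrative only: the method name and fields are placeholders, not a real API.
from dataclasses import dataclass, asdict
import json


@dataclass
class QoSPolicy:
    min_iops: int    # floor the cluster must always be able to deliver
    max_iops: int    # steady-state ceiling for the volume
    burst_iops: int  # short-term headroom above max for well-behaved volumes


def build_set_qos_request(volume_id: str, policy: QoSPolicy) -> str:
    """Build the kind of JSON payload a provisioning workflow might send."""
    return json.dumps({
        "method": "SetVolumeQoS",          # hypothetical method name
        "params": {"volumeID": volume_id, **asdict(policy)},
    }, indent=2)


# Example: tighten a test/dev volume at night, loosen it for business hours,
# all without touching a GUI.
print(build_set_qos_request("vol-42", QoSPolicy(min_iops=500,
                                                max_iops=2000,
                                                burst_iops=4000)))
```

Because the policy is just data, the same automation that provisions a volume can also adjust its QoS on a schedule or in response to monitoring, which is what makes the third requirement above so important.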

So what? How do these capabilities, even if implemented properly, impact the day-to-day operational reality of customers? That’s the question ESG set out to answer. The key turns out to be SolidFire’s unique ability to combine QoS with a scale-out architecture to drive massive amounts of consolidation, and to protect the performance of those workloads after consolidation better than a traditional array could before it.

We can truly guarantee performance in a scaling scenario.

Consolidation does so many good things:

  • Lower capital costs, both upfront and over time. No over-provisioning, and no multiplying array overhead across multiple arrays.
  • Lower data center costs. Far less space and power consumed, less cooling needed, and better utilization of existing space thanks to the 1U form factor. Less cabling, fewer switches, and no requirement for expensive fibre channel or InfiniBand equipment.
  • Lower operational costs. No more different GUIs for arrays that are dedicated to different workloads. No more differentiated automation and orchestration workflows. No more stranding capacity or performance on one array when it’s needed elsewhere. No more moving data from one tier to another in order to try and improve performance. No more forklift controller upgrades.
  • More ability to take advantage of existing and future data services. The ability to use metadata to drive financial and operational value into storage is quickly becoming a huge differentiator in the marketplace. The challenge is that those services can have a steep cost in CPU and RAM to make them effective and run them in an always-on fashion. A scale-out architecture has the added benefit that every time a cluster is expanded to add more capacity and performance, CPU and RAM are added as well (see the sketch after this list).
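As a back-of-envelope illustration of that last point, here’s a small sketch with made-up per-node numbers showing how a scale-out cluster’s CPU and RAM budget for always-on data services grows in lockstep with its capacity and performance as nodes are added.

```python
# Hypothetical per-node resources, purely for illustration.
NODE = {"capacity_tb": 20, "iops": 50_000, "cpu_cores": 24, "ram_gb": 256}


def cluster_resources(node_count: int) -> dict:
    """Aggregate resources for a cluster of identical nodes."""
    return {key: value * node_count for key, value in NODE.items()}


# Every node added for capacity or performance also adds CPU and RAM,
# so the budget for always-on data services scales with the cluster.
for nodes in (4, 8, 12):
    print(nodes, "nodes ->", cluster_resources(nodes))
```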

Taken together, there’s a huge amount of cost that enterprises assume is unavoidable but that can, in fact, be avoided or minimized.

Is 93% a valid number?

I’m confident it is, based on the testing ESG did, but every customer environment is going to be different. Are you currently managing dozens of arrays of different types supporting different workloads? Then you are going to see huge savings. Are you standardized on a single type of array, and do you already consolidate as much as you can? Then it’ll be lower. But the key takeaway is that by using a scale-out architecture and a true, complete, native QoS mechanism, enterprises can maximize utilization and efficiency while minimizing unnecessary costs.

How could real QoS help your organization? I bet the answer will surprise you.

 

 



