Hypervisor-based QoS: Helps with the symptoms, but by itself it’s not the cure

If you have been following our recent blog posts and announcements, you know we have been giving a lot of airtime to storage Quality of Service (QoS). In a timely post on the subject, VMware’s @FrankDenneman recently solicited feedback on a concept VMware is calling “Storage-level Reservations.” If you haven’t read the post yet, I would encourage you to do so, and to fill out the survey at the end to help VMware with its research.

In the post Frank summarizes the key challenges imposed by running multiple tenants on a shared storage infrastructure:
“In a relatively closed environment such as the compute layer it’s fairly easy to guarantee a minimum level of resource availability, but when it comes to a shared storage platform new challenges arise. The hypervisor owns the computes resource and distributes it to the virtual machine it’s hosting. In a shared storage environment we are dealing with multiple layers of infrastructure, each susceptible to congestion and contention. And then there is the possibility of multiple external storage resource consumers such as non-virtualized workloads using the same array impacting the availability of resources and the control of distributing the resources.”

        – Frank Denneman, Would you be interested in Storage-level reservations? 3/26/13
While it would be fantastic to solve this problem solely from a hypervisor perspective, the reality is that the hypervisor has very little control over, or visibility into, the underlying storage system resources. A cloud infrastructure of any size demands a more coordinated approach across both host and storage resources. Some of the key issues to consider with a hypervisor-centric approach in front of traditional storage include:

  • Lack of IOPS control. While the hypervisor can throttle IOPS, it has no control over the total IOPS pool available. With no governance from the underlying storage system, there is no way for a hypervisor to truly guarantee a minimum IOPS level. In this scenario the hypervisor will always be at the mercy of the storage device.
  • Performance degradation. Without visibility into back-end storage resource utilization, there is no way for the hypervisor to know what resources remain available to it on a persistent basis. As storage system utilization increases, performance degradation becomes a real concern. With a growing number of virtual machines contending for the same pool of storage resources, the lack of any isolation at the storage system layer effectively creates an IOPS free-for-all. The resulting performance variability is a non-starter for a multi-tenant infrastructure hosting performance-sensitive applications.
  • Forced overprovisioning. Absent the ability to granularly carve up storage system performance and provision it out to each virtual machine, the only way to ensure a large enough IOPS pool for these VMs is to extensively overprovision your storage. Unfortunately, there is no better way to blow the economics of your shared storage environment than by being forced to deploy three times as many systems at one-third the utilization.
  • Lacking coordination. While throttling IOPS usage per VM is a basic form of storage QoS, it is more an indictment of the deficiencies of existing storage systems than an ideal solution to the problems posed by a multi-tenant infrastructure. True QoS is delivered through end-to-end coordination and orchestration between the host and the underlying storage system to ensure each virtual machine has the resources it needs to properly support the application.
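
To make the first and last points above concrete, here is a minimal Python sketch of the difference between throttling at the hypervisor and enforcing minimums at the storage system. Everything in it is a hypothetical illustration rather than any vendor's actual algorithm: the VM names, the IOPS figures, and both allocation functions are assumptions, and the model assumes the array shares a constrained pool proportionally and that the sum of promised minimums never exceeds what the array can deliver.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    demand: int    # IOPS the workload wants right now
    cap: int       # hypervisor-side throttle (maximum IOPS)
    min_iops: int  # minimum IOPS the tenant was promised


def hypervisor_only(vms, backend_iops):
    """Hypervisor caps each VM; the array (assumed to have no QoS) simply
    shares whatever it can deliver in proportion to the capped requests."""
    requested = {vm.name: min(vm.demand, vm.cap) for vm in vms}
    total = sum(requested.values())
    if total <= backend_iops:
        return requested
    # Integer math keeps the illustration free of float rounding noise.
    return {name: r * backend_iops // total for name, r in requested.items()}


def storage_enforced(vms, backend_iops):
    """Array sets aside each VM's minimum first, then shares the spare
    IOPS in proportion to unmet demand. Assumes sum(min_iops) <= backend_iops."""
    requested = {vm.name: min(vm.demand, vm.cap) for vm in vms}
    granted = {vm.name: min(requested[vm.name], vm.min_iops) for vm in vms}
    spare = backend_iops - sum(granted.values())
    unmet = {name: requested[name] - granted[name] for name in granted}
    total_unmet = sum(unmet.values())
    if spare > 0 and total_unmet > 0:
        share = min(spare, total_unmet)
        for name in granted:
            granted[name] += unmet[name] * share // total_unmet
    return granted


# Hypothetical tenants: a database promised 3,000 IOPS and a noisy batch job.
vms = [
    VM("db", demand=5000, cap=5000, min_iops=3000),
    VM("batch", demand=20000, cap=15000, min_iops=1000),
]
backend_iops = 8000  # what the shared array can actually deliver right now

print(hypervisor_only(vms, backend_iops))   # {'db': 2000, 'batch': 6000}
print(storage_enforced(vms, backend_iops))  # {'db': 3500, 'batch': 4500}
```

In this toy model the caps are respected in both cases, but only the storage-side version keeps the database above its promised 3,000 IOPS once the back-end pool is constrained. The caps alone cannot do that, because the hypervisor has no say over how large the pool is or how the array divides it.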

Implementing a storage QoS mechanism like storage reservations at the hypervisor layer, without similar enforcement capability at the storage system level, does little to address the core challenges imposed by these multi-tenant environments. With VMware and others working to improve controls at the hypervisor layer, now is the time to demand more from your storage vendors to deliver on their side of this equation. The good news is there is no need to wait: options are already available today. Over time, API-based integration between hypervisors and storage systems, such as that provided by projects like OpenStack Cinder and VMware VVOLs, will provide a much more holistic approach to managing storage Quality of Service than can be obtained from a hypervisor alone.

-Dave Cahill, Director of Strategic Alliances


