Challenges of Block Storage as a Service at Cloud Scale Part 1 – Performance
For service providers who want to offer Block Storage as a Service as part of their cloud compute offering, a number of challenges exist. At SolidFire, we’re focused on solving the biggest problems that service providers encounter when trying to build scalable, reliable, and profitable network-based primary storage. In this first post of a three-part blog series on these problems, we address the challenge of performance.
Over the past 20 years, a huge performance imbalance has been created between processing power, which has doubled every 18-24 months under Moore’s law, and storage performance, which has barely improved at all due to the physical limitations of spinning disk. Meanwhile, storage capacity has exploded by a factor of 10,000 over that time. The result is that while capacity is plentiful and cheap, storage performance (measured in IOPS) is expensive.
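The scale of that imbalance is easy to work out. The short sketch below simply compounds the doubling periods quoted above over 20 years; it is plain arithmetic on the figures in this post, not a measurement of any real hardware.

```python
# Compound growth implied by doubling every 18 to 24 months for 20 years.
YEARS = 20

def growth_factor(doubling_months, years=YEARS):
    """Total multiplier after `years` of doubling every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)

slow = growth_factor(24)   # 10 doublings
fast = growth_factor(18)   # about 13.3 doublings

print(f"doubling every 24 months: ~{slow:,.0f}x")
print(f"doubling every 18 months: ~{fast:,.0f}x")
```

That puts the growth in processing power somewhere between roughly 1,000x and 10,000x over the period, the same order of magnitude as the growth in capacity, while the IOPS available from a single spinning disk have stayed nearly flat.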
For a service provider looking to sell block-based primary storage as a service, that imbalance makes it difficult to sell storage on a per-gigabyte model, which is how it is most commonly sold today. Customers who buy only 50 or 60 GB of space for their application still expect reasonable performance, but when a customer shares a set of disks with dozens of others, their “fair share” of IOPS doesn’t amount to very much. Even worse, performance varies considerably depending on how many other apps are on the same disks and how active those apps are at any given time. The result is poor, unpredictable performance and unhappy customers.
Today, service providers offering block storage on enterprise storage arrays typically deal with this challenge by using lots of fast, expensive Fibre Channel (FC) and SAS disks and utilizing only a fraction of the available capacity (a technique known as under-provisioning or short-stroking). Even with this approach, it’s difficult or impossible for providers to guarantee customers any particular level of performance, short of putting them on their own dedicated disks and giving up much of the benefit of an efficient multi-tenant cloud.
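To put rough numbers on the “fair share” problem, here is a back-of-the-envelope sketch. Every figure in it (IOPS per disk, disk size, number of disks, fraction of capacity sold) is a hypothetical assumption chosen for illustration, not a measurement from any particular array, and the function name is ours.

```python
# Back-of-the-envelope fair-share estimate. All inputs are illustrative
# assumptions, not measurements from a real array.

def fair_share_iops(disks, iops_per_disk, gb_per_disk, volume_gb,
                    fill_fraction=1.0):
    """IOPS a volume gets if the pool's IOPS are split in proportion to the
    capacity each tenant bought and every tenant is busy at the same time."""
    pool_iops = disks * iops_per_disk
    sold_gb = disks * gb_per_disk * fill_fraction
    return pool_iops * volume_gb / sold_gb

# Hypothetical pool: 24 disks at ~180 random IOPS and 600 GB each,
# and a customer who bought a 50 GB volume.
full = fair_share_iops(24, 180, 600, 50)
short_stroked = fair_share_iops(24, 180, 600, 50, fill_fraction=0.25)

print(f"pool fully sold:      {full:.0f} IOPS for a 50 GB volume")
print(f"25% of capacity sold: {short_stroked:.0f} IOPS for a 50 GB volume")
```

Under these assumptions the 50 GB customer gets about 15 IOPS when the pool is fully sold. Selling only a quarter of the capacity raises that to about 60 IOPS, at roughly four times the hardware cost per gigabyte sold. The model ignores RAID overhead, caching, and bursty neighbors, so real results would vary even more.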
So what about flash? Doesn’t it solve the performance problem? Today that’s true only in part. As we previously discussed, most enterprise storage today makes only limited use of flash, as a cache or a tier of storage for hot data, and overall array performance is often limited by the controller. While cache and tiering technology does a good job of “globally optimizing” array performance by putting the hottest data in low-latency flash, it can actually cause more headaches for service providers by making storage performance even more unpredictable for customers. From the perspective of an individual customer, their data may be blazing fast one minute while it is served from flash, and slow as a dog the next after it is bumped to disk because another customer was “hotter.” At this point, expect a support call. Inconsistent, unexplainable performance is one of the biggest complaints about block storage in the cloud today, and automated tiering and caching just make it worse. All service providers want is an endless amount of storage performance that can be carved up and sold in predictable, profitable chunks. Is that too much to ask?
-Dave Wright, Founder & CEO