Sorting through the noise (and the bottlenecks)

The current flash-based storage landscape is filled with vendors proposing to address different niches of the market with their respective solutions. With flash as the common ground, some of the more easily identifiable differentiators are in areas like host interface, form factor, media support and data protection schemes. The design choices for these specifications are heavily influenced by each vendor’s target workload and/or customer set. Of course, there are strengths and weaknesses to every approach. There are bottlenecks to be minimized or avoided altogether if possible. If all goes according to plan, a vendor’s target market will play to more of its strengths than its weaknesses.

At SolidFire we have taken direct aim at solving the challenges encountered in delivering high-performance storage for large-scale multi-tenant cloud environments. For this customer set the objective is not to deliver massive amounts of performance to a single application at any cost. Instead, these providers are focused on cost-effectively delivering consistent performance to thousands of applications at the same time. This use case has shaped many of our early design choices at SolidFire. We believe the most efficient way to achieve the right price/performance balance at scale is through a shared storage architecture.

In the case of shared storage, regardless of how fast the storage system can deliver I/O, there will always be the issue of network latency. Fusion-io has eliminated the network latency issue altogether with its server-resident PCI-based designs. This design works well for DAS topologies serving massive IOPS to extremely performance-hungry applications. However, for the service provider use case referenced above, the price/performance and availability story of server-resident flash misses the mark.

So if network latency is unavoidable, what is the best approach? How do you optimize the storage stack to maximize IOPS and minimize latency to deliver consistent performance to thousands of applications? Sparing you a buzzword-infused tongue twister that distills our approach into as few words as possible (think “RAID-less All-SSD Scale-Out Storage System”), we have instead outlined some of the key enabling features of our design in a more digestible format below:

  • An All-SSD system is the only way to confidently deliver predictable performance across a large number of tenants and applications in a large-scale cloud infrastructure. A tiered approach may suffice in a controlled setting with a few applications. However, the resource intensity and performance variability encountered in larger QoS-sensitive environments make tiering an unsustainable option.
  • Scale-out can mean lots of different things. For SolidFire this means no monolithic storage controllers. It also means a fully distributed design with I/O and capacity load evenly balanced across every node in the cluster. At the media layer, data still has to traverse the SAS bus, but ten drives per node are working in tandem to deliver more than enough aggregate performance. Thinking through alternative design choices here, it is important not to lose sight of the fact that any latency encountered at this layer of the stack is an order of magnitude less than what is encountered at the network layer.
  • RAID-less means exactly what you think: no RAID. More than any controller bottleneck, RAID is the biggest performance drag in the storage stack. By rethinking the data protection algorithm you cure a lot of what ails storage system performance today. At SolidFire we have done just that, implementing a replication-based redundancy algorithm where data is distributed throughout the cluster. The result is a significant improvement in write performance and drastically faster rebuilds after a failure, without performance impact.
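
To make the last two points a little more concrete, here is a minimal Python sketch of replication-based placement. It is an illustration only, not the algorithm used in Element OS: the two-copy default, the choice of rendezvous hashing, and the names place_block and rebuild_sources are all assumptions made for the example. It shows how hashing can spread copies of each block roughly evenly across nodes, and why a rebuild after a node failure can draw on every surviving node instead of a single RAID group.

```python
import hashlib
from collections import Counter


def place_block(block_id, nodes, copies=2):
    """Pick `copies` distinct nodes for a block via rendezvous hashing.

    Hypothetical placement scheme for illustration; not SolidFire's own.
    """
    def weight(node):
        digest = hashlib.sha256(f"{node}:{block_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    # The nodes with the highest hash weights for this block hold its copies.
    return sorted(nodes, key=weight, reverse=True)[:copies]


def rebuild_sources(failed, blocks, nodes):
    """Count how many blocks each surviving node would supply after `failed` dies."""
    sources = Counter()
    for block in blocks:
        replicas = place_block(block, nodes)
        if failed in replicas:
            # The surviving copy is the source for re-replication.
            survivor = next(n for n in replicas if n != failed)
            sources[survivor] += 1
    return sources


nodes = [f"node-{i}" for i in range(5)]
blocks = [f"block-{i}" for i in range(10_000)]

# Copies held per node: roughly even across the cluster.
load = Counter(n for b in blocks for n in place_block(b, nodes))

# Rebuild work per surviving node if node-0 fails: shared by all survivors.
sources = rebuild_sources("node-0", blocks, nodes)

if __name__ == "__main__":
    print("copies per node:", dict(load))
    print("rebuild sources:", dict(sources))
```

In this toy model every surviving node contributes a share of the rebuild, so the work is spread across the cluster; that is the general intuition behind distributed replication, under the assumptions stated above.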

Sure, our storage system does a heck of a lot more than these three things. You can read all about the software innovations embedded in our Element OS on our site. But the three concepts highlighted above are critically important design choices that we made early on. They are foundational components of our architecture that make the rest of the story possible. They are also three fairly tangible concepts to help you differentiate one vendor from the next in the flash-based storage market. Good luck, it’s noisy out there!

-Dave Cahill, Director of Strategic Alliances
