The SolidFire Blog
Storage Notes for the Next Generation Data Center

Solving for Storage in your Virtual MongoDB Environment

Since first rolling out MongoDB into production in 2010, I’ve architected many different types of storage configurations: bare metal to private cloud to public cloud (and back again), SAN-backed, SSD, and HDD. Throughout these experiences, the whys and hows of scaling “the storage problem” have been central to my technical focus. High iowait is the bane of any database system, and MongoDB is no exception.

A critical (and often overlooked) component of large MongoDB systems is storage performance. Maintaining throughput at scale requires a data storage subsystem with sufficiently low latency and guaranteed per-volume IOPS, so that no single node becomes a bottleneck for the aggregate throughput of the cluster and replication lag doesn’t rear its ugly head under cyclical high-load scenarios.

IOPS starvation can result in two main degradation modes: slow page access for any given Mongo instance, and overall shard balancing breakdown.

The first case, slow page access, can cause sluggish responses from an affected primary, or replication lag if the affected node is a secondary. The sharding breakdown case is more insidious, and can bring a cluster to its knees. It is generally recommended to leave 25 percent of IOPS headroom available for the shard-balancing processes, which can force systems to be over-engineered and storage capacity to be wasted.
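
Both degradation modes are easy to watch for from the shell. A minimal illustration (the hostname is a placeholder, and these are generic commands rather than part of the benchmark setup described below):

    # Watch per-device latency and iowait on a database node; sustained high
    # await/%util values point at a saturated storage subsystem (needs sysstat).
    iostat -x 5

    # From the mongo shell (a 2.6-era helper), report how far each secondary
    # trails the primary; lag that grows under load is slow page access
    # surfacing on a secondary member.
    mongo --host mongo-node1 --eval 'db.printSlaveReplicationInfo()'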

The challenges traditional storage systems encounter in MongoDB environments could explain why the existing published YCSB benchmark results for MongoDB were so much lower than the performance we were seeing in the lab on SolidFire.

YCSB (the Yahoo! Cloud Serving Benchmark) is the current de facto tool for benchmarking NoSQL systems, including MongoDB. YCSB was developed by the Yahoo! Labs team to allow the comparison of various NoSQL offerings against a repeatable set of workloads with a known dataset.
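
For readers new to the tool, a typical invocation against a sharded MongoDB cluster looks roughly like this (the hostname, record count, and thread count are illustrative, not the exact parameters from our runs):

    # Load the initial dataset into the cluster through the mongos router.
    ./bin/ycsb load mongodb -s -P workloads/workloada \
        -p mongodb.url=mongodb://mongos-host:27017/ycsb \
        -p recordcount=1000000

    # Run workload A (the 50/50 read/update mix) and report ops/sec and latency.
    ./bin/ycsb run mongodb -s -P workloads/workloada \
        -p mongodb.url=mongodb://mongos-host:27017/ycsb \
        -p operationcount=1000000 -p threadcount=32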

To reconcile the discrepancy between the lab and published benchmark results, and to help our customers better understand what they should expect from a MongoDB solution running on top of SolidFire storage, we decided to perform our own testing using the popular YCSB benchmark.

YCSB Test Environment Diagram. The test platform was composed of six virtual servers: a 4-node sharded MongoDB test cluster, one MongoDB config server, and a single YCSB client server, which also ran the mongos query router. A common pattern of 4 vCPUs with 16GB of RAM was used for each server, with the exception of the client/mongos server, which was allocated 8 vCPUs to ensure the load driver itself did not become a bottleneck (a common testing-methodology flaw).
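
For reference, wiring up an equivalent topology looks roughly like the sketch below (hostnames and ports are placeholders; a production deployment would use replica sets per shard and three config servers):

    # On the config server VM: start the single config server.
    mongod --configsvr --dbpath /data/configdb --port 27019

    # On the client VM: start the mongos query router.
    mongos --configdb cfgserver:27019 --port 27017

    # Register the four shard nodes and shard the YCSB collection, which the
    # MongoDB YCSB driver creates as "usertable" in the "ycsb" database.
    mongo --port 27017 --eval '
        sh.addShard("mongo-node1:27018");
        sh.addShard("mongo-node2:27018");
        sh.addShard("mongo-node3:27018");
        sh.addShard("mongo-node4:27018");
        sh.enableSharding("ycsb");
        sh.shardCollection("ycsb.usertable", { _id: "hashed" });'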

To get as close to an apples-to-apples comparison as possible, we wanted to run a test environment similar to those used in existing published YCSB tests for MongoDB. Many of these were performed in EC2 using striped standard EBS or EBS pIOPS (provisioned IOPS) storage volumes. Since SolidFire shared storage isn’t an available option in EC2, we created a virtualized environment using VMware hosts with low resource settings to replicate the xlarge instances used in many of the YCSB MongoDB tests we were referencing.

We set up the MongoDB test cluster with documented best-practice tunings, including those for MongoDB on SolidFire (available in the SF MongoDB Config Guide). For storage, we used a standard SolidFire SF3010 scale-out storage array, the same unit that was used in our MongoDB Enterprise Certification validation.

The environment setup for the test was as follows (a sketch of the corresponding host commands follows the list):

  • MongoDB 2.6 in 4-node sharded cluster configuration, 1 mongos/1 cfgserver (shared)
  • VMware environment with 4 vCPUs each, 8GB (8 vCPUs for the mongos/cfgserver)
  • CentOS 6.5, config mods: [open files, hdparm, read ahead, scheduler]
  • Data volumes: SolidFire iSCSI via 10G network, 1TB per node, 500GB for cfg server
  • Filesystem: xfs mounted with nobarrier, noatime, nodiratime
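
As a rough sketch, those config mods translate to commands like the following (the device name, mount point, and limit values are illustrative, not the exact settings from the config guide):

    # Raise the open-file limit for the mongod user (/etc/security/limits.conf).
    echo 'mongod soft nofile 64000' >> /etc/security/limits.conf
    echo 'mongod hard nofile 64000' >> /etc/security/limits.conf

    # Reduce read-ahead and switch to the noop elevator on the iSCSI data volume.
    blockdev --setra 32 /dev/sdb
    echo noop > /sys/block/sdb/queue/scheduler

    # Mount the XFS data volume with the flags listed above.
    mount -t xfs -o nobarrier,noatime,nodiratime /dev/sdb /var/lib/mongo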

As we ran through the standard default workloads, the results were in line with what we’d expect to see from a database service running on a SolidFire backend. However, when we began to compare those results (both latency and ops/sec) to the YCSB benchmark results available in the wild, some very interesting differences leapt out:

  • Overall ops/sec throughput was exceptionally high compared to all other published tests
  • Request latency was minimal compared to existing benchmark results
  • 200% higher ops/sec than any published YCSB/MongoDB result for the 50/50 read-update workload
  • 300% higher ops/sec than any published YCSB/MongoDB result for the read-only workload

These proof points validate the significant benefits that can be realized from deploying a next-gen shared storage infrastructure when running virtualized MongoDB in any type of public or private cloud environment. To see the full results and learn more about the testing methodology for the MongoDB and SolidFire YCSB benchmark, download our whitepaper.

- Chris Merz, Database Systems Architect

About Chris Merz

A seasoned Internet services database veteran, Chris Merz is the Chief Database Systems Engineer at SolidFire. Chris develops benchmarking methodologies and database system certifications for SolidFire’s next-gen storage products.

2 Comments

  1. Perry Harrington
    Posted August 1, 2014 at 2:20 pm | Permalink

Why did you use VMware for this test instead of Xen? EC2 uses the Xen hypervisor. I’ve found that VMware generally has higher I/O marks when properly provisioned, but Xen handles CPU load better, where VMware tries to steal CPU cycles in an effort to oversubscribe CPU and memory resources. In my experience, databases run more consistently under Xen than VMware because of the greedy nature of VMware and the simplistic auto-affinity of Xen.

    • Chris Merz
      Posted August 4, 2014 at 9:01 am | Permalink

      Greetings, Perry! Simply put, VMware is what we had set up in the lab. We do use Xen in other areas of our system, but not currently in the database lab environment. To ensure VM stability in this test, I specifically reserved CPU and RAM to eliminate variables and isolate storage I/O throughput. I would certainly not argue that Xen is inferior to VMware, as I subscribe to a ‘right tool for the job’ philosophy, and that generally includes accommodating environment choices (such as hypervisors) in a customer’s extant ecosystem. I think running future versions of this test on Xen would be a fine idea.
