
Benchmarking LevelDB

By Andy Twigg

10 Aug 2011
Category: Technical Articles

Google recently open-sourced their LevelDB datastore and published some neat benchmarks. Those benchmarks typically insert at most 1e6 entries, and I wanted to understand how it performs both for larger datasets, say 1e9 (1 billion) entries, and on commodity SSDs.

I modified db_bench.cc to output sampled performance and long-term average performance. I ran the following benchmark on two devices: an Intel X25M SSD (INTEL SSDSA2M160G2GC) and a 7200rpm SATA HDD (SAMSUNG HD642JJ). I created a single ext4 partition on each device. Setting --sample_freq=10000 outputs performance samples (time since start, ops since start, time since last sample, mean ops/s during sample) every 10000 operations. I also played with increasing the cache size, but that didn't have any noticeable effect over the default.


$ sudo ./db_bench --benchmarks=fillrandom --db=device/ --num=1000000000 --sample_freq=10000
LevelDB:     version 1.2
Date:        Sun Jul 31 22:28:53 2011
CPU:         4 * Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz
CPUCache:    2048 KB
Keys:        16 bytes each
Values:      100 bytes each (50 bytes after compression)
Entries:     1000000000
Sample freq: 10000
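
For readers who want to reproduce the sampled output, the logic can be sketched roughly as follows. This is a minimal Python illustration, not the actual C++ change made to db_bench.cc; the class name OpSampler, its methods and the injectable clock are all hypothetical. It records one tuple (time since start, ops since start, time since last sample, mean ops/s during sample) every sample_freq operations, and can report the long-term average.

```python
import time
from typing import Callable, List, Tuple

# (time since start, ops since start, time since last sample, mean ops/s in sample)
Sample = Tuple[float, int, float, float]


class OpSampler:
    """Hypothetical helper: records one sample every `sample_freq` operations."""

    def __init__(self, sample_freq: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if sample_freq <= 0:
            raise ValueError("sample_freq must be positive")
        self.sample_freq = sample_freq
        self.clock = clock
        self.start = clock()
        self.last_sample_time = self.start
        self.ops = 0
        self.samples: List[Sample] = []

    def finished_op(self) -> None:
        """Call once after each completed operation (e.g. each insert)."""
        self.ops += 1
        if self.ops % self.sample_freq != 0:
            return
        now = self.clock()
        interval = now - self.last_sample_time
        # Guard against a zero-length interval on very coarse clocks.
        rate = self.sample_freq / interval if interval > 0 else float("inf")
        self.samples.append((now - self.start, self.ops, interval, rate))
        self.last_sample_time = now

    def long_term_average(self) -> float:
        """Mean ops/s since the sampler was created."""
        elapsed = self.clock() - self.start
        return self.ops / elapsed if elapsed > 0 else float("inf")
```

In a benchmark loop, finished_op() would be called after every insert, and the collected samples written out at the end (or as they are produced) for plotting.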

SSD

The graph below shows the results from the SSD run (note the log scale on the y-axis). The insert rate drops rapidly from around 100K/s to just under 10K/s by the end of the test. This is around an order of magnitude slower than the insert rate reported by the benchmark tests that terminate at 1e6 inserts.
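
To put those rates in perspective, here is some back-of-the-envelope arithmetic as a small Python sketch. It uses only the figures quoted above (1e9 entries, 16-byte keys, 100-byte values, 50 bytes after compression, and the approximate 100K/s and 10K/s rates). The function names are hypothetical, and per-entry storage overhead is ignored, so treat the results as rough bounds rather than measurements.

```python
ENTRIES = 1_000_000_000          # --num from the benchmark command
KEY_BYTES = 16                   # from the db_bench header
VALUE_BYTES = 100                # from the db_bench header
COMPRESSED_VALUE_BYTES = 50      # from the db_bench header


def hours_to_insert(entries: int, inserts_per_sec: float) -> float:
    """Hours needed to insert `entries` at a constant rate."""
    return entries / inserts_per_sec / 3600.0


def dataset_gb(entries: int, key_bytes: int, value_bytes: int) -> float:
    """Payload size in decimal GB, ignoring any per-entry overhead."""
    return entries * (key_bytes + value_bytes) / 1e9


if __name__ == "__main__":
    # Constant-rate bounds; the real run decays between the two.
    print("hours at 100K/s:", round(hours_to_insert(ENTRIES, 100_000), 1))
    print("hours at 10K/s: ", round(hours_to_insert(ENTRIES, 10_000), 1))
    print("raw GB:         ", dataset_gb(ENTRIES, KEY_BYTES, VALUE_BYTES))
    print("compressed GB:  ", dataset_gb(ENTRIES, KEY_BYTES, COMPRESSED_VALUE_BYTES))
```

At a constant 100K inserts/s the full load would take a little under 3 hours; at 10K inserts/s it would take over a day. The raw payload is about 116 GB, or about 66 GB if values compress as the benchmark header suggests.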

The graph below is a zoomed-in portion, showing the effect of compactions on insert throughput. The throughput typically alternates between bursts of around 300K inserts/s and pauses (LevelDB stalls writes to limit the number of "young" arrays). LevelDB uses two cores: one for inserts, and one for compactions.
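
One way to quantify the burst/pause pattern from the sampled output is to measure what fraction of wall-clock time falls in slow samples. The sketch below assumes samples in the form described earlier (time since start, ops since start, time since last sample, mean ops/s during sample). The function name and the choice of threshold are hypothetical, and the numbers in the usage example are made up for illustration, not taken from the run.

```python
from typing import Iterable, Tuple

# (time since start, ops since start, time since last sample, mean ops/s in sample)
Sample = Tuple[float, int, float, float]


def stall_fraction(samples: Iterable[Sample],
                   stall_below_ops_per_sec: float) -> float:
    """Fraction of sampled time spent in samples slower than the threshold."""
    stalled = 0.0
    total = 0.0
    for _since_start, _ops, interval, rate in samples:
        total += interval
        if rate < stall_below_ops_per_sec:
            stalled += interval
    return stalled / total if total > 0 else 0.0


if __name__ == "__main__":
    # Illustrative values only: one fast sample followed by one slow sample.
    example = [
        (1.0, 10_000, 1.0, 10_000.0),
        (4.0, 20_000, 3.0, 10_000 / 3.0),
    ]
    print(stall_fraction(example, stall_below_ops_per_sec=5_000.0))
```

Because each sample covers a fixed number of operations, a pause shows up as a sample with a long interval and a low mean rate, so weighting by interval (rather than counting samples) avoids understating the time spent stalled.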

 

I also recorded the iostat output during the tests to observe the device traffic. The graphs below show (from top to bottom) device utilisation (%), write rate (MB/s) and read rate (MB/s). The device begins writing at around 40 MB/s, dropping linearly over time to less than 10 MB/s, while the utilisation climbs to nearly 100%. Eventually, the whole process ground to a halt. The reason is peculiar to SSDs; for more information, see our blog posts on how to write to flash SSDs effectively (and why append-only B-trees fail on SSDs).
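
As a rough sanity check, the device write rate can be compared with the rate at which user data arrives. The Python sketch below pairs the approximate figures quoted in this post (about 100K inserts/s with about 40 MB/s at the start, and about 10K inserts/s with about 10 MB/s at the end) and assumes 116 bytes per uncompressed entry (16-byte key plus 100-byte value). The function name is hypothetical, and the pairing of numbers read off two separate graphs is an assumption, so the ratios are indicative only.

```python
BYTES_PER_ENTRY = 16 + 100  # uncompressed key + value, from the db_bench header


def write_amplification(device_write_mb_per_sec: float,
                        inserts_per_sec: float,
                        bytes_per_entry: int = BYTES_PER_ENTRY) -> float:
    """Ratio of bytes written to the device to bytes of user data inserted."""
    user_bytes_per_sec = inserts_per_sec * bytes_per_entry
    return device_write_mb_per_sec * 1e6 / user_bytes_per_sec


if __name__ == "__main__":
    # Approximate start-of-run and end-of-run figures from the text.
    print("start:", round(write_amplification(40.0, 100_000), 1))
    print("end:  ", round(write_amplification(10.0, 10_000), 1))
```

Under these assumptions the device writes roughly 3 to 4 bytes per user byte at the start and up to roughly 8 to 9 by the end. If the data is compressed on disk, the ratio against compressed bytes would be higher still.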

HDD

I then repeated the test on the SATA HDD, and the results were somewhat similar to the SSD result, albeit a bit slower.

Summary

LevelDB is a lightweight, open-source, portable, easy-to-use key-value store library that, in many cases, might be a better option than a B-tree-based store such as BerkeleyDB or SQLite. For in-memory workloads with a small number (< 1 million) of keys, performance is fairly good on commodity hardware. Considering its intended use with IndexedDB (as a local Chromium store), this seems like a good fit.

Once the dataset is much larger than main memory, or the datastore is under sustained heavy load, a heavyweight alternative might be preferable. At this point, I should point out the recently open-sourced Acunu Core, which is implemented in-kernel and handles its own buffer caching, prefetching, parallel disk layouts (replacing RAID and LVM), and more. Yes, this is quite heavyweight (you need to insert a kernel module), but it comes with advantages! I hope to publish some benchmarks for this store; for now, I'll just say that we've managed to solve some very exciting problems, such as sustained high writes to SSDs and fast versioning (via Stratified B-trees).
