Blog

CQL benchmarking

By Eric Evans

12 Dec 2011
Category: Technical Articles

The CQL Value Proposition

CQL (Cassandra Query Language) is the relatively new SQL-like query language for Apache Cassandra. It's meant as a user-friendly alternative to Cassandra's Thrift-based RPC interface, which, frankly, sucks. The case for CQL is that of greater environmental stability and ease-of-use, two areas that Cassandra has rightly been criticized for in the past. To demonstrate the latter of these, look at the code (in Java) for writing a single column using the RPC interface:

  Column col = new Column(ByteBuffer.wrap(“name”.getBytes()));
  col.setValue(ByteBuffer.wrap(“value”.getBytes()));
  col.setTimestamp(System.currentTimeMillis());

  ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
  cosc.setColumn(col);

  Mutation mutation = new Mutation();
  Mutation.setColumnOrSuperColumn(cosc);   
   
  List<Mutation> mutations = new ArrayList<Mutation>();   
  mutations.add(mutation);    

  Map mutations_map = new HashMap<ByteBuffer, Map<String, List<Mutation>>>(); 
  Map cf_map = new HashMap<String,List<Mutation>>(); 
  cf_map.set(“Standard1”, mutations); 
  mutations.put(ByteBuffer.wrap(“key”.getBytes()), cf_map);

In contrast, the corresponding query in CQL looks like this:

  INSERT INTO Standard1 (KEY, name) VALUES (key, value) 

CQL is the obvious winner here from a usability standpoint, and it may seem obvious to some, but the reason is that it adds a lot of server-side abstraction. It may help to think of the query as a graph.

Query graph

In the case of the RPC interface (the first code snippet above), the application developer is being asked to construct the query graph manually, in a form that is directly consumable by Cassandra. The same query graph is represented in the CQL query, only in a human-readable text format, which is then parsed by Cassandra to create the structures it needs to process the request. One of the more frequent suppositions regarding CQL is that because of this query string parsing, it ''must'' perform poorly compared to the Thrift RPC. I've frequently cautioned people against jumping to such conclusions, and pointed out that it's not a strict game of performance figures. Developer time being the most valuable of project resources, it's only imporant that it performs ''well enough''.

Benchmarking CQL against Thrift RPC

Performance testing in Cassandra is typically done with the stress utility (located in tools/). I recently found some time to extend the stress utility for CQL, and now have some concrete results.

Test #1: Insert 20M rows x 5 columns

When run against the RPC interface, an insert is performed using a batch_mutate() call with one row each. For CQL, this translates to a single UPDATE statement. These tests stuck mostly to the defaults, so column names are ascii and 2 characters long, keys and values are binary of 7 and 34 bytes in length respectively.

Insert 20M rows x 5 columns (no index)


Average OP RateAverage Latency
RPC 20,953/s 1.6 ms
CQL 19,176/s (-8%) 1.7 ms (+9%)

Test #2: Insert 10M rows x 5 columns with KEYS-type secondary index

This test is identical to the previous one with one exception, there is an index on one of the five test columns. The gap in the results is a bit closer here (-6% versus -8%), I believe this is because updating the index has the node more I/O bound for this test.

Insert 10M rows x 5 columns (KEYS index)


Average OP RateAverage Latency
RPC 9,850/s 5.3 ms
CQL 9,290/s (-6%) 5.5 ms (+4%)

Test #3: Counter increments for 10M rows x 5 columns

Like the insert tests, the stress tool performs a batch_mutate() of one row each when benching RPC, and for CQL uses an UPDATE statement with the <name> = <name> + 1 syntax.

Counter increment, 10M rows x 5 columns


Average OP RateAverage Latency
RPC 18,052/s 1.7 ms
CQL 17,635/s (-2%) 1.7 ms

Test #4: Read 20M rows x 5 columns

For RPC, this test translates to a get_slice() with open-ended start and end columns, limited to 5 results. For CQL, stress performs a SELECT in the form of SELECT FIRST 5 ".." FROM....

Read 20M rows x 5 columns


Average OP RateAverage Latency
RPC 22,726/s 2.0 ms
CQL 20,272/s (-11%) 2.3 ms (+10%)

Taking a closer look

First off, there are a few things which may have influenced these tests (against CQL), for example:

  • stress was written specifically with the RPC interface in mind and so makes the assumption that column names and values will always need to be converted to bytes. This means that the CQL tests are enduring unnecessary round-trips from String, to ByteBuffer, and back to String (which is actually quite expensive).
  • Query compression wasn't implemented in stress, and might have some effect on the larger queries.
  • Client and server were run on the same machine. Since CQL query parsing tends to push CPU usage a bit higher, a truer test would have the benchmark running on a different host. That said, it isn't difficult to find some CQL-specific hot-spots when running Cassandra in a profiler.

Term parsing

A term here means things like keys and column names and values. Imagine a sample query from Test #1 above:

 UPDATE Standard1 USING CONSISTENCY ONE SET
    C1=d41d8cd98f00b204e9800998ecf8427ed41d8cd98f00b204e9800998ecf8427efd34,
    C2=03c7c0ace395d801803c7c0ace395d80182db07ae2c30f0342db07ae2c30f03407ae,
    C3=3691308f2a4c2f6983f2880d32369133691308f2a4c2f6983f2880d32e29c8408f2a,
    C4=9f6e6800cfae7749eb6c49f6e689f6e6800cfae7749eb6c486619254b9c00cfae774,
    C5=8f60c8102d29fcd52518f60c8102d298f60c8102d29fcd525162d02eed4566bfcd51
    WHERE KEY=00000000000001

Column names are 2-character wide ASCII types. Column values are 34 byte binary, which for CQL means they have been hex-encoded. The key is also binary, so it is hex-encoded as well. Of the 11 terms that need to be parsed in the above statement, you might think the bulk of the time was spent in parsing the longish binary values from hex-encoded string, but surprisingly its the 5 ASCII column names (C1, C2, etc) that are most expensive. The time spent is attributable to Java's String.getBytes(Charset). I don't know what to make of that.

Copying and conversion

There is also a not inconsiderable amount of time that is spent in copying and conversion to and from bytes. One example that I found quite surprising is the conversion of the query from ByteBuffer to String upon receiving the request. Despite this being a once-per-query event, it ranks relatively high. This could turn out to be important, since the only reason the query argument is binary (as opposed to a UTF8 string) is to support compression. It remains to be seen what the benefits of compression are, and it's possible this conversion could offset some or all of them. One way or another, this will likely boil down to an argument in favor of a custom wire protocol.

So is it worth it?

Personally, I'm pretty pleased with these results. It's easy to fall into the trap of making it all about the performance numbers, but the difference here is small enough that it hardly seems like reasonable justification for choosing one interface over the other. Worst-case, this is a requirement of an additional node in a medium-sized cluster, and that can't possibly be more costly than the developer time you save from using CQL. Also keep in mind that for these tests, the node is parsing the same query string for each and every request. Prepared statements will allow us to parse a statement just once, and send along the columns and keys for subsequent requests. If there were no other optimizations available, this one alone would surely be enough.

blog comments powered by Disqus