How to rebuild 2TB disks in 30mins
By Andrew Byde
3 Dec 2011
Category: Technical Articles
One advantage of the Acunu Data Platform (ADP) over any database that relies on RAID for its underlying data redundancy is disk rebuild speed. ADP offers an alternative to RAID that rebuilds much faster. Here's a graph showing how long it takes from the moment a disk fails to the moment that data is once again protected: with ADP, your disk rebuild will be up to 5 times faster.

And now for the all-important small-print!
We compared vanilla Cassandra on Linux md RAID with a pre-release v2 of our product, which uses a layout known as Randomised Duplicate Allocation (RDA). In 2-RDA mode, each block of data is duplicated, and the two copies are allocated at random among the available devices (other schemes can use more than two copies, or a variable number of copies depending on the popularity of the data and space constraints).
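To make the 2-RDA idea concrete, here is a minimal sketch of random duplicate placement. It is an illustration only, not Acunu's allocator: the function name `allocate_rda`, the fixed seed, and the choice of uniform sampling without replacement are all assumptions made for this example.

```python
import random
from collections import Counter


def allocate_rda(num_blocks, num_disks, copies=2, seed=0):
    """Place each block on `copies` distinct disks chosen uniformly at random.

    Hypothetical helper for illustration; a real allocator would also
    account for free space and device health.
    """
    rng = random.Random(seed)
    return [tuple(rng.sample(range(num_disks), copies)) for _ in range(num_blocks)]


# Example: 10,000 blocks spread over 8 disks, 2 copies each.
placement = allocate_rda(num_blocks=10_000, num_disks=8)

# Count how many block copies landed on each disk; random placement
# should leave every disk holding a roughly equal share.
blocks_per_disk = Counter(disk for disks in placement for disk in disks)
```

Because the copies of different blocks land on different pairs of disks, every disk ends up sharing data with every other disk, which is what later lets a rebuild draw on all of them at once.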
For RAID we rebuild onto a new hot-spare disk (i.e. from 7 back to 8 disks in the case of an 8-disk test). Acunu uses a distributed hot-spare model: when a disk fails, the blocks that were on it are duplicated elsewhere among the remaining disks, so that once the process is finished all blocks are again on 2 disks. In either case, the rebuild time reported is the window during which a second disk failure will cause data loss.
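The distributed hot-spare model described above can be sketched as follows. This is a toy simulation under assumed names (`re_replicate`, `writes_per_disk`) and an assumed uniform choice of target disk; it only shows that the rebuild writes are spread over all survivors rather than funnelled into one spare.

```python
import random
from collections import Counter


def re_replicate(placement, failed, num_disks, seed=0):
    """Re-duplicate every block that lost a copy on `failed`.

    Returns the new placement and a count of rebuild writes per disk.
    Hypothetical sketch: targets are picked uniformly at random.
    """
    rng = random.Random(seed)
    writes_per_disk = Counter()
    new_placement = []
    for disks in placement:
        survivors = [d for d in disks if d != failed]
        if len(survivors) < len(disks):
            # Choose a disk that is alive and does not already hold a copy.
            candidates = [d for d in range(num_disks)
                          if d != failed and d not in survivors]
            target = rng.choice(candidates)
            survivors.append(target)
            writes_per_disk[target] += 1
        new_placement.append(tuple(survivors))
    return new_placement, writes_per_disk


# Example: 8 disks, 10,000 blocks with 2 copies each, then disk 3 fails.
rng = random.Random(42)
placement = [tuple(rng.sample(range(8), 2)) for _ in range(10_000)]
affected = sum(1 for disks in placement if 3 in disks)
rebuilt, writes_per_disk = re_replicate(placement, failed=3, num_disks=8)
```

After the call, every block is again on two distinct surviving disks, and `writes_per_disk` shows the rebuild load shared among the survivors.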
| Redundancy | RAID-10 | RAID-5 | 2-RDA | 2-RDA | 2-RDA |
|---|---|---|---|---|---|
| Disks | 8 | 8 | 8 | 16 | 16 |
| Disk size (TB) | 1 | 1 | 1 | 2 | 2 |
| Total utilisation | 25% | 44% | 50% | 50% | 25% |
| Total data (TB) | 2 | 4 | 4 | 16 | 8 |
| Exposure (h:mm) | 4:03 | 3:38 | 0:48 | 0:44 | 0:22 |
Vanilla Cassandra at 100% capacity using RAID-10 would use 25% of the disk space for unique data -- 50% lost through duplication and 50% of what's left lost through Cassandra's requirement to keep disks half full at most. Likewise, vanilla Cassandra using RAID-5 on 8 disks uses 44% of disk space for unique data.
Acunu Cassandra at 100% capacity (using 2-RDA) would use 50% of the disk space for unique data -- 50% lost through duplication, but only trivial additional losses for merge overheads, since we do in-place merges.
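The utilisation figures can be reproduced with a few lines of arithmetic. The sketch below assumes that the 44% RAID-5 figure comes from one disk's worth of parity across 8 disks (7/8 usable) followed by the same half-full limit, and it idealises the "trivial" RDA merge overhead as zero; the function name is hypothetical.

```python
def unique_data_fraction(redundancy_efficiency, fill_limit):
    """Fraction of raw disk space available for unique data.

    redundancy_efficiency: share of raw space left after redundancy
        (1/2 for mirroring or 2-RDA; (n-1)/n for RAID-5 on n disks).
    fill_limit: how full the store may run (0.5 for vanilla Cassandra's
        half-full requirement; 1.0 assumed here for in-place merges).
    """
    return redundancy_efficiency * fill_limit


raid10 = unique_data_fraction(1 / 2, 0.5)        # 0.25   -> 25%
raid5_8_disks = unique_data_fraction(7 / 8, 0.5)  # 0.4375 -> about 44%
rda2 = unique_data_fraction(1 / 2, 1.0)          # 0.5    -> 50%
```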
Utilisation is an important consideration for RDA because, unlike RAID, we only rebuild the areas of disk that are used for storing data. And in contrast to RAID, the data that needs to be duplicated can be distributed across all remaining disks -- so the rebuild is not limited by the bandwidth of a single device. As a consequence, while RAID rebuild times are proportional to the size of the failed device, RDA rebuild times are proportional to the amount of data on the failed device, and inversely proportional to the number of devices that remain. This is why rebuild time is roughly the same for columns 3 and 4 of the table above (the first two 2-RDA configurations) -- the amount of data per device has doubled, but so has the number of devices -- and smaller for the fifth column, in which we examine rebuild time for a node at only 50% capacity (25% utilisation).
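The scaling argument can be checked against the measured exposure times with a rough model. This is a sketch in relative units only: it assumes equal per-disk bandwidth, ignores foreground load, and takes the data on a failed device to be disk size times unique utilisation times the number of copies. The function names are hypothetical.

```python
def rda_relative_rebuild_time(disks, disk_size_tb, unique_utilisation, copies=2):
    """Relative RDA rebuild time: data on the failed disk / surviving disks."""
    data_on_failed_tb = disk_size_tb * unique_utilisation * copies
    return data_on_failed_tb / (disks - 1)


def raid_relative_rebuild_time(disk_size_tb):
    """Relative RAID rebuild time: the whole device, through a single spare."""
    return disk_size_tb


# The three 2-RDA configurations from the table above.
col3 = rda_relative_rebuild_time(disks=8, disk_size_tb=1, unique_utilisation=0.50)
col4 = rda_relative_rebuild_time(disks=16, disk_size_tb=2, unique_utilisation=0.50)
col5 = rda_relative_rebuild_time(disks=16, disk_size_tb=2, unique_utilisation=0.25)

model_ratio_4_to_3 = col4 / col3   # 14/15, about 0.93
model_ratio_5_to_4 = col5 / col4   # 0.5

# Measured exposure in minutes, read from the table: 0:48, 0:44, 0:22.
measured_ratio_4_to_3 = 44 / 48    # about 0.92
measured_ratio_5_to_4 = 22 / 44    # 0.5
```

Under these assumptions the model predicts that doubling both disk size and disk count leaves rebuild time almost unchanged, and that halving utilisation halves it, which is in line with the measured ratios.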