JBOD vs RAID

TL;DR: JBOD is ~30% faster than MD RAID0. ext2/xfs/ext4 perform similarly.

While sitting around waiting for a cobbler sync of Fedora for portacluster, I decided to dig into an issue I've been curious about for a while: which is faster, RAID0 or JBOD? My guess was that JBOD would win since each drive would operate independently, with separate queues, separate filesystems, etc. A friend who works at I/O Switch came by with a loaner 4x 256GB I/O Switch Raijin PCIe SSD, which is basically perfect for this test. Each 256GB SSD on the card shows up as its own AHCI controller and subsequent drive letter. Every test was run on the exact same drives, in the same machine, with no hardware changes between tests.

An important trend I noticed is that the highest-performing tests always capped out at almost exactly 500MB/s (aggregate read/write). This is happening because my motherboard will only assign a maximum of 4 lanes to a slot. Even though I have the card in the x16 slot, it can't get more than 4 lanes of PCIe 2.0.
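As a rough sanity check on the lane math, here is a small sketch of the nominal PCIe 2.0 numbers. It assumes the textbook figures (5 GT/s per lane, 8b/10b encoding) rather than anything measured on this machine, and the function name is just for illustration.

```python
# Nominal PCIe 2.0 figures (ASSUMPTION: spec values, not measured here).
GT_PER_SEC_PER_LANE = 5.0e9    # 5 GT/s raw signalling rate per lane
ENCODING_EFFICIENCY = 8 / 10   # 8b/10b line coding: 8 data bits per 10 bits sent


def pcie2_bandwidth_mb_s(lanes: int) -> float:
    """Nominal one-direction bandwidth in MB/s for a PCIe 2.0 link."""
    bits_per_sec = GT_PER_SEC_PER_LANE * ENCODING_EFFICIENCY * lanes
    return bits_per_sec / 8 / 1e6


for lanes in (1, 4, 16):
    print(f"x{lanes}: {pcie2_bandwidth_mb_s(lanes):.0f} MB/s per direction")
```

By these nominal figures a single lane works out to about 500MB/s per direction and four lanes to about 2000MB/s, so a ceiling at almost exactly 500MB/s may be worth cross-checking against the negotiated link width (the LnkSta line in `lspci -vv` reports it).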

fio Configs

All tests were run with one of these 3 variants, which are roughly equivalent. The RAID test runs 4 processes against a single filesystem, while JBOD runs one process per filesystem for a total of 4. Finally, the ZFS variant uses mmap since ZFS does not support O_DIRECT or AIO on Linux.

The configuration is designed primarily to test peak IOPS on a device given a random-access workload of 50% reads and 50% writes using Linux AIO and O_DIRECT (to bypass the page cache).
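To make the three variants concrete, here is a minimal sketch that renders fio job files matching the description above. It is not the exact configuration used for these tests: the block size, queue depth, file size, runtime, the `/mnt/md0` and per-drive JBOD mount points, and the choice of 4 processes for ZFS are all assumptions made for illustration.

```python
def fio_job_file(directories, numjobs, engine="libaio", direct=True):
    """Render a fio job file with one job section per directory.

    bs, iodepth, size and runtime are placeholder ASSUMPTIONS; the post
    does not state the values that were actually used.
    """
    lines = [
        "[global]",
        "rw=randrw",                     # random access, mixed reads and writes
        "rwmixread=50",                  # 50% reads, 50% writes
        f"ioengine={engine}",            # libaio is Linux AIO; mmap for ZFS
        f"direct={1 if direct else 0}",  # 1 opens files with O_DIRECT
        "bs=4k",                         # ASSUMPTION: fixed IO size
        "size=1g",                       # ASSUMPTION: per-process file size
        "runtime=60",                    # ASSUMPTION: seconds per run
        "time_based",
        "group_reporting",               # aggregate stats across processes
    ]
    if engine == "libaio":
        lines.append("iodepth=32")       # ASSUMPTION: async queue depth
    for index, directory in enumerate(directories):
        lines += ["", f"[job{index}]", f"directory={directory}",
                  f"numjobs={numjobs}"]
    return "\n".join(lines) + "\n"


# RAID: 4 processes against one filesystem (hypothetical mount point).
raid = fio_job_file(["/mnt/md0"], numjobs=4)
# JBOD: one process per filesystem, 4 in total (hypothetical mount points).
jbod = fio_job_file(["/mnt/sdi", "/mnt/sdj", "/mnt/sdk", "/mnt/sdl"], numjobs=1)
# ZFS: mmap engine, no O_DIRECT; 4 processes is an ASSUMPTION.
zfs = fio_job_file(["/mnt/sdi"], numjobs=4, engine="mmap", direct=False)
print(jbod)
```

Writing any of these strings to a file and passing that file to `fio` should run the corresponding variant; keeping the `[global]` section identical across variants is what makes the runs roughly comparable.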

Filesystem Configs

Unless otherwise noted, all the default settings for filesystems, mdraid, etc. are used.

I forgot to capture the output of all the mkfs and mdadm commands to show what sizes were used. The drives report a 4k block size, and I did watch to make sure everything was aligned. Future tests will capture all this data in a more structured manner.

ZFS raid0: zpool create -f -m /mnt/sdi -o ashift=12 tank sdi1 sdj1 sdk1 sdl1
ZFS raid10: zpool create -f -m /mnt/sdi -o ashift=12 tank mirror sdi1 sdj1 mirror sdk1 sdl1
btrfs single: mkfs.btrfs --data single --metadata raid10 --force /dev/sd[ijkl]1
btrfs raid10: mkfs.btrfs --data raid10 --metadata raid10 --force /dev/sd[ijkl]1
j72x: 4 7200RPM drive JBOD / xfs

The graphs below are rendered on-the-fly using D3/Rickshaw. Try Chrome if you don't see any.

IOPS

The fastest test was the ext2 filesystem in a JBOD configuration, but not by much. xfs and ext4 are only slightly slower. I still consider ext4 and xfs to be the safe choices for production work. JBOD clearly outperforms any of the raid0 configs.

If you're looking for the best possible performance in Cassandra, JBOD is where it's at, but raid0 remains a safe/solid choice. Note that STCS compaction requires lots of free disk space and is probably not a good choice for JBOD. LCS gets more interesting given the additional IO capacity JBOD provides.

Bandwidth (MB/s)

Not much interesting here. Exactly the same profile as IOPS, which makes sense given the fixed IO size in the test config.
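The relationship is simple enough to sketch: with a fixed IO size, bandwidth is just IOPS multiplied by that size, so the two graphs can only differ by a constant factor. The 4096-byte block size and the IOPS figures below are made-up illustration values, not results from these tests.

```python
def bandwidth_mb_s(iops: float, block_size_bytes: int) -> float:
    """Bandwidth in MB/s implied by an IOPS figure at a fixed IO size."""
    return iops * block_size_bytes / 1e6


BLOCK_SIZE = 4096  # ASSUMPTION: illustrative fixed IO size in bytes

# Hypothetical IOPS values, chosen only to show the proportionality.
for iops in (50_000, 100_000):
    print(f"{iops} IOPS -> {bandwidth_mb_s(iops, BLOCK_SIZE):.1f} MB/s")
```

Doubling the IOPS doubles the bandwidth, which is why the ranking of configurations is identical in both graphs.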

Latency Percentiles

Mouse over the lines and legend to get more details. The Y axis unit is microseconds. The 7200RPM drive JBOD was left off this graph because it throws the y scale off and is basically useless. The >p99 columns are also omitted but visible in the table at the end of this page.

btrfs fared poorly on latency measurements. It used a lot of CPU, probably for checksumming, and these latency numbers show it.

ZFS used 2 full cores for the duration of both test configurations. Its latency line on the bottom looks suspect. Given the throughput numbers and generally odd IO patterns I observed, it will be a while before I test it again and I do not recommend it for high-performance use on Linux (Solaris and FreeBSD variants may be an entirely different matter!).

Raw fio Output

Links to the fio output in gists. Sorry about the file naming. It evolved over the course of the day.

Latency Numbers

This table shows all of the raw latency numbers reported by fio. The high-end percentiles' variance is sometimes interesting. The terrible performance of the 7200RPM drives should be easy to spot.

Conclusion

I've been using raid0 for years, only occasionally venturing off to try JBOD configurations in the past. I've even argued against JBOD. Clearly I was wrong. RAID 0 will still be easier to manage in many ways, since it ends up with a single filesystem with all the capacity available. JBOD systems will have to be careful to watch for individual disks filling up, which complicates monitoring & alerting as well as fstabs. For a 30% performance boost, it's going to be worth it as databases grow more robust support for JBOD.
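The per-disk monitoring burden can be kept small. Here is a minimal sketch, assuming one mount point per JBOD drive, that flags any filesystem above a usage threshold. The mount point names and the 80% threshold are hypothetical; a real setup would feed this into whatever alerting system is already in place.

```python
import shutil


def fullest_mounts(mount_points, threshold=0.80):
    """Return (mount, used_fraction) pairs at or above threshold, fullest first."""
    report = []
    for mount in mount_points:
        usage = shutil.disk_usage(mount)  # named tuple: total, used, free (bytes)
        if usage.total == 0:
            continue                      # skip pseudo-filesystems with no capacity
        used_fraction = usage.used / usage.total
        if used_fraction >= threshold:
            report.append((mount, used_fraction))
    return sorted(report, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical JBOD mount points; replace with the ones in your fstab.
    jbod_mounts = ["/mnt/sdi", "/mnt/sdj", "/mnt/sdk", "/mnt/sdl"]
    for mount, fraction in fullest_mounts(jbod_mounts):
        print(f"WARNING: {mount} is {fraction:.0%} full")
```

Checking each disk separately matters because, unlike raid0, one JBOD drive can fill up while the others still have plenty of room.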