Dec 152013

Download as ebook

In the previous post I wrote about the software layer in a home NAS/HTPC with ZFS on Linux. Here we’ll take a look at the performance of the system.

6x 3TB WD REDs in case and connected (cables on opposite side).

6x 3TB WD REDs in case and connected (cables on opposite side).

I had planned to run performance tests at different stages of data ingestion. Features such as compression and deduplication are bound to have some impact on performance. Even without these features, I wasn’t sure what to expect from my hardware and RaidZ either, especially that I didn’t exactly aim for the top level hardware. I don’t even have a decent boot drive; just two cheap flash drives. The only components that I didn’t get cheapest possible were the drives (WD green or Toshiba would have been significantly cheaper).

While the performance of the hardware is mostly independent of data, ZFS is a filesystem and its performance depended on the number of files, blocks, and total data size for a given hardware.

Flash Drive Benchmarks

Performance tests on 16GB ADATA Value-Driven S102 Pro USB 3.0 Flash Drive Model AS102P-16G-RGY

Two ADATA flash drives were used to create a Raid-0 (12GB) and Raid-1 (10GB) drives.
These are raw read and write performance directly to the drives without filesystem overhead.

Read performance benchmark:

Write performance benchmark:

ZFS Benchmarks

At 24% capacity with 2.88TiB of data in 23 million block, 7% dups. RAM is 8GB.

The data size is important in evaluating deduplication performance in light of the available RAM. As previously noted, ZFS requires copious amount of RAM even without deduplication. When deduplication is enabled, performance deteriorates rapidly when RAM is not sufficient. My performance tests confirm this as you’ll see below.

All tests were run without any cpu-intensive tasks running or any I/O (beyond that of the network activity by SSH). The files I chose were compressed video files that gzip couldn’t compress any further. This exercised the worst-case scenario of gzip in terms of performance when writing.

Read Tests

Reading was done with a larger than RAM (13GB vs 8GB) file to see sustained performance and with smaller than RAM (1.5GB vs 8GB) files with variations on hot, cold and, dirty cache (i.e. the same file was read previously or not or there were written data to flush to disk at the time or reading the target file.)

Cold read, file >> RAM:

# dd if=/tank/media/movies/Amor.mkv  of=/dev/null bs=1M
13344974829 bytes (13 GB) copied, 36.5817 s, 365 MB/s

365MB/s of sustained read over 13GB is solid performance from the 5400 rpm Reds.

Cold read, file << RAM, with dirty cache:

# dd if=/tank/media/movies/Dead.Man.Walking.mkv of=/dev/null bs=1M
1467836416 bytes (1.5 GB) copied, 5.84585 s, 251 MB/s

Not bad, with all the overhead and cache misses, managed to do 251MB/s over 1.5GB.

Hot read, file << RAM, file fully cached:

# dd if=/tank/media/movies/Dead.Man.Walking.mkv of=/dev/null bs=1M
1467836416 bytes (1.5 GB) copied, 0.357955 s, 4.1 GB/s

4.1GB/s is the peak for my RAM/CPU/Bus and ZFS overhead. I couldn’t exceed it, but I can do much worse with smaller bs values (anything lower than 128KB, which is the ZFS block size, will trash the performance, even when fully reading from RAM).

Cold read, file << RAM:

# dd if=/tank/media/movies/Dead.Man.Walking.mkv of=/dev/null bs=1M
1467836416 bytes (1.5 GB) copied, 4.10563 s, 358 MB/s

~360MB/s seems to be the real disk+ZFS peak read bandwidth.

Write Tests

Disk-to-Disk copy on gzip-9 and dedup folder, file >> RAM:

# dd if=/tank/media/movies/Amor.mkv  of=/tank/data/Amor.mkv bs=1M
13344974829 bytes (13 GB) copied, 1232.93 s, 10.8 MB/s

Poor performance with disk-to-disk copying. I was expecting better performance, even though the heads have to go back and forth between reading and writing which trashes the performance. More tests to find out what’s going on. Notice the file is not compressible, but I’m compressing with gzip-9 and dedup is on, and this is decidedly a dup file!

Now, to find out a breakdown of overheads. Before every test I primed the cache by reading the file to make sure I get >4GB/s read and remove read overhead.

No Compression, Deduplication on verify (hits are byte-for-byte compared before duplication assumed):

# zfs create –o compression=off –o dedup=verify tank/dedup
# dd if=/tank/media/movies/Dead.Man.Walking.mkv of=/tank/dedup/Dead.avi bs=1M
1467836416 bytes (1.5 GB) copied, 82.6186 s, 17.8 MB/s

Very poor performance from dedup! I thought compression+reading were the killers, looks like dedup is not very swift after all.

Gzip-9 Compression, No Deduplication:

# zfs create –o compression=gzip-9 –o dedup=off tank/comp
# dd if=/tank/media/movies/Dead.Man.Walking.mkv of=/tank/comp/Dead.avi bs=1M
1467836416 bytes (1.5 GB) copied, 5.4423 s, 270 MB/s

Now that’s something else… Bravo AMD and ZFS! 270MB/s gzip-9 performance on incompressible data including ZFS writing!

No Compression, No Deduplication:

# zfs create –o compression=off –o dedup=off tank/test
# dd if=/tank/media/movies/Dead.Man.Walking.mkv of=/tank/test/Dead.avi bs=1M
1467836416 bytes (1.5 GB) copied, 3.81445 s, 385 MB/s

Faster write speeds than reading!! At least once compression and dedup are off. Why couldn’t reading hit quite the same mark?

Disk-to-Disk copy again, No Compression, No Deduplication, file >> RAM:

# dd if=/tank/media/movies/Amor.mkv  of=/tank/data/Amor.mkv bs=1M
13344974829 bytes (13 GB) copied, 74.126 s, 180 MB/s

The impact that compression and deduplication have are undeniable. Compression doesn’t nearly have such a big impact and as others have pointed out, once you move from LZ4 to GZip, you might as well use GZip-9. That seems to be wise indeed. Unless one has heavy writing (or worse, read/modify/write,) in which case LZ4 is the wiser choice.

Deduplication has a very heavy hand and it’s not an easy bargain. One must be very careful with the type of data they deal with before enabling it nilly-willy.

ZFS ARC Size Tests

ZFS needs to refer to the deduplication table (DDT) whenever it writes/updates/deletes to a deduplicated dataset. The DDT can use 320 bytes per block for every block in a dedup enabled dataset. This can add up quite rapidly, especially with small files and unique files that will never benefit form deduplication. ARC is the adaptive cache that ZFS uses for its data. The size is preconfigured to be a certain percentage of the available RAM. In addition to the ARC there is a Metadata cache, which hold the file metadata necessary for stating and enumerating the filesystem.

Here I run performance tests with different ARC sizes to see its impact on the performance. First, let’s see how many blocks we have in the DDT.

# zdb -DD tank

DDT-sha256-zap-duplicate: 3756912 entries, size 933 on disk, 150 in core
DDT-sha256-zap-unique: 31487302 entries, size 975 on disk, 157 in core

DDT histogram (aggregated over all DDTs):

bucket allocated referenced
______ ______________________________ ______________________________
------ ------ ----- ----- ----- ------ ----- ----- -----
1 30.0M 3.74T 3.66T 3.66T 30.0M 3.74T 3.66T 3.66T
2 3.34M 414G 393G 395G 7.58M 939G 889G 895G
4 231K 25.3G 22.5G 23.0G 1.11M 125G 111G 113G
8 14.6K 1.09G 974M 1.00G 135K 9.92G 8.57G 9.04G
16 1.06K 43.2M 14.2M 21.0M 21.4K 777M 270M 407M
32 308 6.94M 2.62M 4.64M 12.4K 265M 101M 184M
64 118 1.19M 470K 1.26M 9.40K 103M 45.4M 111M
128 63 1.38M 84.5K 551K 11.2K 258M 14.4M 97.8M
256 21 154K 26K 176K 6.78K 45.3M 9.93M 57.6M
512 8 132K 4K 63.9K 5.35K 99.8M 2.67M 42.7M
1K 2 1K 1K 16.0K 3.01K 1.50M 1.50M 24.0M
2K 3 1.50K 1.50K 24.0K 8.27K 4.13M 4.13M 66.1M
4K 1 512 512 7.99K 4.21K 2.10M 2.10M 33.6M
8K 2 1K 1K 16.0K 21.6K 10.8M 10.8M 173M
16K 1 128K 512 7.99K 20.3K 2.54G 10.2M 162M
Total 33.6M 4.17T 4.07T 4.07T 39.0M 4.79T 4.64T 4.66T

dedup = 1.14, compress = 1.03, copies = 1.00, dedup * compress / copies = 1.18
# zpool list
tank 16.2T 7.05T 9.20T 43% 1.14x ONLINE -

We have 7.05TiB of raw data comprising 43% of the disk space. The DDT contains about 31.5M unique blocks and 3.75M duplicated blocks (blocks with at least 2 references each.) The zfs_arc_max controls the maximum size of the ARC. I’ve seen ZFS exceed this limit on a number of occasions. Conversely, when changing this value, ZFS might not react immediately, whether by shrinking or enlarging it. I suspect there is a more complicated formula to splitting the available RAM between the ARC, ARC Meta and the file cache. This brings me to the ARC Meta, controlled by zfs_arc_meta_limit, which at first I thought was not important. The default for ARC max on my 8GB system is 3.5GB and ARC Meta limit is about 900MB. ARC Meta is useful when we traverse folders and need quick stats. ARC, on the other hand, is when we need block dedup info or to update one. In the following benchmark I’m only modifying ARC max size and leaving ARC Meta on the default.

To change the max ARC size on the fly, we use (where ‘5368709120’ is the desired size):

# sh -c "echo 5368709120 > /sys/module/zfs/parameters/zfs_arc_max"

And to change it permanently:

sh -c "echo 'options zfs zfs_arc_max=5368709120' >> /etc/modprobe.d/zfs.conf"

Before every run of Bonnie++ I set the max ARC size accordingly.
I used Bonnie to Google Chart to generate these charts.
Find the full raw output from Bonnie++ below for more data-points if interested.

Raw Bonnie++ output:

zfs_arc_max = 512MB
Dedup: Verify, Compression: Gzip-9
Version 1.03e ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
S 14G 66854 80 6213 2 4446 6 84410 94 747768 80 136.7 2
------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 8934 90 28044 88 16285 99 8392 90 +++++ +++ 16665 99

No dedup, gzip-9
Version 1.03e ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
S 14G 66260 79 163407 29 146948 39 88687 99 933563 91 268.8 4
------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 16774 91 +++++ +++ 19428 99 16084 99 +++++ +++ 14223 86

No dedup, no compression
Version 1.03e ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
S 14G 63889 77 191476 34 136614 41 82009 95 396090 56 117.2 2
------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 14866 88 +++++ +++ 19351 99 15650 97 +++++ +++ 14215 93

zfs_arc_max = 5120MB
Dedup: off, Compression: Gzip-9
Version 1.03e ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
S 14G 76192 94 203633 42 180248 54 88151 98 847635 82 433.3 5
------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 19704 99 +++++ +++ 19350 99 9040 92 +++++ +++ 15842 97

zfs_arc_max = 5120MB
Dedup: verify, Compression: Gzip-9
Version 1.03e ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
S 14G 81563 98 5574 2 2701 4 83077 97 714039 77 143.9 2
------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 14758 79 +++++ +++ 13951 94 7537 89 +++++ +++ 14676 98

zfs_arc_max = 5120MB
Dedup: off, Compression: off
Version 1.03e ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
S 14G 76409 97 312385 73 149409 48 83228 95 368752 55 183.5 2
------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 10814 58 +++++ +++ 16923 99 6595 89 11372 93 3737 99


The zpool sustained raw read and write speeds are in excess of 350 MB/s. Reading reached 365MB/s while writing 385MB/s.

Cached-reads exceeded 4GB/s, which is fantastic, considering the low-end hardware. I should note here that the CPU isn’t overclocked and RAM is at 1600Mhz, not its rated 1866Mhz or higher. It’s 9-10-9cl, not exactly the fastest, but very decent.

Compression is very fast, even on incompressible data and when gzip, unlike lz4, doesn’t short-circuit and bail on incompressible data.

Deduplication overhead, on the other hand, is unbelievable! The write dropped well below 20MB/s, which is dismal. I suspect this is what happens when you have barely enough RAM for DDT (dedup table) and file cache. I must have forced some DDT reads from disk, which must be painfully slow. I strongly suspect this was the case, as when I rm-ed the 13GB file from dedup-enabled folder it took over 10minutes. This must be to find and remove dedup entries, or, in this case, to decrement the ref-count, as the file was cloned from the array. Now I really wish I got 16GB RAM.

The bigger surprise than dedup sluggishness is how fast disk-to-disk copy is. Considering that reads are around 360MB/s, I wasn’t expecting an exact halving with read and write on at the same time. But we got a solid 180MB/s read/write speed. It’s as if ZFS is so good with caching and cache-writes that the head seek overhead is well amortized. Notice this was on 13GB file, which is significantly larger than total RAM, so caching isn’t trivial here.

There is more to be said about the ARC size. It seems to have less of an impact when the DDT is already thrashing. I found that changing the ARC Meta limit I got a more noticeable improvement on the performance, but that only affects metadata. Dedup table looking and updates are still dependent on the ARC size. With my already-starving RAM, changing the ARC limit didn’t have a clear impact one way or the other. For now I’ll give 1.5GB to ARC Meta and 2.5GB to ARC. In this situation where DDT is too large for the RAM there are three solutions:

    1. Buy more RAM (I’m limited to 16GB in 2 DIMMs).
      Install L2ARC SSD (I’m out of Sata ports.)
      Reduce DDT table by removing unique entries (timely, but worthwhile.)
  • Overall, I’m happy with the performance, considering my use case doesn’t include heavy writing or copying. Dedup and compression are known to have tradeoffs and you pay for them during writes. Compression doesn’t nearly impact writes as much as one cold suspect. Perhaps with generous RAM thrown at ZFS dedup performance could be on par with compression, if not better (technically, it’s a hashtable lookup, must be better, at least on paper). Still, solid performance overall, coming from a $300 hardware and 5400 rpm rust-spinner disks.

    …but, as experience shows, that’s not the last word. There is more to learn about ZFS and its dark alleys.

    Download as ebook

      5 Responses to “18TB Home NAS/HTPC with ZFS on Linux (Part 3)”