Add compression using zstd #173

trinity-1686a · 2021-12-14T17:28:20Z

trinity-1686a commented

2021-12-14 17:28:20 +00:00

Fixes #27.

At the moment, an integrity check on every node only verifies integrity through zstd; it cannot detect swapped blocks, for instance. Should I add such a check?
~~I'll re-add some tests to smoke-test with compressible data before marking this ready.~~ Done!
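To illustrate the swapped-block question, here is a minimal sketch, assuming blocks are stored under the hash of their uncompressed content. The names `content_hash` and `find_mismatched` are hypothetical, and `DefaultHasher` is only a std-only stand-in for whatever cryptographic hash the block store really uses. A zstd checksum confirms that a frame is internally consistent, while a check like this one confirms that the decompressed data belongs under its key.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::Hasher;

/// Stand-in content hash (assumption: a real store would use a
/// cryptographic hash; DefaultHasher keeps this sketch std-only).
fn content_hash(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(data);
    hasher.finish()
}

/// Return the keys whose stored (already decompressed) content does not
/// hash back to the key it is stored under, sorted for stable output.
fn find_mismatched(store: &HashMap<u64, Vec<u8>>) -> Vec<u64> {
    let mut bad = Vec::new();
    for (key, data) in store {
        if content_hash(data) != *key {
            bad.push(*key);
        }
    }
    bad.sort();
    bad
}

fn main() {
    let a = b"block a".to_vec();
    let b = b"block b".to_vec();
    let (ha, hb) = (content_hash(&a), content_hash(&b));

    // Deliberately store each block under the other block's hash.
    let mut store = HashMap::new();
    store.insert(ha, b);
    store.insert(hb, a);

    // Each block is individually valid, yet both keys are reported.
    println!("mismatched keys: {}", find_mismatched(&store).len());
}
```

The trade-off is cost: every block has to be decompressed and rehashed, which is more expensive than relying on the zstd frame checksum alone.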

trinity-1686a added 5 commits 2021-12-14 17:28:20 +00:00

add compressed data block d611054b5f

add config for compression 3687b2d4be

doc for compression f802148395

delete old block when receiving a compressed one a51f671d86

process compressed block when listing all files d5d75fb4fa

continuous-integration/drone/pr Build is failing

trinity-1686a added 1 commit 2021-12-14 18:30:40 +00:00

update Cargo.nix d902a64c32

continuous-integration/drone/pr Build is passing

trinity-1686a added 1 commit 2021-12-15 07:27:40 +00:00

add compressable files to test-smoke 8807075158

continuous-integration/drone/pr Build is passing

trinity-1686a changed title from ~~WIP: Add compression using zstd~~ to Add compression using zstd

2021-12-15 08:12:03 +00:00

quentin reviewed 2021-12-15 09:41:57 +00:00

doc/book/src/reference_manual/configuration.md

					
				@ -98,6 +100,18 @@ Never run a Garage cluster where that is not the case.**

				Changing the `replication_mode` of a cluster might work (make sure to shut down all nodes

				and changing it everywhere at the time), but is not officially supported.

				### `compression_level`

quentin commented

2021-12-15 09:41:57 +00:00

To help people, I propose adding a bit more information here, and maybe a link or two?

I found these resources:

- https://facebook.github.io/zstd/zstd_manual.html
- https://github.com/facebook/zstd/blob/d7e17363751974dc1ad10785deb4170b23bee0ec/programs/zstd.1.md
- https://github.com/facebook/zstd/blob/9b97fdf74fa9f8adaa557a710b726e2e6966adee/lib/dictBuilder/zdict.c#L768
- https://github.com/facebook/zstd/blob/550410d05d7c7815b1ff417c4cac51153a78785e/lib/zstd.h#L97
- https://github.com/facebook/zstd/blob/38dfc4699e1108d839b3222b6093caaad5befd1c/programs/zstdcli.c#L774-L775

The first link says:

> The library supports regular compression levels from 1 up to ZSTD_maxCLevel(), which is currently 22. Levels >= 20, labeled `--ultra`, should be used with caution, as they require more memory. The library also offers negative compression levels, which extend the range of speed vs. ratio preferences. The lower the level, the faster the speed (at the cost of compression).

The CLI explains the 3 different types of compression levels: standard, ultra and fast.

We could add something like this:

> Values between `1` (faster compression) and `19` (smaller file) are standard compression levels for zstd. From `20` to `22`, compression levels are referred to as "ultra" and must be used with extra care, as they use a lot of memory. A value of `0` lets zstd choose a default value (currently `3`). Finally, zstd also has compression levels designed to be faster than the standard ones; they range from `-1` (smaller file) to `-99` (faster compression).
>
> If you do not specify a `compression_level` entry, Garage will set it to `1` for you. With this parameter, zstd consumes a low amount of CPU and should work faster than line speed in most situations, while saving some space and intra-cluster bandwidth.
>
> If you want to totally deactivate zstd in Garage, you can pass the special value `none`. No zstd-related code will be called; your chunks will be stored on disk without any processing.
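The semantics proposed above could be parsed roughly as follows. This is a minimal sketch, not the code in this PR: the `Compression` enum and the `parse_compression_level` and `describe` functions are hypothetical names, and the `-99..=22` bounds simply follow the proposed wording rather than being checked against zstd itself.

```rust
/// Hypothetical parsed form of the `compression_level` setting.
#[derive(Debug, PartialEq)]
enum Compression {
    /// The special value `none`: no zstd code is called.
    Disabled,
    /// A zstd level; 0 asks zstd to pick its own default.
    Level(i32),
}

/// Parse the raw config value; `None` means the entry is absent.
fn parse_compression_level(raw: Option<&str>) -> Result<Compression, String> {
    match raw {
        // Absent entry: the proposed default is level 1.
        None => Ok(Compression::Level(1)),
        Some("none") => Ok(Compression::Disabled),
        Some(s) => {
            let level: i32 = s
                .trim()
                .parse()
                .map_err(|_| format!("invalid compression_level: {:?}", s))?;
            // Bounds assumed from the proposed documentation text.
            if (-99..=22).contains(&level) {
                Ok(Compression::Level(level))
            } else {
                Err(format!("compression_level out of range: {}", level))
            }
        }
    }
}

/// Name the family a parsed setting falls into.
fn describe(c: &Compression) -> &'static str {
    match c {
        Compression::Disabled => "disabled",
        Compression::Level(0) => "zstd default",
        Compression::Level(1..=19) => "standard",
        Compression::Level(20..=22) => "ultra",
        Compression::Level(l) if *l < 0 => "fast",
        Compression::Level(_) => "out of range",
    }
}

fn main() {
    for raw in [None, Some("none"), Some("0"), Some("19"), Some("21"), Some("-5")] {
        match parse_compression_level(raw) {
            Ok(c) => println!("{:?} -> {:?} ({})", raw, c, describe(&c)),
            Err(e) => println!("{:?} -> error: {}", raw, e),
        }
    }
}
```

Keeping `none` as a separate variant, rather than overloading a numeric value, means `0` stays available for "let zstd choose".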

quentin reviewed 2021-12-15 09:44:26 +00:00

script/test-smoke.sh Outdated

					
				@ -30,6 +30,10 @@ dd if=/dev/urandom of=/tmp/garage.1.rnd bs=1k count=2 # No multipart, inline sto

				dd if=/dev/urandom of=/tmp/garage.2.rnd bs=1M count=5 # No multipart but file will be chunked

				dd if=/dev/urandom of=/tmp/garage.3.rnd bs=1M count=10 # by default, AWS starts using multipart at 8MB

				dd if=/dev/urandom bs=1k count=2  | base64 -w0 > /tmp/garage.1.b64