Add compression using zstd #173

Merged
lx merged 8 commits from trinity-1686a/garage:compression into main 2021-12-15 10:26:43 +00:00

fix #27

At the moment, running an integrity check on every node only verifies integrity using zstd; it can't detect swapped blocks, for instance. Should I add such a check?
~~I'll re-add a test to smoke-test with compressible data before marking this ready.~~ Done!
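
To illustrate the swapped-block question, here is a minimal sketch of a check that a zstd frame checksum alone cannot provide. It assumes blocks are addressed by a hash of their uncompressed content; `content_hash` and `block_matches` are hypothetical names, and `DefaultHasher` is only a standard-library stand-in for a real cryptographic hash. This is not the code from this PR.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for the content hash (assumption: a real store would use a
/// cryptographic hash, not `DefaultHasher`).
fn content_hash(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

/// Returns true if `data` (already decompressed) is the block expected
/// under the key `expected`. A valid zstd frame stored under the wrong
/// key decompresses fine but fails this comparison.
fn block_matches(expected: u64, data: &[u8]) -> bool {
    content_hash(data) == expected
}

fn main() {
    let block_a = b"contents of block A".to_vec();
    let block_b = b"contents of block B".to_vec();
    let key_a = content_hash(&block_a);

    // The right block under the right key passes.
    println!("a under key_a: {}", block_matches(key_a, &block_a));
    // A swapped block is intact on its own, but does not match the key.
    println!("b under key_a: {}", block_matches(key_a, &block_b));
}
```

The trade-off is that this check requires decompressing and rehashing every block, which costs more than validating the zstd frame alone.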

trinity-1686a added 5 commits 2021-12-14 17:28:20 +00:00
trinity-1686a added 1 commit 2021-12-14 18:30:40 +00:00
continuous-integration/drone/pr Build is passing Details
d902a64c32
update Cargo.nix
trinity-1686a added 1 commit 2021-12-15 07:27:40 +00:00
continuous-integration/drone/pr Build is passing Details
8807075158
add compressable files to test-smoke
trinity-1686a changed title from WIP: Add compression using zstd to Add compression using zstd 2021-12-15 08:12:03 +00:00
quentin reviewed 2021-12-15 09:41:57 +00:00
@ -98,6 +100,18 @@ Never run a Garage cluster where that is not the case.**
Changing the `replication_mode` of a cluster might work (make sure to shut down all nodes
and changing it everywhere at the time), but is not officially supported.
### `compression_level`
Owner

To help people, I propose adding a bit more information here and maybe a link or two?

I found these resources:

- https://facebook.github.io/zstd/zstd_manual.html
- https://github.com/facebook/zstd/blob/d7e17363751974dc1ad10785deb4170b23bee0ec/programs/zstd.1.md
- https://github.com/facebook/zstd/blob/9b97fdf74fa9f8adaa557a710b726e2e6966adee/lib/dictBuilder/zdict.c#L768
- https://github.com/facebook/zstd/blob/550410d05d7c7815b1ff417c4cac51153a78785e/lib/zstd.h#L97
- https://github.com/facebook/zstd/blob/38dfc4699e1108d839b3222b6093caaad5befd1c/programs/zstdcli.c#L774-L775

The first link says:

> The library supports regular compression levels from 1 up to ZSTD_maxCLevel(), which is currently 22. Levels >= 20, labeled `--ultra`, should be used with caution, as they require more memory. The library also offers negative compression levels, which extend the range of speed vs. ratio preferences. The lower the level, the faster the speed (at the cost of compression).

The CLI explains the 3 different types of compression levels: standard, ultra and fast.

---

We could add something like this:

> Values between `1` (faster compression) and `19` (smaller file) are standard compression levels for zstd. From `20` to `22`, compression levels are referred to as "ultra" and must be used with extra care, as they use a lot of memory. A value of `0` will let zstd choose a default value (currently `3`). Finally, zstd also has compression levels designed to be faster than the default ones; they range from `-1` (smaller file) to `-99` (faster compression).
>
> If you do not specify a `compression_level` entry, garage will set it to `1` for you. With this parameter, zstd consumes little CPU and should work faster than line speed in most situations, while saving some space and intra-cluster bandwidth.
>
> If you want to totally deactivate zstd in garage, you can pass the special value `none`. No zstd-related code will be called, and your chunks will be stored on disk without any processing.
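
The value ranges proposed above can be sketched as a small classifier. This is only an illustration under stated assumptions: the `Compression` enum and both function names are hypothetical and are not Garage's actual configuration code, and the bounds (`-99` to `22`, default `1`, special value `none`) are taken directly from the comment above.

```rust
/// Hypothetical classification of a `compression_level` setting.
#[derive(Debug, PartialEq)]
enum Compression {
    Disabled,      // "none": no zstd code is called
    ZstdDefault,   // 0: zstd picks its own default level
    Fast(i32),     // -99..=-1: faster than the standard levels
    Standard(i32), // 1..=19: standard levels
    Ultra(i32),    // 20..=22: high memory usage
}

/// Parses the raw setting; bounds follow the text above (assumption).
fn parse_compression_level(raw: &str) -> Result<Compression, String> {
    let raw = raw.trim();
    if raw == "none" {
        return Ok(Compression::Disabled);
    }
    let level: i32 = raw
        .parse()
        .map_err(|e| format!("invalid compression_level {raw:?}: {e}"))?;
    match level {
        -99..=-1 => Ok(Compression::Fast(level)),
        0 => Ok(Compression::ZstdDefault),
        1..=19 => Ok(Compression::Standard(level)),
        20..=22 => Ok(Compression::Ultra(level)),
        _ => Err(format!("compression_level {level} is outside -99..=22")),
    }
}

/// Applies the proposed default of `1` when the entry is missing.
fn effective_compression(raw: Option<&str>) -> Result<Compression, String> {
    parse_compression_level(raw.unwrap_or("1"))
}

fn main() {
    for setting in [None, Some("none"), Some("0"), Some("-5"), Some("21"), Some("42")] {
        println!("{:?} -> {:?}", setting, effective_compression(setting));
    }
}
```

Keeping "disabled" as its own variant, rather than a magic integer, mirrors the point that `none` bypasses zstd entirely instead of selecting a very low level.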
quentin reviewed 2021-12-15 09:44:26 +00:00
@ -30,6 +30,10 @@ dd if=/dev/urandom of=/tmp/garage.1.rnd bs=1k count=2 # No multipart, inline sto
dd if=/dev/urandom of=/tmp/garage.2.rnd bs=1M count=5 # No multipart but file will be chunked
dd if=/dev/urandom of=/tmp/garage.3.rnd bs=1M count=10 # by default, AWS starts using multipart at 8MB
dd if=/dev/urandom bs=1k count=2 | base64 -w0 > /tmp/garage.1.b64
Owner

We could add a small comment to say that this has been added to test zstd compression?
trinity-1686a added 1 commit 2021-12-15 10:20:13 +00:00
continuous-integration/drone/pr Build is passing Details
75c2766018
address review comments
lx merged commit 1eb972b1ac into main 2021-12-15 10:26:43 +00:00