Add compression using zstd #173

Merged
lx merged 8 commits from trinity-1686a/garage:compression into main 2021-12-15 10:26:43 +00:00

fix #27

At the moment, the integrity check run on every node only verifies integrity using zstd; it can't detect swapped blocks, for instance. Should I add such a check?
~~I'll re-add some tests to smoke-test with compressible data before marking this ready.~~ Done!
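A minimal sketch of what a swapped-block check could look like, not Garage's actual code. It assumes each block is filed under a hash of its uncompressed content, so a check can re-hash the stored bytes and compare the result with the key. `BlockStore`, `content_hash` and `find_mismatched` are hypothetical names, `DefaultHasher` is only a stand-in for a cryptographic hash, and the decompression step is left out.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::Hasher;

/// Stand-in for a cryptographic content hash (assumption: a real store
/// would use a proper cryptographic hash, not `DefaultHasher`).
fn content_hash(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(data);
    hasher.finish()
}

/// Hypothetical block store: maps the hash a block is filed under to the
/// block's (already decompressed) bytes.
struct BlockStore {
    blocks: HashMap<u64, Vec<u8>>,
}

impl BlockStore {
    fn new() -> Self {
        BlockStore { blocks: HashMap::new() }
    }

    /// Stores a block under the hash of its content and returns that hash.
    fn put(&mut self, data: &[u8]) -> u64 {
        let key = content_hash(data);
        self.blocks.insert(key, data.to_vec());
        key
    }

    /// Returns, in sorted order, every key whose stored bytes no longer
    /// hash back to that key. A swapped block shows up here even though
    /// its bytes are perfectly valid on their own.
    fn find_mismatched(&self) -> Vec<u64> {
        let mut bad = Vec::new();
        for (key, data) in &self.blocks {
            if content_hash(data) != *key {
                bad.push(*key);
            }
        }
        bad.sort();
        bad
    }
}
```

A per-frame checksum only shows that each block is internally consistent; comparing against the key is what catches a valid block stored in the wrong place.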
trinity-1686a added 5 commits 2021-12-14 17:28:20 +00:00
trinity-1686a added 1 commit 2021-12-14 18:30:40 +00:00
update Cargo.nix
All checks were successful
continuous-integration/drone/pr Build is passing
d902a64c32
trinity-1686a added 1 commit 2021-12-15 07:27:40 +00:00
add compressable files to test-smoke
All checks were successful
continuous-integration/drone/pr Build is passing
8807075158
trinity-1686a changed title from WIP: Add compression using zstd to Add compression using zstd 2021-12-15 08:12:03 +00:00
quentin reviewed 2021-12-15 09:41:57 +00:00
@ -98,6 +100,18 @@ Never run a Garage cluster where that is not the case.**
Changing the `replication_mode` of a cluster might work (make sure to shut down all nodes
and changing it everywhere at the time), but is not officially supported.
### `compression_level`
Owner

To help people, I propose adding a bit more information here, and maybe a link or two?

I found these resources:

- https://facebook.github.io/zstd/zstd_manual.html
- https://github.com/facebook/zstd/blob/d7e17363751974dc1ad10785deb4170b23bee0ec/programs/zstd.1.md
- https://github.com/facebook/zstd/blob/9b97fdf74fa9f8adaa557a710b726e2e6966adee/lib/dictBuilder/zdict.c#L768
- https://github.com/facebook/zstd/blob/550410d05d7c7815b1ff417c4cac51153a78785e/lib/zstd.h#L97
- https://github.com/facebook/zstd/blob/38dfc4699e1108d839b3222b6093caaad5befd1c/programs/zstdcli.c#L774-L775

The first link says:

> The library supports regular compression levels from 1 up to ZSTD_maxCLevel(), which is currently 22. Levels >= 20, labeled `--ultra`, should be used with caution, as they require more memory. The library also offers negative compression levels, which extend the range of speed vs. ratio preferences. The lower the level, the faster the speed (at the cost of compression).

The CLI explains the three types of compression levels: standard, ultra and fast.

---

We could add something like this:

> Values between `1` (faster compression) and `19` (smaller file) are standard compression levels for zstd. Levels from `20` to `22` are referred to as "ultra" and must be used with extra care, as they use a lot of memory. A value of `0` lets zstd choose a default value (currently `3`). Finally, zstd also has compression levels designed to be faster than the standard ones; they range from `-1` (smaller file) to `-99` (faster compression).
>
> If you do not specify a `compression_level` entry, Garage will set it to `1` for you. With this setting, zstd consumes little CPU and should work faster than line speed in most situations, while saving some space and intra-cluster bandwidth.
>
> If you want to completely deactivate zstd in Garage, you can pass the special value `none`. No zstd-related code will be called, and your chunks will be stored on disk without any processing.
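The ranges in the proposed wording can be summarised as a small classifier. This is only a sketch of the proposal as worded in this comment, not Garage's configuration parser: `LevelKind` and `classify_level` are hypothetical names, the value is assumed to arrive as a string, and the `-99` lower bound is taken from the comment rather than checked against zstd.

```rust
/// Hypothetical classification of a `compression_level` value, following
/// the ranges described in the review comment.
#[derive(Debug, PartialEq)]
enum LevelKind {
    /// The special value `none`: no zstd code is called.
    Disabled,
    /// `0`: zstd picks its own default level.
    ZstdDefault,
    /// `1..=19`: standard levels.
    Standard(i32),
    /// `20..=22`: "ultra" levels, which use a lot of memory.
    Ultra(i32),
    /// `-99..=-1`: levels that favour speed over ratio.
    Fast(i32),
}

/// Classifies an optional textual setting. `None` models a missing
/// `compression_level` entry, which the proposal maps to level 1.
fn classify_level(value: Option<&str>) -> Result<LevelKind, String> {
    let raw = match value {
        None => return Ok(LevelKind::Standard(1)),
        Some(s) => s.trim(),
    };
    if raw == "none" {
        return Ok(LevelKind::Disabled);
    }
    let level: i32 = raw
        .parse()
        .map_err(|_| format!("invalid compression_level: {:?}", raw))?;
    match level {
        0 => Ok(LevelKind::ZstdDefault),
        1..=19 => Ok(LevelKind::Standard(level)),
        20..=22 => Ok(LevelKind::Ultra(level)),
        -99..=-1 => Ok(LevelKind::Fast(level)),
        _ => Err(format!("compression_level out of range: {}", level)),
    }
}
```

Keeping `none` and `0` as separate variants reflects the distinction in the proposed text: `none` bypasses zstd entirely, while `0` still compresses at zstd's default level.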
quentin reviewed 2021-12-15 09:44:26 +00:00
@ -30,6 +30,10 @@ dd if=/dev/urandom of=/tmp/garage.1.rnd bs=1k count=2 # No multipart, inline sto
dd if=/dev/urandom of=/tmp/garage.2.rnd bs=1M count=5 # No multipart but file will be chunked
dd if=/dev/urandom of=/tmp/garage.3.rnd bs=1M count=10 # by default, AWS starts using multipart at 8MB
dd if=/dev/urandom bs=1k count=2 | base64 -w0 > /tmp/garage.1.b64
Owner

We could add a small comment to say that this was added to test zstd compression?
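For context on why the base64 files suit a compression test: base64 output uses at most 64 symbols plus `=` padding, so each byte carries at most 6 bits of information and the data is compressible, unlike the raw `/dev/urandom` files. The sketch below illustrates this with a hand-written encoder. `pseudo_random_bytes` is a hypothetical xorshift stand-in for `/dev/urandom`, and none of this code is part of the smoke test itself.

```rust
/// The standard base64 alphabet (64 symbols).
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Minimal base64 encoder with `=` padding and no line wrapping,
/// comparable to `base64 -w0`.
fn base64_encode(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(ALPHABET[((n >> 18) & 63) as usize]);
        out.push(ALPHABET[((n >> 12) & 63) as usize]);
        out.push(if chunk.len() > 1 { ALPHABET[((n >> 6) & 63) as usize] } else { b'=' });
        out.push(if chunk.len() > 2 { ALPHABET[(n & 63) as usize] } else { b'=' });
    }
    out
}

/// Hypothetical stand-in for /dev/urandom: a xorshift64 generator.
/// The seed must be non-zero.
fn pseudo_random_bytes(len: usize, mut state: u64) -> Vec<u8> {
    let mut out = Vec::with_capacity(len + 8);
    while out.len() < len {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        out.extend_from_slice(&state.to_le_bytes());
    }
    out.truncate(len);
    out
}

/// Counts how many distinct byte values occur in `data`.
fn distinct_bytes(data: &[u8]) -> usize {
    let mut seen = [false; 256];
    for &b in data {
        seen[b as usize] = true;
    }
    seen.iter().filter(|&&s| s).count()
}
```

Calling `distinct_bytes` on the encoded output never returns more than 65 (the 64 alphabet symbols plus `=`), whereas random input can use all 256 byte values.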
trinity-1686a added 1 commit 2021-12-15 10:20:13 +00:00
address review comments
All checks were successful
continuous-integration/drone/pr Build is passing
75c2766018
lx merged commit 1eb972b1ac into main 2021-12-15 10:26:43 +00:00
lx referenced this pull request from a commit 2021-12-15 10:26:44 +00:00
Reference: Deuxfleurs/garage#173