Add support for compression #27

Closed
opened 2021-02-09 16:21:05 +00:00 by lx · 3 comments
Owner

Here is an example of what we could gain:

```
-rw-r--r--  1 alex alex 1.0M Feb  9 16:17 06d34fc780f4f60549600af1d472510744092c2fb070b304ae75a23e0e88804d
-rw-r--r--  1 alex alex 131K Feb  9 16:17 06d34fc780f4f60549600af1d472510744092c2fb070b304ae75a23e0e88804d.gz
-rw-r--r--  1 alex alex 207K Feb  9 16:17 06d34fc780f4f60549600af1d472510744092c2fb070b304ae75a23e0e88804d.lz4
-rw-r--r--  1 alex alex 126K Feb  9 16:17 06d34fc780f4f60549600af1d472510744092c2fb070b304ae75a23e0e88804d.zst
```

Time measures:

```
alex@io:/tmp$ time gzip -k 06d34fc780f4f60549600af1d472510744092c2fb070b304ae75a23e0e88804d

real    0m0.031s
user    0m0.027s
sys     0m0.004s

alex@io:/tmp$ time zstd 06d34fc780f4f60549600af1d472510744092c2fb070b304ae75a23e0e88804d
06d34fc780f4f60549600af1d472510744092c2fb070b304ae75a23e0e88804d : 12.29%   (1048576 => 128844 bytes, 06d34fc780f4f60549600af1d472510744092c2fb070b304ae75a23e0e88804d.zst)

real    0m0.017s
user    0m0.012s
sys     0m0.005s

alex@io:/tmp$ time lz4 06d34fc780f4f60549600af1d472510744092c2fb070b304ae75a23e0e88804d
Compressed filename will be : 06d34fc780f4f60549600af1d472510744092c2fb070b304ae75a23e0e88804d.lz4
Compressed 1048576 bytes into 211530 bytes ==> 20.17%

real    0m0.014s
user    0m0.010s
sys     0m0.004s
```

Decompression should be done at the node handling the API request, and not at the node reading from disk (i.e. add a new kind of message: "here is some data, and btw it is compressed").
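
As a rough illustration of that new message kind, here is a minimal sketch in Rust, assuming zstd via the `zstd` crate; the `BlockData` enum and `into_plain_bytes` function are hypothetical names, not Garage's actual RPC types:

```rust
use std::io;

// Sketch only: the node reading from disk sends the block bytes as stored,
// tagged with whether they are compressed; the node handling the API request
// turns them back into plain data before answering the client.
pub enum BlockData {
    Plain(Vec<u8>),
    CompressedZstd(Vec<u8>),
}

// Runs on the node handling the API request, after receiving a BlockData
// message from the node that read the block from disk.
pub fn into_plain_bytes(data: BlockData) -> io::Result<Vec<u8>> {
    match data {
        BlockData::Plain(bytes) => Ok(bytes),
        // zstd::decode_all reads a complete zstd frame and returns the
        // decompressed bytes.
        BlockData::CompressedZstd(bytes) => zstd::decode_all(&bytes[..]),
    }
}
```

This way the disk-reading node never spends CPU on decompression, and blocks travel over the network in their compressed form.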

lx added the Low priority and Improvement labels 2021-02-18 17:17:34 +00:00
lx added this to the Speculative milestone 2021-03-17 10:04:49 +00:00
lx closed this issue 2021-04-14 21:27:36 +00:00

this ended up not on [main](https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/main), so I think this should be reopened until it is re-worked.
Owner

Could we imagine activating it on a per-instance or on a per-bucket basis, so that compressing or not would be left to the discretion of the operator?

Should we recommend a bigger chunk size when compression is used, to benefit more from it?

  • on a per-instance basis: yes, fairly easily (see the sketch after this list)
  • on a per-bucket basis: it would be possible, with some limitations (blocks are not owned by a single bucket, so if a block is shared between two buckets, the first one to create the block chooses whether it's compressed)
  • should we recommend a bigger chunk size: to be benchmarked, but probably not; "small files" in the context of compression generally means files of a few KB, while the default chunk size is 1MB (to be clear, bigger files are always better as they mean fewer Huffman trees & co. to store, but I believe the overhead is already low for a 1MB file)
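
A minimal sketch of what the per-instance switch could look like, again in Rust and assuming the `zstd` crate; `CompressionMode` and `maybe_compress` are hypothetical names, not Garage's actual configuration API:

```rust
use std::io;

// Hypothetical per-instance setting, read once from the node's configuration.
#[derive(Clone, Copy)]
pub enum CompressionMode {
    None,
    Zstd { level: i32 },
}

// Called when a new block is written. Returns the bytes to store on disk and
// a flag telling whether they are compressed, so the block can be tagged
// accordingly (and so a block shared between buckets keeps whatever form its
// first writer chose).
pub fn maybe_compress(data: &[u8], mode: CompressionMode) -> io::Result<(Vec<u8>, bool)> {
    match mode {
        CompressionMode::None => Ok((data.to_vec(), false)),
        CompressionMode::Zstd { level } => {
            // zstd::encode_all compresses the whole buffer into one zstd frame.
            let compressed = zstd::encode_all(data, level)?;
            // Keep the compressed form only if it is actually smaller:
            // already-compressed user data may not shrink at all.
            if compressed.len() < data.len() {
                Ok((compressed, true))
            } else {
                Ok((data.to_vec(), false))
            }
        }
    }
}
```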
trinity-1686a self-assigned this 2021-12-14 14:26:18 +00:00
lx closed this issue 2021-12-15 10:26:44 +00:00
Reference: Deuxfleurs/garage#27