Add support for compression #27

Closed
opened 2021-02-09 16:21:05 +00:00 by lx · 3 comments
Owner

Here is an example of what we could gain:

```
-rw-r--r--  1 alex alex 1.0M Feb  9 16:17 06d34fc780f4f60549600af1d472510744092c2fb070b304ae75a23e0e88804d
-rw-r--r--  1 alex alex 131K Feb  9 16:17 06d34fc780f4f60549600af1d472510744092c2fb070b304ae75a23e0e88804d.gz
-rw-r--r--  1 alex alex 207K Feb  9 16:17 06d34fc780f4f60549600af1d472510744092c2fb070b304ae75a23e0e88804d.lz4
-rw-r--r--  1 alex alex 126K Feb  9 16:17 06d34fc780f4f60549600af1d472510744092c2fb070b304ae75a23e0e88804d.zst
```

Time measures:

```
alex@io:/tmp$ time gzip -k 06d34fc780f4f60549600af1d472510744092c2fb070b304ae75a23e0e88804d

real    0m0.031s
user    0m0.027s
sys     0m0.004s

alex@io:/tmp$ time zstd 06d34fc780f4f60549600af1d472510744092c2fb070b304ae75a23e0e88804d
06d34fc780f4f60549600af1d472510744092c2fb070b304ae75a23e0e88804d : 12.29%   (1048576 => 128844 bytes, 06d34fc780f4f60549600af1d472510744092c2fb070b304ae75a23e0e88804d.zst)

real    0m0.017s
user    0m0.012s
sys     0m0.005s

alex@io:/tmp$ time lz4 06d34fc780f4f60549600af1d472510744092c2fb070b304ae75a23e0e88804d
Compressed filename will be : 06d34fc780f4f60549600af1d472510744092c2fb070b304ae75a23e0e88804d.lz4
Compressed 1048576 bytes into 211530 bytes ==> 20.17%

real    0m0.014s
user    0m0.010s
sys     0m0.004s
```

Decompression should be done at the node handling the API request, and not at the node reading from disk (i.e. add a new kind of message: "here is some data, and btw it is compressed").
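
As a rough illustration of that new message kind, here is a minimal sketch in Rust, assuming zstd via the `zstd` crate; the `BlockData` enum and `into_plain_bytes` function are hypothetical names, not Garage's actual RPC types:

```rust
use std::io;

// Sketch only: the node reading from disk sends the block bytes as stored,
// tagged with whether they are compressed; the node handling the API request
// turns them back into plain data before answering the client.
pub enum BlockData {
    Plain(Vec<u8>),
    CompressedZstd(Vec<u8>),
}

// Runs on the node handling the API request, after receiving a BlockData
// message from the node that read the block from disk.
pub fn into_plain_bytes(data: BlockData) -> io::Result<Vec<u8>> {
    match data {
        BlockData::Plain(bytes) => Ok(bytes),
        // zstd::decode_all reads a complete zstd frame and returns the
        // decompressed bytes.
        BlockData::CompressedZstd(bytes) => zstd::decode_all(&bytes[..]),
    }
}
```

This way the disk-reading node never spends CPU on decompression, and blocks travel over the network in their compressed form.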

lx added the Low priority and Improvement labels 2021-02-18 17:17:34 +00:00
lx added this to the Speculative milestone 2021-03-17 10:04:49 +00:00
lx closed this issue 2021-04-14 21:27:36 +00:00

this ended up not on [main](https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/main), so I think this should be reopened until it is re-worked.
Owner

Could we imagine activating it on a per-instance or on a per-bucket basis, so that compressing or not would be left to the discretion of the operator?

Should we recommend a bigger chunk size when compression is used, to benefit more from it?

  • on a per-instance basis: yes, fairly easily (see the sketch after this list)
  • on a per-bucket basis: it would be possible, with some limitations (blocks are not owned by a single bucket, so if a block is shared between two buckets, the first one to create the block chooses whether it's compressed)
  • should we recommend a bigger chunk size: to be benchmarked, but probably not; "small files" in the context of compression generally means files of a few KB, while the default chunk size is 1MB (to be clear, bigger files are always better as they mean fewer Huffman trees & co. to store, but I believe the overhead is already low for a 1MB file)
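
A minimal sketch of what the per-instance switch could look like, again in Rust and assuming the `zstd` crate; `CompressionMode` and `maybe_compress` are hypothetical names, not Garage's actual configuration API:

```rust
use std::io;

// Hypothetical per-instance setting, read once from the node's configuration.
#[derive(Clone, Copy)]
pub enum CompressionMode {
    None,
    Zstd { level: i32 },
}

// Called when a new block is written. Returns the bytes to store on disk and
// a flag telling whether they are compressed, so the block can be tagged
// accordingly (and so a block shared between buckets keeps whatever form its
// first writer chose).
pub fn maybe_compress(data: &[u8], mode: CompressionMode) -> io::Result<(Vec<u8>, bool)> {
    match mode {
        CompressionMode::None => Ok((data.to_vec(), false)),
        CompressionMode::Zstd { level } => {
            // zstd::encode_all compresses the whole buffer into one zstd frame.
            let compressed = zstd::encode_all(data, level)?;
            // Keep the compressed form only if it is actually smaller:
            // already-compressed user data may not shrink at all.
            if compressed.len() < data.len() {
                Ok((compressed, true))
            } else {
                Ok((data.to_vec(), false))
            }
        }
    }
}
```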
trinity-1686a self-assigned this 2021-12-14 14:26:18 +00:00
lx closed this issue 2021-12-15 10:26:44 +00:00
Reference: Deuxfleurs/garage#27