Compress with zstd #44
Reference: Deuxfleurs/garage#44
fix #27

Implement zstd compression for chunk storage. Chunks are transferred compressed over RPC (at least I believe so). Integrity is computed on uncompressed data, so reading a file requires decompressing it, computing the hash to verify integrity, and then sending it compressed.

Incompressible data (determined as being larger compressed than not) is stored uncompressed. Each chunk is tested individually; an improvement could be to assume that if the first chunk of a file is incompressible, the following chunks probably are too. (The per-chunk decision is sketched below.)

Another improvement would be to first test compressibility with a very low compression level (`-5`) to determine quickly whether doing thorough work would be useful. After testing, it does not seem like such a good idea. Testing with three files (synthetic data with 8 and 6 bits of entropy per byte, plus a tarred CUDA package containing sources, object files, pictures..., around 3.9 bits per byte), it is not worth it. Using `-5` for the test pass and `3` (the default) for the real compression, zstd seems to figure out by itself how compressible its input is and does not work as hard once it decides the data is not very compressible. That makes the two-step process marginally faster on incompressible data but considerably slower on compressible data, and the more compressible the data, the worse it gets. Results are normalized to a 2 GiB file to make them comparable.

Inlined data is never compressed.
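Roughly, the per-chunk decision described above could look like the following. This is only a sketch using the `zstd` crate's `encode_all` helper; the function name and its integration with `block.rs` are illustrative, not the PR's actual code.

```rust
use zstd::stream::encode_all;

// Compress a chunk at the default level; if the result is not smaller than
// the input (incompressible data), keep the plain bytes instead.
// Returns the bytes to store and whether they are compressed.
fn maybe_compress(chunk: &[u8]) -> (Vec<u8>, bool) {
    match encode_all(chunk, zstd::DEFAULT_COMPRESSION_LEVEL) {
        Ok(compressed) if compressed.len() < chunk.len() => (compressed, true),
        // Compression failed or did not help: store the chunk as-is.
        _ => (chunk.to_vec(), false),
    }
}
```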
fb9db78555 to 60d61749e8
9d938de4a5 to 27cf95f785
27cf95f785 to 366407a89d
366407a89d to 7ccabad7b1
I like how this is implemented in a nice self-contained fashion (only `block.rs` changes and it doesn't impact the rest of the codebase).

I'm slightly worried about how the checksum is verified when a block is read, as it needs to 1) decompress the data and 2) checksum it, meaning that it will take more RAM and CPU time than if we had saved a checksum of the compressed version. Reading a block occurs all the time, so if there are easy optimisations there we should do them.

It is important that we are able to verify the checksum on the node that reads the block from disk, because it needs to detect when the block is corrupted so that it can download a fresh version. I think the best solution would be to store the blake2 checksum of the compressed data next to it by appending it at the end of the file (we could store it in another file next to it, but that means more inodes, more open/read/close calls, and overall worse performance). Appending a checksum at the end of a `.zst` file makes it technically an invalid zstd file, but as long as it's documented it seems fine to me. We could also use a special file extension such as `.zstd_b2` to identify files that contain zstd data plus their blake2 checksum (a rough sketch of that layout follows).
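Something like the following could implement that layout. It is only a sketch: the digest is passed in as a plain 32-byte array so it can come from garage's existing `blake2sum` helper, and the function names are made up for illustration.

```rust
use std::fs::File;
use std::io::{Error, ErrorKind, Read, Write};
use std::path::Path;

// Write the compressed payload followed by the 32-byte blake2 digest of that
// payload, giving the ".zstd_b2" layout described above.
fn write_block_zstd_b2(path: &Path, compressed: &[u8], checksum: &[u8; 32]) -> std::io::Result<()> {
    let mut f = File::create(path)?;
    f.write_all(compressed)?;
    f.write_all(checksum)?; // trailing 32 bytes: checksum of the compressed bytes
    f.sync_all()
}

// Split the file back into (compressed payload, stored checksum); the caller
// recomputes blake2 over the payload and compares before decompressing, and
// re-fetches the block from another node on mismatch.
fn read_block_zstd_b2(path: &Path) -> std::io::Result<(Vec<u8>, [u8; 32])> {
    let mut data = Vec::new();
    File::open(path)?.read_to_end(&mut data)?;
    if data.len() < 32 {
        return Err(Error::new(ErrorKind::InvalidData, "block file too short"));
    }
    let mut stored = [0u8; 32];
    stored.copy_from_slice(&data[data.len() - 32..]);
    data.truncate(data.len() - 32);
    Ok((data, stored))
}
```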
Concerning the code organization, I would be more comfortable if instead of having `PutBlock` and `PutCompressedBlock` we had a single `PutBlock` variant, but changed the `PutBlockMessage` to be as follows (a possible shape is sketched after this comment). And we should try to replace `Vec<u8>` with the `BlockData` enum everywhere it makes sense, instead of carrying a boolean `compressed` next to it.
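Presumably the shape being suggested is something along these lines. This is a reconstruction: the `BlockData` and `PutBlockMessage` names come from the comment above and `Hash` refers to garage's 32-byte hash type, but the exact fields are an assumption, not the PR's code.

```rust
// Illustrative sketch of the suggested message, not the actual PR code.
pub enum BlockData {
    // Uncompressed block contents.
    Plain(Vec<u8>),
    // zstd-compressed block contents.
    Compressed(Vec<u8>),
}

pub struct PutBlockMessage {
    // Hash (blake2) of the *uncompressed* data, identifying the block.
    pub hash: Hash,
    // The payload itself knows whether it is compressed,
    // replacing a separate `compressed: bool` flag.
    pub data: BlockData,
}
```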
357cfb8d84 to b9ea569158
In case a node already has an uncompressed block and receives a compressed version of it, it will store and use that compressed version, but it will not delete the uncompressed one.
For what it's worth, AFAIK ZFS checksums the compressed data.
Just some small things
@ -150,0 +163,4 @@
    BlockData::Plain(b) => (b, None),
    BlockData::Compressed(b) => {
        let checksum = blake2sum(&b);
        (b, Some(checksum))
Maybe we should calculate this checksum on the sending side, i.e. in `rpc_put_block`, and include it in `BlockData::Compressed`? Meaning we do it only once, like we do for the uncompressed checksum. (A possible shape for the variant is sketched below.)
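That could look roughly like this, just to make the suggestion concrete (`Hash` again being garage's hash type; the variant contents are my guess, not the PR's code):

```rust
pub enum BlockData {
    Plain(Vec<u8>),
    // Compressed payload plus the blake2 checksum of the compressed bytes,
    // computed once by the sender in rpc_put_block and reused by receivers.
    Compressed(Vec<u8>, Hash),
}
```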
@ -150,0 +167,4 @@
    }
};
let _lock = self.data_dir_lock.lock().await;
Taking the lock should probably be the first thing in the function, because `is_block_compressed` manipulates the data directory.
manipulates the data directory@maximilien (I assume you meant Zstd) apparently it can, but does not by default. The helper used does not allow to use checksum, but a 6 lines function could. That would mean decompressing on sending side just to verify integrity, but from what I read, zstd decompression should be faster than blake2 anyway.
If we could do that it would be the best solution indeed.
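For reference, a checksum-enabled replacement for the one-shot helper could look roughly like this, assuming the `zstd` crate's streaming `Encoder` and its `include_checksum` option; a sketch under that assumption, not code from the PR.

```rust
use std::io::Write;
use zstd::stream::write::Encoder;

// Like zstd::encode_all, but with the zstd frame's built-in content checksum
// enabled, so that decompression detects corruption on its own.
fn encode_all_with_checksum(data: &[u8], level: i32) -> std::io::Result<Vec<u8>> {
    let mut encoder = Encoder::new(Vec::new(), level)?;
    encoder.include_checksum(true)?;
    encoder.write_all(data)?;
    encoder.finish()
}
```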
dd3ef3ebc8 to d4fd074000