Compress with zstd #44

Merged
lx merged 7 commits from trinity-1686a/garage:zstd-block into main 2021-04-14 21:27:36 +00:00

fix #27
Implement zstd compression for chunk storage. Chunks are transferred compressed over rpc (at least I believe so).
Integrity is computed on uncompressed data, so reading a file requires decompressing it, computing the hash to assert integrity, and then sending it compressed.
Incompressible data (determined as being bigger compressed than not) is stored uncompressed. Each chunk is tested individually; an improvement could be to consider that if the first chunk of a file is incompressible, the next chunks are probably incompressible too.
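The per-chunk decision is roughly the following (a sketch using the `zstd` crate's `encode_all` helper, not the exact code from `block.rs`):

```
fn maybe_compress(chunk: &[u8]) -> std::io::Result<(Vec<u8>, bool)> {
	// Level 3 is zstd's default compression level.
	let compressed = zstd::encode_all(chunk, 3)?;
	if compressed.len() < chunk.len() {
		Ok((compressed, true)) // smaller: store the compressed version
	} else {
		Ok((chunk.to_vec(), false)) // bigger compressed than not: store as-is
	}
}
```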
~~Another improvement would be to first test compressibility with a very low compression level (-5) to quickly determine whether doing more thorough work would be of use.~~

After testing, it does not seem to be that good an idea.

Testing with three files (synthetic data with 8 and 6 bits of entropy per byte, plus a tarred Cuda package containing sources, object files, pictures..., around 3.9 bits per byte), it is not worth it. Using `-5` for the test run and `3` (the default) for the real compression, it looks like zstd figures out by itself how compressible its input is, and doesn't work as hard when the data isn't very compressible. This makes the two-step process marginally faster on incompressible data but considerably slower on compressible data, and the more compressible the data, the worse it gets.

Results are normalized to a 2 GiB file to make them comparable:

| file | -5 | 3 | -5 then 3 if compressible |
| -------- | ----- | ---- | ----- |
| incompressible (synthetic, 8 bits per byte) | 1.8s | 3.8s | 1.8s (2s faster than 3) |
| compressible (synthetic, 6 bits per byte) | 1.9s | 7.1s | 9.0s (1.9s slower than 3) |
| compressible (Cuda, 3.9 bits per byte) | 4.2s | 15.6s | 19.8s (4.2s slower than 3) |


Inlined data is never compressed.
lx closed this pull request 2021-03-18 18:27:21 +00:00
lx reopened this pull request 2021-03-18 18:36:14 +00:00
lx changed target branch from dev-0.2 to master 2021-03-18 18:36:57 +00:00
trinity-1686a force-pushed zstd-block from fb9db78555 to 60d61749e8 2021-03-18 20:07:49 +00:00 Compare
trinity-1686a force-pushed zstd-block from 9d938de4a5 to 27cf95f785 2021-03-18 21:19:08 +00:00 Compare
lx closed this pull request 2021-03-19 13:16:18 +00:00
lx reopened this pull request 2021-03-19 13:18:09 +00:00
lx changed target branch from master to main 2021-03-19 13:18:16 +00:00
trinity-1686a force-pushed zstd-block from 27cf95f785 to 366407a89d 2021-03-19 17:15:54 +00:00 Compare
trinity-1686a force-pushed zstd-block from 366407a89d to 7ccabad7b1 2021-04-06 00:51:08 +00:00 Compare
Owner

I like how this is implemented in a nice self-contained fashion (only block.rs changes and it doesn't impact the rest of the codebase).

I'm slightly worried about how the checksum is verified when a block is read, as it needs to 1) decompress the data and 2) checksum it, meaning that it will take more RAM and CPU time than if we had saved a checksum of the compressed version. Reading a block occurs all the time so if there are easy optimisations there we should do them.

It is important that we are able to verify the checksum on the node that reads the block from disk because it needs to detect when the block is corrupted so that it can re-download a fresh version. I think the best solution would be to store the blake2 checksum of the compressed data next to it by appending it at the end of the file (we could store it in another file next to it but that means more inodes, more open/read/close calls, and overall worse performance). Appending a checksum at the end of a `.zst` file makes it technically an invalid zstd file but as long as it's documented it seems fine to me. We could also use a special file extension such as `.zstd_b2` to identify files that contain zstd data + their blake2 checksum.
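Concretely, the write path could look something like this (a rough sketch: `blake2sum` is the helper we already use, and I'm assuming the `Hash` type exposes its raw bytes through `as_slice()`):

```
fn write_block_zstd_b2(path: &std::path::Path, zstd_frame: &[u8]) -> std::io::Result<()> {
	use std::io::Write;
	// Checksum the *compressed* bytes, so that verifying a block on disk
	// does not require decompressing it first.
	let checksum = blake2sum(zstd_frame);
	let mut f = std::fs::File::create(path)?;
	f.write_all(zstd_frame)?;
	// The trailing 32 bytes technically make this an invalid .zst file,
	// hence a distinct extension such as .zstd_b2.
	f.write_all(checksum.as_slice())?;
	Ok(())
}
```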

Concerning the code organization, I would be more comfortable if instead of having `PutBlock` and `PutCompressedBlock` we had a single `PutBlock` variant, but changed the `PutBlockMessage` to be as follows:

```
#[derive(Debug, Serialize, Deserialize)]
pub struct PutBlockMessage {
	pub hash: Hash,
	pub data: BlockData,
}

#[derive(Debug, Serialize, Deserialize)]
pub enum BlockData {
	Plain(#[serde(with = "serde_bytes")] Vec<u8>),
	Compressed(#[serde(with = "serde_bytes")] Vec<u8>),
}
```

And we should try to replace `Vec<u8>` with the `BlockData` enum everywhere it makes sense, instead of carrying a boolean `compressed` next to it.
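For instance, a couple of helper methods on the enum would let callers drop the boolean entirely (just a sketch):

```
impl BlockData {
	fn is_compressed(&self) -> bool {
		matches!(self, BlockData::Compressed(_))
	}

	// The raw bytes, whichever variant we hold.
	fn buffer(&self) -> &[u8] {
		match self {
			BlockData::Plain(b) | BlockData::Compressed(b) => b,
		}
	}
}
```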

trinity-1686a force-pushed zstd-block from 357cfb8d84 to b9ea569158 2021-04-06 18:17:36 +00:00 Compare
Author
Owner

In case the node already has an uncompressed block and receives a compressed version of it, it will store and use the compressed version, but it will not delete the uncompressed one.
Owner

For what it's worth, AFAIK ZFS checksums the compressed data.
lx requested changes 2021-04-06 21:53:03 +00:00
Dismissed
lx left a comment
Owner

Just some small things

@ -150,0 +163,4 @@
BlockData::Plain(b) => (b, None),
BlockData::Compressed(b) => {
let checksum = blake2sum(&b);
(b, Some(checksum))
Owner

Maybe we should calculate this checksum on the sending side, i.e. in `rpc_put_block`, and include it in `BlockData::Compressed`? Meaning we do it only once, like we do for the uncompressed checksum
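For example, the `Compressed` variant could carry the checksum alongside the data (hypothetical shape, field names mine):

```
#[derive(Debug, Serialize, Deserialize)]
pub enum BlockData {
	Plain(#[serde(with = "serde_bytes")] Vec<u8>),
	Compressed {
		#[serde(with = "serde_bytes")]
		data: Vec<u8>,
		// blake2 of `data`, computed once by the sender in rpc_put_block
		checksum: Hash,
	},
}
```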
lx marked this conversation as resolved
@ -150,0 +167,4 @@
}
};
let _lock = self.data_dir_lock.lock().await;
Owner

Taking the lock should probably be the first thing in the function because `is_block_compressed` manipulates the data directory
lx marked this conversation as resolved
Author
Owner

@maximilien (I assume you meant Zstd) apparently it can, but does not by default. The helper used does not allow enabling checksums, but a six-line function could. That would mean decompressing on the sending side just to verify integrity, but from what I read, zstd decompression should be faster than blake2 anyway.
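Something like this untested sketch, assuming the `zstd` crate's `Encoder` (the frame then embeds a content checksum that decompression verifies automatically):

```
fn zstd_encode_checked(data: &[u8], level: i32) -> std::io::Result<Vec<u8>> {
	use std::io::Write;
	let mut encoder = zstd::stream::Encoder::new(Vec::new(), level)?;
	encoder.include_checksum(true)?; // store a checksum in the zstd frame
	encoder.write_all(data)?;
	encoder.finish()
}
```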
Owner

> @maximilien (I assume you meant Zstd) apparently it can, but does not by default. The helper used does not allow enabling checksums, but a six-line function could. That would mean decompressing on the sending side just to verify integrity, but from what I read, zstd decompression should be faster than blake2 anyway.

If we could do that it would be the best solution indeed.

trinity-1686a force-pushed zstd-block from dd3ef3ebc8 to d4fd074000 2021-04-08 13:52:55 +00:00 Compare
lx dismissed lx's review 2021-04-14 21:26:56 +00:00
lx merged commit 307858bf0a into main 2021-04-14 21:27:36 +00:00
trinity-1686a deleted branch zstd-block 2021-04-19 17:48:30 +00:00