Compress with zstd #44

Merged
lx merged 7 commits from trinity-1686a/garage:zstd-block into main 2021-04-14 21:27:36 +00:00

fix #27
Implement zstd compression for chunk storage. Chunks are transferred compressed over RPC (at least I believe so).
Integrity is computed on uncompressed data, so reading a file requires decompressing it, computing the hash to assert integrity, and then sending it compressed.
Incompressible data (determined as being bigger compressed than not) is stored uncompressed. Each chunk is tested individually; an improvement could be to consider that if the first chunk of a file is incompressible, the next chunks are probably incompressible too.
~~Another improvement would be to first test compressibility with a very low compression level (-5) to quickly determine whether more thorough work would be of any use.~~

After testing, it does not seem to be such a good idea.

Testing with three files (synthetic files with 8 and 6 bits of entropy per byte, plus a tarred Cuda package containing source, object files, pictures and so on, around 3.9 bits per byte), it is not worth it. Using `-5` for the test run and `3` (the default) for the real compression, it looks like zstd figures out by itself how compressible its input is and doesn't work as hard when it determines the data isn't very compressible. That makes the two-step process marginally faster on incompressible data, but considerably slower on compressible data, and the more compressible the data, the worse it gets.

Results are normalized to a 2GiB file to make them comparable:

| file | level -5 | level 3 | -5 then 3 if compressible |
| ---- | -------- | ------- | ------------------------- |
| incompressible (synthetic 8 bit per byte) | 1.8s | 3.8s | 1.8s (2s faster than 3) |
| compressible (synthetic 6 bit per byte) | 1.9s | 7.1s | 9.0s (1.9s slower than 3) |
| compressible (Cuda 3.9 bit per byte) | 4.2s | 15.6s | 19.8s (4.2s slower than 3) |


Inlined data is never compressed.
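
For illustration, here is a minimal sketch of that store-path decision, assuming the `BlockData` enum discussed further down and the `zstd` crate's one-shot encoder (this is not the PR's actual code):

```
// Sketch: compress the chunk, and keep the compressed form only if it
// is actually smaller than the plain data; otherwise store it plain.
fn prepare_chunk(data: Vec<u8>) -> std::io::Result<BlockData> {
	// 3 is zstd's default compression level.
	let compressed = zstd::encode_all(&data[..], 3)?;
	if compressed.len() < data.len() {
		Ok(BlockData::Compressed(compressed))
	} else {
		// Bigger compressed than not: incompressible, keep the plain bytes.
		Ok(BlockData::Plain(data))
	}
}
```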
lx closed this pull request 2021-03-18 18:27:21 +00:00
lx reopened this pull request 2021-03-18 18:36:14 +00:00
lx changed target branch from dev-0.2 to master 2021-03-18 18:36:57 +00:00
trinity-1686a force-pushed zstd-block from fb9db78555 to 60d61749e8 2021-03-18 20:07:49 +00:00
trinity-1686a force-pushed zstd-block from 9d938de4a5 to 27cf95f785 2021-03-18 21:19:08 +00:00
lx closed this pull request 2021-03-19 13:16:18 +00:00
lx reopened this pull request 2021-03-19 13:18:09 +00:00
lx changed target branch from master to main 2021-03-19 13:18:16 +00:00
trinity-1686a force-pushed zstd-block from 27cf95f785 to 366407a89d 2021-03-19 17:15:54 +00:00
trinity-1686a force-pushed zstd-block from 366407a89d to 7ccabad7b1 2021-04-06 00:51:08 +00:00
Owner

I like how this is implemented in a nice self-contained fashion (only `block.rs` changes and it doesn't impact the rest of the codebase).

I'm slightly worried about how the checksum is verified when a block is read, as it needs to 1) decompress the data and 2) checksum it, meaning it will take more RAM and CPU time than if we had saved a checksum of the compressed version. Reading a block occurs all the time, so if there are easy optimisations there, we should do them.

It is important that we are able to verify the checksum on the node that reads the block from disk, because it needs to detect when the block is corrupted so that it can re-download a fresh version. I think the best solution would be to store the blake2 checksum of the compressed data next to it by appending it at the end of the file (we could store it in another file next to it, but that means more inodes, more open/read/close calls, and overall worse performance). Appending a checksum at the end of a `.zst` file makes it technically an invalid zstd file, but as long as it's documented it seems fine to me. We could also use a special file extension such as `.zstd_b2` to identify files that contain zstd data plus their blake2 checksum.
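
A minimal sketch of that append-the-checksum layout, assuming the `blake2sum` helper visible in the diff below and a `Hash` type that exposes its bytes as a slice (hypothetical code, not from the PR):

```
use std::io::Write;

// Sketch of the suggested ".zstd_b2" layout: the zstd frame followed by
// the blake2 checksum of those same compressed bytes. The appended
// checksum is what makes the file technically not a valid .zst file.
fn write_block_with_checksum(path: &std::path::Path, compressed: &[u8]) -> std::io::Result<()> {
	let checksum = blake2sum(compressed); // assumed helper from the codebase
	let mut f = std::fs::File::create(path)?;
	f.write_all(compressed)?;
	f.write_all(checksum.as_slice())?; // assumed accessor on Hash
	f.sync_all()
}
```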

Concerning the code organization, I would be more comfortable if, instead of having `PutBlock` and `PutCompressedBlock`, we had a single `PutBlock` variant, but changed the `PutBlockMessage` to be as follows:

```
#[derive(Debug, Serialize, Deserialize)]
pub struct PutBlockMessage {
	pub hash: Hash,
	pub data: BlockData,
}

#[derive(Debug, Serialize, Deserialize)]
pub enum BlockData {
	Plain(#[serde(with = "serde_bytes")] Vec<u8>),
	Compressed(#[serde(with = "serde_bytes")] Vec<u8>),
}
```

And we should try to replace `Vec<u8>` with the `BlockData` enum everywhere it makes sense, instead of carrying a boolean `compressed` next to it.
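
For instance (a hypothetical sketch, not code from the PR), a helper that previously took the bytes plus a `compressed` boolean could match on the enum instead:

```
// Sketch: derive the on-disk file name from BlockData rather than from
// a separate `compressed: bool` carried next to the bytes.
fn block_file_name(hash_hex: &str, data: &BlockData) -> String {
	match data {
		BlockData::Plain(_) => hash_hex.to_string(),
		BlockData::Compressed(_) => format!("{}.zst", hash_hex),
	}
}
```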

trinity-1686a force-pushed zstd-block from 357cfb8d84 to b9ea569158 2021-04-06 18:17:36 +00:00
Author
Owner

In case the node already has an uncompressed block and receives a compressed version of it, it will store and use the compressed version, but it will not delete the uncompressed one.
Owner

For what it's worth, AFAIK ZFS checksums the compressed data.
lx requested changes 2021-04-06 21:53:03 +00:00
Dismissed
lx left a comment
Owner

Just some small things
@@ -150,0 +163,4 @@
BlockData::Plain(b) => (b, None),
BlockData::Compressed(b) => {
let checksum = blake2sum(&b);
(b, Some(checksum))
Owner

Maybe we should calculate this checksum on the sending side, i.e. in `rpc_put_block`, and include it in `BlockData::Compressed`? Meaning we do it only once, like we do for the uncompressed checksum.
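
One possible shape for that suggestion (hypothetical, not necessarily what the PR ended up with):

```
// Sketch: carry the blake2 checksum of the compressed bytes inside
// BlockData::Compressed, computed once by the sender in rpc_put_block.
#[derive(Debug, Serialize, Deserialize)]
pub enum BlockData {
	Plain(#[serde(with = "serde_bytes")] Vec<u8>),
	Compressed {
		#[serde(with = "serde_bytes")]
		data: Vec<u8>,
		checksum: Hash,
	},
}
```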
lx marked this conversation as resolved
@@ -150,0 +167,4 @@
}
};
let _lock = self.data_dir_lock.lock().await;
Owner

Taking the lock should probably be the first thing in the function, because `is_block_compressed` manipulates the data directory.
lx marked this conversation as resolved
Author
Owner

@maximilien (I assume you meant zstd) apparently it can, but does not by default. The helper used does not allow enabling checksums, but a roughly six-line function could. That would mean decompressing on the sending side just to verify integrity, but from what I read, zstd decompression should be faster than blake2 anyway.
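
A sketch of that small function using the `zstd` crate's stream API (the exact helper may differ):

```
use std::io::Write;

// Sketch: compress with zstd's own frame checksum enabled, which the
// one-shot encode_all helper does not expose.
fn encode_all_with_checksum(data: &[u8], level: i32) -> std::io::Result<Vec<u8>> {
	let mut encoder = zstd::stream::Encoder::new(Vec::new(), level)?;
	encoder.include_checksum(true)?; // store a content checksum in each frame
	encoder.write_all(data)?;
	encoder.finish()
}
```

zstd then verifies this checksum itself during decompression, so decompressing on the sending side doubles as the integrity check.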
Owner

> @maximilien (I assume you meant zstd) apparently it can, but does not by default. The helper used does not allow enabling checksums, but a roughly six-line function could. That would mean decompressing on the sending side just to verify integrity, but from what I read, zstd decompression should be faster than blake2 anyway.

If we could do that, it would be the best solution indeed.
trinity-1686a force-pushed zstd-block from dd3ef3ebc8 to d4fd074000 2021-04-08 13:52:55 +00:00
lx dismissed lx’s review 2021-04-14 21:26:56 +00:00
lx merged commit 307858bf0a into main 2021-04-14 21:27:36 +00:00
trinity-1686a deleted branch zstd-block 2021-04-19 17:48:30 +00:00