WIP add content defined chunking #42
Reference: Deuxfleurs/garage#42
The current chunker creates chunks of strictly equal length. For deduplication purposes, this is fine as long as content is modified without adding or removing bytes. If a single byte is added somewhere, none of the chunks after it get deduplicated.
Content-Defined Chunking tries to overcome this issue by cutting based on content instead of just length. If some bytes are added or removed, usually only one or two chunks fail to deduplicate.
This pull request attempts to replace the current chunker with FastCDC.
The pull request is marked as WIP because it appears to create chunks considerably shorter than it should (with a minimum size of 512 KiB, an average of 1 MiB and a maximum of 2 MiB, chunks are less than 600 KiB long). I don't know if this is due to the dataset I use, this specific chunker, or a buggy implementation.
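To make the difference between the two strategies concrete, here is a minimal sketch in Rust. It is not FastCDC and not the code from this pull request: `cdc_chunks`, `gear`, `shared`, `demo_bytes` and `demo` are hypothetical names, the rolling hash is a toy gear-style hash, and the sizes (64 / 1024 bytes, 8 mask bits) are chosen only to keep the demo small. It inserts one byte near the start of a buffer and counts how many chunks are still identical afterwards.

```rust
use std::collections::HashSet;

/// Fixed-size chunking: every chunk is exactly `size` bytes, except possibly the last.
fn fixed_chunks(data: &[u8], size: usize) -> Vec<&[u8]> {
    data.chunks(size).collect()
}

/// Per-byte mixing value standing in for a gear table (assumption, not FastCDC's table).
fn gear(b: u8) -> u32 {
    (b as u32 + 1).wrapping_mul(0x9E37_79B1)
}

/// Toy content-defined chunking: cut where the top `bits` bits (1..=31) of a
/// rolling hash are zero, subject to `min` and `max` chunk lengths.
fn cdc_chunks(data: &[u8], min: usize, max: usize, bits: u32) -> Vec<&[u8]> {
    let mut chunks = Vec::new();
    let (mut start, mut hash) = (0usize, 0u32);
    for (i, &b) in data.iter().enumerate() {
        // Each byte's contribution is shifted out after 32 steps, so the hash
        // depends only on the last 32 bytes, not on the absolute offset.
        hash = (hash << 1).wrapping_add(gear(b));
        let len = i + 1 - start;
        if (len >= min && hash >> (32 - bits) == 0) || len >= max {
            chunks.push(&data[start..=i]);
            start = i + 1;
        }
    }
    if start < data.len() {
        chunks.push(&data[start..]);
    }
    chunks
}

/// Number of chunks of `new` that already exist among the chunks of `old`.
fn shared(old: &[&[u8]], new: &[&[u8]]) -> usize {
    let known: HashSet<&[u8]> = old.iter().copied().collect();
    new.iter().filter(|c| known.contains(*c)).count()
}

/// Deterministic pseudo-random bytes (xorshift64), only used to build demo input.
fn demo_bytes(n: usize) -> Vec<u8> {
    let mut x: u64 = 0x2545_F491_4F6C_DD1D;
    (0..n)
        .map(|_| {
            x ^= x << 13;
            x ^= x >> 7;
            x ^= x << 17;
            (x >> 32) as u8
        })
        .collect()
}

/// Insert one byte near the start and count how many chunks survive unchanged.
/// Returns (shared fixed-size chunks, shared content-defined chunks).
fn demo() -> (usize, usize) {
    let old = demo_bytes(64 * 1024);
    let mut new = old.clone();
    new.insert(10, 0xAA);
    let fixed = shared(&fixed_chunks(&old, 256), &fixed_chunks(&new, 256));
    let cdc = shared(&cdc_chunks(&old, 64, 1024, 8), &cdc_chunks(&new, 64, 1024, 8));
    (fixed, cdc)
}
```

With fixed-size chunks the inserted byte shifts every later boundary, so no chunk should match. The content-defined variant is expected to realign once a hash-triggered cut falls in data that both versions have in common, after which the remaining chunks are identical.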
Hi trinity, thanks for the PR!
Unless I'm mistaken, it looks to me like you might be feeding data twice to the chunker: when a chunk is taken from buf, some remaining data stays in buf. At the next iteration, the whole buf will be pushed again into the chunker, including the rest of the data from the previous iteration, which was already pushed. This might explain why blocks don't have sizes consistent with the parameters of the algorithm.
Side note: at the moment all development is going on in the dev-0.2 branch. It shouldn't be too hard to rebase your patch on that branch. Also, dev-0.2 contains many bug fixes and improvements, so it's a much better base to work on.
The code definitely looks odd, but it's how the crate expects to be used, based on its test suite.
I'll rebase on dev-0.2, but I don't think I can change the target of a PR, so I'll close this one and open another at the next commit.
Side note too: CI fails to clone branches from forked repositories.
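For readers following the buffering question raised above, here is a minimal sketch of one way a streaming chunker can avoid handing the same bytes to the cut-point search as part of two different chunks. It does not use the crate from this pull request, whose API may legitimately expect a different calling pattern. `StreamChunker`, `find_cut` and `chunk_in_steps` are hypothetical names, and the hash and size limits are toy values.

```rust
/// Hypothetical streaming wrapper around a toy cut-point finder.
struct StreamChunker {
    buf: Vec<u8>, // bytes received but not yet emitted as a chunk
    min: usize,
    max: usize,
}

impl StreamChunker {
    fn new(min: usize, max: usize) -> Self {
        StreamChunker { buf: Vec::new(), min, max }
    }

    /// Length of the first complete chunk in `buf`, or None if more data is needed.
    fn find_cut(&self) -> Option<usize> {
        let mut hash: u32 = 0;
        for (i, &b) in self.buf.iter().enumerate() {
            // Toy gear-style rolling hash (assumption, not FastCDC's).
            hash = (hash << 1).wrapping_add((b as u32 + 1).wrapping_mul(0x9E37_79B1));
            let len = i + 1;
            if (len >= self.min && hash >> 24 == 0) || len >= self.max {
                return Some(len);
            }
        }
        None
    }

    /// Append new input, then emit every chunk that is now complete.
    fn push(&mut self, data: &[u8]) -> Vec<Vec<u8>> {
        self.buf.extend_from_slice(data);
        let mut out: Vec<Vec<u8>> = Vec::new();
        while let Some(len) = self.find_cut() {
            // drain() removes the emitted bytes from the buffer, so only the
            // unconsumed remainder is scanned again on the next iteration.
            out.push(self.buf.drain(..len).collect());
        }
        out
    }

    /// Flush whatever is left at end of input as the final chunk.
    fn finish(self) -> Option<Vec<u8>> {
        if self.buf.is_empty() { None } else { Some(self.buf) }
    }
}

/// Feed `data` in pieces of `step` bytes and collect all chunks.
fn chunk_in_steps(data: &[u8], step: usize) -> Vec<Vec<u8>> {
    let mut chunker = StreamChunker::new(64, 1024);
    let mut out: Vec<Vec<u8>> = Vec::new();
    for piece in data.chunks(step) {
        out.extend(chunker.push(piece));
    }
    out.extend(chunker.finish());
    out
}
```

Because a cut is only reported once it is fully determined by the bytes at the front of the buffer, the resulting chunks do not depend on how the input was split across `push` calls. The sketch rescans the pending remainder on every call for simplicity; a real implementation would more likely keep the hash state between calls.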
7de671c48f to a32c0bac50
Pull request closed