Fragmentation with repetitive UploadPartCopy #203

Closed
opened 2022-01-25 11:03:16 +00:00 by lx · 1 comment
Owner

When using UploadPartCopy, we can upload part number i in a multipart upload from a previously existing object, or a sub-range of such an object.

UploadPartCopy tries to preserve the blocks of the source object as much as possible. For instance, if we are copying an object that has blocks:

[ b1: 1MB ] [ b2: 1MB ] [ b3: 250kb ]

Then the created part will be composed of blocks b1, b2 and b3.

Now, if we specify a source range in the object to copy, for example 500kb-2100kb,
corresponding to the following:

[ b1: 1MB ] [ b2: 1MB ] [ b3: 250kb ]
     |----------------------|
         ^ range to copy ^

then the uploaded part will be constituted of the following blocks:

[ b1: 1MB ] [ b2: 1MB ] [ b3: 250kb ]
     |----------------------|
         ^ range to copy ^
     vvvvvvvvvvvvvvvvvvvvvvvv
     [ b1'] [ b2: 1MB ] [b3']         <- blocks constituting the new part

In other words, the blocks b1 and b3 will be split in the middle, and block b2 will be copied as-is.
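To make the splitting concrete, here is a small Python sketch (purely illustrative, not Garage code; the helper name and the 1kb = 1000 bytes convention are made up for the example) that computes which pieces of the source blocks end up in the new part:

```python
def blocks_for_range(block_sizes, start, end):
    """Return (offset_in_block, length) pieces covering bytes [start, end) of the source.

    Blocks fully inside the range keep their original size; the first and last
    blocks touched by the range may be truncated ("split")."""
    pieces = []
    block_start = 0
    for size in block_sizes:
        block_end = block_start + size
        if block_end > start and block_start < end:
            piece_start = max(start, block_start) - block_start
            piece_end = min(end, block_end) - block_start
            pieces.append((piece_start, piece_end - piece_start))
        block_start = block_end
    return pieces

# Example from above: blocks of 1MB, 1MB, 250kb; copy bytes 500kb..2100kb
KB = 1000
print(blocks_for_range([1000 * KB, 1000 * KB, 250 * KB], 500 * KB, 2100 * KB))
# -> [(500000, 500000), (0, 1000000), (0, 100000)]  i.e. b1', b2 intact, b3'
```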

Now, if we use the resulting object as the source of yet another UploadPartCopy, we make fragmentation even worse: b1' could itself be split in the middle, and so could b3', making for smaller and smaller blocks each time an existing block is split.

To avoid this, the following recommendation should be applied:

  • when specifying a source range in UploadPartCopy, make sure this range aligns with the bounds of blocks in the source object.

Unfortunately the S3 API does not give direct access to the boundaries of individual blocks (an implementation detail that is abstracted away). However, ListParts, as well as the partNumber parameter of GetObject and HeadObject, lets us know the boundaries of the parts of a multipart upload. These boundaries are always block boundaries and can safely be used as bounds of an UploadPartCopy source range without causing further fragmentation.
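For illustration, here is a rough client-side sketch using boto3 (the helper names `part_boundaries` and `copy_part_aligned` are made up; it assumes the source object was itself created by a multipart upload, so that HeadObject with PartNumber returns the size of each part):

```python
import boto3

s3 = boto3.client("s3")  # pass endpoint_url=... to point at a Garage deployment

def part_boundaries(bucket, key):
    """Byte offsets at which the parts of a completed multipart object begin/end.

    Uses HeadObject with PartNumber: each response gives that part's size and
    the total number of parts. These offsets are always block boundaries."""
    offsets = [0]
    n, parts_count = 1, 1
    while n <= parts_count:
        head = s3.head_object(Bucket=bucket, Key=key, PartNumber=n)
        parts_count = head.get("PartsCount", 1)
        offsets.append(offsets[-1] + head["ContentLength"])
        n += 1
    return offsets  # e.g. [0, 5242880, 10485760, ...]

def copy_part_aligned(src_bucket, src_key, dst_bucket, dst_key, upload_id,
                      part_number, first_src_part, last_src_part):
    """Copy whole source parts [first_src_part, last_src_part] as one new part,
    so that CopySourceRange falls exactly on part (hence block) boundaries."""
    offsets = part_boundaries(src_bucket, src_key)
    start = offsets[first_src_part - 1]
    end = offsets[last_src_part] - 1  # CopySourceRange bounds are inclusive
    return s3.upload_part_copy(
        Bucket=dst_bucket, Key=dst_key, UploadId=upload_id, PartNumber=part_number,
        CopySource={"Bucket": src_bucket, "Key": src_key},
        CopySourceRange=f"bytes={start}-{end}",
    )
```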

lx added the
Documentation
Performance
labels 2022-01-25 11:03:16 +00:00
Author
Owner

Part fragmentation happens mostly when repeatedly making new multipart uploads that copy previous objects with UploadPartCopy: each time an existing object is used as the source of an UploadPartCopy with a sub-range, some blocks (one at the beginning, one at the end) might need to be split. The problem is that we do not control where these blocks are split, so the sub-chunks may be excessively small. After a single UploadPartCopy this is not much of an issue because there are at most two small sub-chunks, a constant overhead that can still be managed. But after repeatedly rewriting an object using UploadPartCopy, many chunks will have been split in many places, and the object might end up consisting mostly of very small chunks, instead of mostly 1MB chunks with a few very small chunks in between.

Proposed solution: add a defragmentation procedure to the code of UploadPartCopy. The first idea was to read the copied part and re-split it at 1MB boundaries, but that is costly, not required in most cases, and also breaks deduplication in many cases. What we can do instead is keep all blocks that are a full 1MB, and if there are several consecutive blocks that are smaller (and their total size is also less than 1MB), re-read them and concatenate them into a single block. In the majority of cases we will not need to re-read/re-write any blocks, and in the few pathological cases where consecutive blocks have been split into very small pieces, this repairs them opportunistically.
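A rough sketch of the merging decision, written here in Python for readability (the actual implementation would live in Garage's Rust code; the function name and the grouping representation are made up for illustration):

```python
BLOCK_SIZE = 1_000_000  # nominal block size discussed here: 1MB

def plan_defragmentation(block_sizes):
    """Group the blocks of a freshly copied part for opportunistic merging.

    Full-size blocks are kept untouched (no re-read/re-write). Runs of
    consecutive undersized blocks are merged into a single new block as long
    as the merged size stays within BLOCK_SIZE. Returns a list of groups of
    block indices; a group of length > 1 means "re-read these blocks and
    write them back as one"."""
    groups, run, run_size = [], [], 0
    for i, size in enumerate(block_sizes):
        if size >= BLOCK_SIZE:
            if run:
                groups.append(run)
                run, run_size = [], 0
            groups.append([i])            # keep full block as-is
        elif run and run_size + size > BLOCK_SIZE:
            groups.append(run)            # run would overflow: close it
            run, run_size = [i], size
        else:
            run.append(i)                 # extend the run of small blocks
            run_size += size
    if run:
        groups.append(run)
    return groups

# Isolated small fragments around an intact 1MB block need no rewrite,
# but a pathological run of tiny fragments gets coalesced into one block:
print(plan_defragmentation([500_000, 1_000_000, 100_000]))      # -> [[0], [1], [2]]
print(plan_defragmentation([100_000, 50_000, 200_000, 1_000_000]))  # -> [[0, 1, 2], [3]]
```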

lx closed this issue 2022-04-25 14:58:06 +00:00