remove size limitation in copy #280

Merged
lx merged 2 commits from withinboredom/garage:main into main 2022-04-19 10:49:44 +00:00
Contributor

This removes the >1mb s3_copy restriction.

This restriction doesn't seem to be documented anywhere (I could be wrong). It also causes some software to fail (such as #248).

If this restriction serves a purpose, I can change this PR to document the restriction and why it is needed.

This removes the >1mb s3_copy restriction. This restriction doesn't seem to be documented anywhere (I could be wrong). It also causes some software to fail (such as #248). If this restriction serves a purpose, I can change this PR to document the restriction and why it is needed.
Owner

Thanks for your PR and resurfacing this issue :-)

We discussed this subject some months ago, I also have a branch where I quickly disabled this check (see commit: 75597a1820) to allow someone on our Matrix channel to see if Harbor v1 works without the check. But finally this person upgraded to Harbor v2, did not have the issue anymore, and we switched to other problems.

I need to dig back in the history and also wait for the input of other contributors (in this case @lx and @trinity-1686a), but if I recall correctly there was 2 reasons for this check:

  1. compatibility with the S3 API UploadPartCopy but currently I am unable to find the part in the documentation
  2. Avoiding part fragmentation (see #203), but I am not 100% sure it applies here

I will update this issue as soon as I have more information.

Thanks for your PR and resurfacing this issue :-) We discussed this subject some months ago, I also have a branch where I quickly disabled this check (see commit: https://git.deuxfleurs.fr/Deuxfleurs/garage/commit/75597a1820ff41543ded95a22308eabf1f9c8505) to allow someone on our Matrix channel to see if Harbor v1 works without the check. But finally this person upgraded to Harbor v2, did not have the issue anymore, and we switched to other problems. I need to dig back in the history and also wait for the input of other contributors (in this case @lx and @trinity-1686a), but if I recall correctly there was 2 reasons for this check: 1. compatibility with the S3 API UploadPartCopy but currently I am unable to find the part in the documentation 2. Avoiding part fragmentation (see #203), but I am not 100% sure it applies here I will update this issue as soon as I have more information.

compatibility with the S3 API UploadPartCopy but currently I am unable to find the part in the documentation

After re-checking the documentation, this is probably us miss-interpreting AWS doc.
It says "You can copy a range only if the source object is greater than 5 MB.", which we interpreted as "you must copy at least 5MB", but this is clearly not what it actually says. The limitation is on the size of the source, not the size of the segment getting copied.

> compatibility with the S3 API UploadPartCopy but currently I am unable to find the part in the documentation After re-checking the documentation, this is probably us miss-interpreting AWS doc. It says "[You can copy a range only if the source object is greater than 5 MB.](https://docs.aws.amazon.com/AmazonS3/latest/API/API_UploadPartCopy.html#AmazonS3-UploadPartCopy-request-header-CopySourceRange)", which we interpreted as "you must copy at least 5MB", but this is clearly not what it actually says. The limitation is on the size of the source, not the size of the segment getting copied.
Author
Contributor

But finally this person upgraded to Harbor v2, did not have the issue anymore, and we switched to other problems.

Most likely they were lucky enough to have parts that were all bigger than 1mb in their images. I was not so lucky after several months. :)

After re-checking the documentation, this is probably us miss-interpreting AWS doc.
It says "You can copy a range only if the source object is greater than 5 MB.", which we interpreted as "you must copy at least 5MB", but this is clearly not what it actually says.

I think I can give that a go in this PR. I've never written Rust before (only C#/Scala/PHP on the regular), but this may be rather straightforward.

> But finally this person upgraded to Harbor v2, did not have the issue anymore, and we switched to other problems. Most likely they were lucky enough to have parts that were all bigger than 1mb in their images. I was not so lucky after several months. :) > After re-checking the documentation, this is probably us miss-interpreting AWS doc. It says "You can copy a range only if the source object is greater than 5 MB.", which we interpreted as "you must copy at least 5MB", but this is clearly not what it actually says. I think I can give that a go in this PR. I've never written Rust before (only C#/Scala/PHP on the regular), but this may be rather straightforward.
withinboredom reviewed 2022-04-04 10:29:43 +00:00
@ -237,3 +237,3 @@
};
// Fetch source versin with its block list,
if source_version_meta.size < 5242880 {
Author
Contributor

I don't know if you'd prefer a constant here (I didn't see an existing one).

I don't know if you'd prefer a constant here (I didn't see an existing one).
Author
Contributor

I'll try and add a unit test for this as well, but no promises :) as I have no idea how they work.

I'll try and add a unit test for this as well, but no promises :) as I have no idea how they work.
withinboredom force-pushed main from 264950425d to 186d153d3b 2022-04-04 10:56:29 +00:00 Compare
Author
Contributor

I thought about a smoke test, but tricking the clients into doing a multi-part less than 5mb seems impossible. Should be ready for review.

I thought about a smoke test, but tricking the clients into doing a multi-part less than 5mb seems impossible. Should be ready for review.
Owner

Maybe you could put the number in a constant similarly to wha we have done in this file: https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/main/src/rpc/rpc_helper.rs. I mean a constant at the top of the file, with a comment, split in a product (5 * 1024 * 1024) to quickly understand the final value.

We will try to merge your patch either as part of our v0.7.0 that should be released in the following days, or as part of a v0.7.1 if the timing is too short.

Thanks again for your contribution :-)

Maybe you could put the number in a constant similarly to wha we have done in this file: https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/main/src/rpc/rpc_helper.rs. I mean a constant at the top of the file, with a comment, split in a product (`5 * 1024 * 1024`) to quickly understand the final value. We will try to merge your patch either as part of our v0.7.0 that should be released in the following days, or as part of a v0.7.1 if the timing is too short. Thanks again for your contribution :-)
Owner
  • The limitation on AWS that the source object should be at least 5MB looks like it's due to an internal constraint on how AWS S3 works, otherwise I don't see why they would have such a limitation. Do we really want to reproduce it?

  • One of the advantages of having a check for the minimum size of the range that is being copied is to limit fragmentation, but it can't prevent it entirely because of how UploadPartCopy works intrinsically (detailed in #203). So I'm fine with removing it.

  • The most important part is to have a warning in the documentation that UploadPartCopy can lead to performance issues, I don't remember if we currently have it or not.

- The limitation on AWS that the source object should be at least 5MB looks like it's due to an internal constraint on how AWS S3 works, otherwise I don't see why they would have such a limitation. Do we really want to reproduce it? - One of the advantages of having a check for the minimum size of the range that is being copied is to limit fragmentation, but it can't prevent it entirely because of how UploadPartCopy works intrinsically (detailed in #203). So I'm fine with removing it. - The most important part is to have a warning in the documentation that UploadPartCopy can lead to performance issues, I don't remember if we currently have it or not.
Owner

@withinboredom after discussing internally, we suggest that you remove the 5MB limit on the size of the source object, as this is an artifact of how AWS S3 works and we have no reason to impose such a limitation on Garage. We'll merge the PR once this is done.

@withinboredom after discussing internally, we suggest that you remove the 5MB limit on the size of the source object, as this is an artifact of how AWS S3 works and we have no reason to impose such a limitation on Garage. We'll merge the PR once this is done.
Owner

And also, sorry for the contradictory communication on this issue and for taking so long to make a decision!

And also, sorry for the contradictory communication on this issue and for taking so long to make a decision!
Owner

Fixes #248

Fixes #248
withinboredom force-pushed main from 75add7e133 to 2a1f2138df 2022-04-11 17:48:01 +00:00 Compare
Author
Contributor

Hi @lx! Sorry I totally missed your reply. I've rebased on main and removed the 5MB limit.

Hi @lx! Sorry I totally missed your reply. I've rebased on `main` and removed the 5MB limit.
Owner

Thanks, this is great!

Thanks, this is great!
lx merged commit a4f9f19ac3 into main 2022-04-19 10:49:44 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
4 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: Deuxfleurs/garage#280
No description provided.