WIP: feat: x-amz-website-redirect-location support #872

Draft
raitobezarius wants to merge 5 commits from raitobezarius/garage:redirect-meta into main
First-time contributor

Each commit contains a piece of an implementation.

Fixes #869.

Open questions

  • Do we want to validate the header field?
  • What is the behavior of the flattening in case of an HTTP(S) redirect instead of a bucket local redirect?
  • How to handle infinite loops? Disallow them? Bound the number of flattens?

GET

This is an implementation of the GET handler for the
x-amz-website-redirect-location.

This implements the semantics described in
https://docs.aws.amazon.com/AmazonS3/latest/userguide/how-to-page-redirect.html
for the S3 API and the web API.

PUT

Just propagate the header.

PUT Copy

In case the metadata directive is REPLACE, we reuse the PUT propagated header.
In case the metadata directive is COPY, we probably need to check if the header is set and actually replace the header.

TODO:

  • add an integration test for PUT copy.

Multipart

TODO:

  • add an integration test for 2 multipart uploads, one redirecting the other.
  • add an integration test for 2 multipart upload copy, one redirecting the other.

Signed-off-by: Raito Bezarius masterancpp@gmail.com

Each commit contains a piece of an implementation. Fixes https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/869. ## Open questions - Do we want to validate the header field? - Amazon write in https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPOST.html that the length is 2 KB at most and needs to start by `/`, `http://` or `https://` - Should we validate if the target of redirection exist or is it OK to create speculative redirections and leave them as 404? - What is the behavior of the flattening in case of an HTTP(S) redirect instead of a bucket local redirect? - How to handle infinite loops? Disallow them? Bound the number of flattens? ### GET This is an implementation of the GET handler for the `x-amz-website-redirect-location`. This implements the semantics described in https://docs.aws.amazon.com/AmazonS3/latest/userguide/how-to-page-redirect.html for the S3 API and the web API. ### PUT Just propagate the header. ### PUT Copy In case the metadata directive is REPLACE, we reuse the PUT propagated header. In case the metadata directive is COPY, we probably need to check if the header is set and actually *replace* the header. TODO: - [ ] add an integration test for PUT copy. ### Multipart TODO: - [ ] add an integration test for 2 multipart uploads, one redirecting the other. - [ ] add an integration test for 2 multipart upload copy, one redirecting the other. Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
raitobezarius force-pushed redirect-meta from bd44f702f2 to 5275ae60b0 2024-09-03 15:00:28 +00:00 Compare
lx requested changes 2024-09-05 14:14:57 +00:00
lx left a comment
Owner

I have a feeling that this does not implement the expected semantics of x-amz-website-redirect-location. As I understand by reading the AWS docs, when the header is present:

  • Fetching the object through the S3 API is not affected, the object is always returned as it exists. Its x-amz-website-redirect-location header is simply added to the HTTP headers of the response

  • Accessing the object as a web page through the web endpoint would yield an HTTP 301 with the Location header set to the value of x-amz-website-redirect-location

I think that there is no case in which S3 internally interprets the redirection, and returns the targeted object instead of the original one. In particular, this means that there is no case where multiple layers of redirection are flattened internally by S3 (and we can get rid of async_recursion).

This is not certain but it's only my interpretation of the AWS documentation, which is very badly written. What we need to do here is to test against AWS S3 to see how it behaves, and then emulate that behavior.

I have a feeling that this does not implement the expected semantics of `x-amz-website-redirect-location`. As I understand by reading the AWS docs, when the header is present: - Fetching the object through the S3 API *is not affected*, the object is always returned as it exists. Its `x-amz-website-redirect-location` header is simply added to the HTTP headers of the response - Accessing the object as a web page through the web endpoint would yield an HTTP 301 with the `Location` header set to the value of `x-amz-website-redirect-location` I think that there is no case in which S3 internally interprets the redirection, and returns the targeted object instead of the original one. In particular, this means that there is no case where multiple layers of redirection are flattened internally by S3 (and we can get rid of async_recursion). This is not certain but it's only my interpretation of the AWS documentation, which is very badly written. What we need to do here is to test against AWS S3 to see how it behaves, and then emulate that behavior.
@ -329,6 +332,30 @@ pub async fn handle_get_without_ctx(
let checksum_mode = checksum_mode(&req);
if let Some(redirect_hdr) = headers
Owner

We want this code path to trigger only for the web endpoint, x-amz-website-redirect-location should be ignored (i.e. simply returned as a normal object header) on the S3 endpoint.

We want this code path to trigger only for the web endpoint, `x-amz-website-redirect-location` should be ignored (i.e. simply returned as a normal object header) on the S3 endpoint.
@ -332,0 +335,4 @@
if let Some(redirect_hdr) = headers
.headers
.iter()
.find(|(k, _)| k == "x-amz-website-redirect-location")
Owner

Could we define a constant as follows and use that instead?

pub const X_AMZ_WEBSITE_REDIRECT_LOCATION: HeaderName = HeaderName::from_static("x-amz-website-redirect-location");
Could we define a constant as follows and use that instead? ```rust pub const X_AMZ_WEBSITE_REDIRECT_LOCATION: HeaderName = HeaderName::from_static("x-amz-website-redirect-location"); ```
@ -623,1 +623,3 @@
if name.as_str().starts_with("x-amz-meta-") {
if name.as_str().starts_with("x-amz-meta-")
|| name.as_str() == "x-amz-website-redirect-location"
{
Owner

The value of the header should be validated here to ensure it is syntactically correct, according to S3 rules

The value of the header should be validated here to ensure it is syntactically correct, according to S3 rules
Owner

Amazon write in https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPOST.html that the length is 2 KB at most and needs to start by /, http:// or https://

I think we should validate this as well.

Should we validate if the target of redirection exist or is it OK to create speculative redirections and leave them as 404?

I would expect that Amazon does not validate this, so I don't think we should validate it. We can test this on AWS if we want to be sure.

> Amazon write in https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPOST.html that the length is 2 KB at most and needs to start by /, http:// or https:// I think we should validate this as well. > Should we validate if the target of redirection exist or is it OK to create speculative redirections and leave them as 404? I would expect that Amazon does not validate this, so I don't think we should validate it. We can test this on AWS if we want to be sure.
Some checks failed
ci/woodpecker/pr/debug Pipeline failed
Required
Details
This pull request has changes conflicting with the target branch.
  • Cargo.nix
  • flake.lock
  • flake.nix
View command line instructions

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u redirect-meta:raitobezarius-redirect-meta
git checkout raitobezarius-redirect-meta
Sign in to join this conversation.
No reviewers
lx
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: Deuxfleurs/garage#872
No description provided.