Implement ListMultipartUploads #171
No reviewers
Labels
No labels
action
check-aws
action
discussion-needed
action
for-external-contributors
action
for-newcomers
action
more-info-needed
action
need-funding
action
triage-required
kind
correctness
kind
ideas
kind
improvement
kind
performance
kind
testing
kind
usability
kind
wrong-behavior
prio
critical
prio
low
scope
admin-api
scope
background-healing
scope
build
scope
documentation
scope
k8s
scope
layout
scope
metadata
scope
ops
scope
rpc
scope
s3-api
scope
security
scope
telemetry
No milestone
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: Deuxfleurs/garage#171
Loading…
Reference in a new issue
No description provided.
Delete branch "feature/s3-multipart-compat"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Implement ListMultipartUploads.
This is also a refactor of ListObjects and ListObjectsV2.
It took me some times as I wanted to propose the following things:
s3api
. In the future, I would like to move them in Rust with the aws rust SDK.Merging of the logic of ListMultipartUploads and ListObjects was not a goal but a consequence of the previous modifications.
Some points that we might want to discuss:
aws s3api
due to the fact that it can not use our optimization to skip the whole prefix. It is independant from my refactor and can be tested with the commenteds3api
tests intest-smoke.sh
. It probably has the same weird behavior on the official AWS S3 implementation.upload-id-marker
with the hardcoded valueinclude
to emulate your "including" token.941ae0e12a
toe79cfbd330
WIP: feature/s3-multipart-compatto Implement ListMultipartUploadse79cfbd330
toa5662bc998
LGTM.
There is one thing I'm worried about, this hack:
What if the last character in the key is an utf8 character that cannot be cast into
u8
? I think either the case will panic, the+
will either overflow or the+
will throw a panic. In either cases, I think we are doing things wrong.My proposed solution would be to use
Vec<u8>
everywhere for cursors instead of strings, and have akey_after_prefix
that looks like this:Here we are sure that tail is a byte that is not
0xFF
(because no unicode character contains a0xFF
in its unicode representation), therefore adding 1 is sure to be a validu8
value.To make this work, we need an adapted
get_range
so that the start position can also be a rawvec<u8>
that contains arbitrary bytes for the sort key. (If we look how keys are stored in Sled: Objects have akey
field that is a String, but in Sled they are interpreted as raw bytes and stored in the tree in the order of byte strings, and not in the dictionary order of unicode strings. Currentlyget_range
expects a starting position that is the underlying sort key type of the entry type, i.e. String in the case of Object, and it does itself the conversion to underlying bytes to get the start entry in the Sled tree. But we could very simply have a variant that takes the start key as raw bytes instead)Otherwise, I didn't see anything obviously wrong, and the unit tests seem to suggest that things are fine. Now we have to continue testing the code by using it.
I was wondering if we really need to return
<Owner>
and<Initiator>
for multipart uploads: we don't return the<Owner>
in ListObjects and that doesn't seem to be an issue for the moment. And I find it wierd to return fake data that looks plausible: here we are returning the info of the key that is making the query, but that means that different users will see different owners and initiators, which might confuse them if they believe this is true data stored in the bucket. So I'd either remove returning this, or replace it by something that is obviously dummy data (e.g.GKdummykey
or something) and that is constant between calls.A few remarks of style:
ListCommonQuery
should be calledListQueryCommon
(better English)pub com: ListCommonQuery
intopub common: ListCommonQuery
in application of the following non-written rule: only use abbreviated words as identifiers in local variables, never in function names, struct names, or struct fields.Option<String>
inExtractionResult::SkipTo
andRangeBegin::IncludingKey
; I figured it's the last key we know in the bucket that is strictly smaller than the first key we want to get, if we know it, but that's not obvious from reading the codeparse
function should get a more explicit name, for example:drive_list_query
a5662bc998
tocccc59d544
cccc59d544
to7fe2aacaad
7fe2aacaad
toccdf6e4369
ccdf6e4369
to22a7f82de1
22a7f82de1
tod0de9c7211
d0de9c7211
toc3b6dd9347
I have implemented all your suggestions:
key_after_prefix
in c3b6dd9347d6e54ed50feb4180163bbda1c2a06f + some unit tests for this function, including tests for utf8 surrogates and max char.Once you have reviewed my changes and the CI is green, you can merge the PR :-)
Implement ListMultipartUploadsto WIP: Implement ListMultipartUploadsc3b6dd9347
tofd83a2f8ad
fd83a2f8ad
to939f8d3e75
939f8d3e75
to3bbc263027
WIP: Implement ListMultipartUploadsto Implement ListMultipartUploads