Partition the key list of heavy buckets by prefix #200
Reference: Deuxfleurs/garage#200
Reference
Old AWS S3
In the past, Amazon published documentation on how to name your keys to benefit from efficient partitioning.
https://stackoverflow.com/questions/38930846/performance-of-listing-s3-bucket-with-prefix-and-delimiter
New AWS S3
Apparently, they are now able to make it invisible to their end users:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html
Proposed solution
Table
Currently we have this logic:
But Amazon S3, to support a very large number of keys, switches (automagically?) to an unknown number of partitions for keys, which is probably something like:
Choosing partition number + range
This is probably a change to this algorithm that enables Amazon to efficiently partition data even if you used sequential keys.
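To make the idea of range partitioning concrete, here is a minimal sketch of how a key could be routed to a partition once a sorted list of split points has been chosen. It only illustrates the general technique; it is not how Amazon S3 or Garage is implemented. The function name `partition_for` and the split points are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def partition_for(key: str, split_points: Sequence[str]) -> int:
    """Return the index of the range partition that holds `key`.

    `split_points` must be sorted. With n split points there are n + 1
    partitions: partition 0 holds keys below the first split point,
    partition i holds keys in [split_points[i-1], split_points[i]),
    and the last partition holds everything from the last split point up.
    """
    return bisect_right(split_points, key)


# Hypothetical split points, chosen only for illustration.
split_points = ["g", "n", "t"]

for key in ["apple.jpg", "logs/2021-01-01", "n", "zebra.png"]:
    print(key, "->", partition_for(key, split_points))
```

Because the lookup is a binary search over the split points, sequential keys still land in a single well-defined range, and a range that becomes too hot can be split by adding a new split point without renaming any key.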
A first iteration for Garage could propose manually sharded buckets. Something like
garage bucket shard my-bucket --shards=a,b,c,d,e,f
to create 7 shards:
Note on "low priority"
We are not yet aware of any case where such a scheme would be required for Garage. In other words, we have never seen any deployment of Garage that would require this optimization.
Please let us know if you have encountered (or think that you will encounter) such a bottleneck with Garage.