Partition the key list of heavy buckets by prefix #200

Open
opened 2022-01-24 17:35:33 +00:00 by quentin · 0 comments

Reference

Old AWS S3

In the past, Amazon published documentation on how to name your keys to benefit from efficient partitioning:

https://aws.amazon.com/blogs/aws/amazon-s3-performance-tips-tricks-seattle-hiring-event/

Amazon S3 maintains an index of object key names in each AWS region. Object keys are stored in UTF-8 binary ordering across multiple partitions in the index. The key name dictates which partition the key is stored in. Using a sequential prefix, such as timestamp or an alphabetical sequence, increases the likelihood that Amazon S3 will target a specific partition for a large number of your keys, overwhelming the I/O capacity of the partition. If you introduce some randomness in your key name prefixes, the key names, and therefore the I/O load, will be distributed across more than one partition.

https://stackoverflow.com/questions/38930846/performance-of-listing-s3-bucket-with-prefix-and-delimiter

New AWS S3

Apparently, they are now able to make this sharding invisible to their end users:

For example, previously Amazon S3 performance guidelines recommended randomizing prefix naming with hashed characters to optimize performance for frequent data retrievals. You no longer have to randomize prefix naming for performance, and can use sequential date-based naming for your prefixes.

https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html

Proposed solution

Table

Currently we have this logic:

  • objects: (partition key: bucket, sorting key: object_key)

But Amazon S3, to support a very large number of keys, switches (automagically?) to an unknown number of partitions per bucket, which is probably something like the two-level layout below (sketched in code after the list):

  • shards: (partition key: bucket, sorting key: shard)
  • objects: (partition key: shard, sorting key: object_key)
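
As an illustration, here is a minimal Rust sketch of the two layouts. The struct and field names are hypothetical and do not correspond to Garage's actual table types; they only spell out which field acts as the partition key (deciding placement) and which as the sorting key (ordering within a partition):

```rust
// Hypothetical sketch; names are illustrative, not Garage's actual types.

/// Current layout: one index partition per bucket, so all keys of a
/// bucket land on the same set of nodes.
struct ObjectEntryV1 {
    bucket: String,     // partition key: decides which nodes store the entry
    object_key: String, // sorting key: keys stay ordered within the partition
}

/// Proposed layout: a `shards` table lists the shards of a bucket...
struct ShardEntry {
    bucket: String, // partition key
    shard: String,  // sorting key: e.g. the lowest prefix the shard covers
}

/// ...and each shard becomes its own partition in the `objects` table.
struct ObjectEntryV2 {
    shard: String,      // partition key: identifies a (bucket, prefix range)
    object_key: String, // sorting key
}

fn main() {
    // Key "logs/2022-01-24" in bucket "my-bucket" falls in the shard
    // starting at prefix "f" (since "logs/..." sorts after "f"):
    let entry = ObjectEntryV2 {
        shard: "my-bucket:f".into(),
        object_key: "logs/2022-01-24".into(),
    };
    println!("{} -> partition {}", entry.object_key, entry.shard);
}
```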

Choosing partition number + range

A change to this partitioning algorithm is probably what now enables Amazon to partition data efficiently even when keys are sequential.

A first iteration for Garage could offer manually sharded buckets: something like `garage bucket shard my-bucket --shards=a,b,c,d,e,f` would create 7 shards (see the routing sketch after the list):

  1. Prefixes from '\x00' to 'a'
  2. Prefixes from 'a' to 'b'
  3. Prefixes from 'b' to 'c'
  4. Prefixes from 'c' to 'd'
  5. Prefixes from 'd' to 'e'
  6. Prefixes from 'e' to 'f'
  7. Prefixes from 'f' to '\xff'
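
To make the routing concrete, here is a minimal sketch assuming shards are defined by the sorted list of split prefixes given on the command line; `shard_for_key` is a hypothetical helper, not an existing Garage function:

```rust
/// Return the index of the shard responsible for `key`, given the sorted
/// split prefixes (here: ["a", "b", "c", "d", "e", "f"]). Shard 0 covers
/// everything below splits[0], shard i covers [splits[i-1], splits[i]),
/// and the last shard covers everything from splits[n-1] up to '\xff'.
fn shard_for_key(splits: &[&str], key: &str) -> usize {
    // partition_point returns the number of splits <= key, which is
    // exactly the index of the covering shard.
    splits.partition_point(|s| *s <= key)
}

fn main() {
    let splits = ["a", "b", "c", "d", "e", "f"];
    assert_eq!(shard_for_key(&splits, "0-readme"), 0); // below 'a'
    assert_eq!(shard_for_key(&splits, "abc"), 1);      // in ['a', 'b')
    assert_eq!(shard_for_key(&splits, "zebra"), 6);    // at or above 'f'
    println!("routing ok: {} shards", splits.len() + 1);
}
```

Listing a bucket would then walk the shards in order and merge their (already sorted) key streams, so `ListObjects` semantics would be preserved.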

Note on "low priority"

We are not yet aware of any case where such a scheme would be required for Garage. In other words, we have never seen any deployment of Garage that would require this optimization.

Please let us know if you have encountered (or think that you will encounter) such a bottleneck with Garage.

quentin added this to the Speculative milestone 2022-01-24 17:35:33 +00:00
quentin added the prio/low and kind/performance labels 2022-01-24 17:35:33 +00:00
quentin changed title from Partition key list of heavy buckets by prefix to Partition the key list of heavy buckets by prefix 2022-01-24 17:37:36 +00:00