better consistent hashing #31
Current technique:
New technique:
partition_bits is a system constant that defines how many partitions the data is split into: typically partition_bits = 10, giving 2^10 = 1024 partitions.
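For concreteness, a minimal sketch (in Python, like simulate_ring.py) of how a key could be mapped to a partition by keeping the top partition_bits bits of its hash; the hash function and the exact bit extraction below are illustrative assumptions, not Garage's actual code:

```python
import hashlib

PARTITION_BITS = 10                  # system constant
N_PARTITIONS = 1 << PARTITION_BITS   # 2**10 = 1024 partitions

def partition_of(key: bytes) -> int:
    """Map a key to one of the 2**PARTITION_BITS partitions."""
    # Hash choice is an assumption for this sketch.
    h = hashlib.sha256(key).digest()
    # Take the first 16 bits of the hash, keep the top PARTITION_BITS.
    return int.from_bytes(h[:2], "big") >> (16 - PARTITION_BITS)

print(partition_of(b"my/object/key"))  # some value in [0, 1024)
```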
Two advantages of the new technique:
One drawback:
EDIT: Maglev is even better: the load distribution is perfect, and it has the same other properties as the method described above.
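For reference, a sketch of the Maglev table-population algorithm as described in the original Maglev paper; the table size and hash choices here are arbitrary assumptions for illustration, not necessarily what Garage uses:

```python
import hashlib

M = 257  # lookup-table size, must be prime (the paper uses larger primes)

def _h(name: str, salt: str) -> int:
    # Illustrative hash; the paper leaves the hash functions unspecified.
    return int.from_bytes(hashlib.sha256((salt + name).encode()).digest()[:8], "big")

def maglev_table(nodes: list[str]) -> list[str]:
    # Each node gets a permutation of table slots: (offset + j*skip) mod M.
    # M prime and 1 <= skip < M guarantee a full permutation.
    perms = []
    for n in nodes:
        offset = _h(n, "offset") % M
        skip = _h(n, "skip") % (M - 1) + 1
        perms.append([(offset + j * skip) % M for j in range(M)])
    table: list[str | None] = [None] * M
    nexts = [0] * len(nodes)
    filled = 0
    while filled < M:
        # Round-robin over nodes: each takes its next preferred empty slot,
        # which yields a near-perfectly balanced table.
        for i, n in enumerate(nodes):
            while True:
                slot = perms[i][nexts[i]]
                nexts[i] += 1
                if table[slot] is None:
                    table[slot] = n
                    filled += 1
                    break
            if filled == M:
                break
    return table

table = maglev_table(["node1", "node2", "node3"])
# Each node owns ~M/3 slots; removing a node mostly reassigns only its slots.
```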
Script simulate_ring.py shows a simulation of four hashing algorithms. The last one is the best, although it uses 3x more RAM, since the list of replicas for a partition is stored in the partition and not obtained by walking the ring of partitions (i.e. a partition stores 3 node IDs instead of just one, but it is totally worth it because the walking strategy was a major source of imbalance). Also note that the last method provides very good balancing with only 256 partitions, even in a test with 10 nodes, whereas methods 2 and 3 need many more partitions (typically 4096) to counter the imbalance due to the ring walking.
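To illustrate the trade-off, a rough sketch of the two lookup strategies; all node names and helper names are hypothetical:

```python
REPLICATION = 3

# Ring-walk style: each partition stores one node; replicas are found by
# walking forward until REPLICATION distinct nodes are collected. Cheap in
# RAM, but a major source of imbalance.
def replicas_by_walking(ring: list[str], p: int) -> list[str]:
    out: list[str] = []
    i = p
    while len(out) < REPLICATION:
        node = ring[i % len(ring)]
        if node not in out:
            out.append(node)
        i += 1
    return out

# Stored-list style: each partition stores its full replica list, ~3x the
# RAM of one node ID, but balance is computed globally when the layout is
# built rather than rediscovered by walking.
partition_replicas: list[list[str]] = [
    ["node1", "node2", "node3"],  # partition 0
    ["node2", "node3", "node1"],  # partition 1
    # ... one entry per partition
]

def replicas_stored(p: int) -> list[str]:
    return partition_replicas[p]
```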
Mostly done. Todo: rename n_tokens into capacity.

Done