Documentation on topology change #217
Reference: Deuxfleurs/garage#217
We do not document topology changes in Garage, only disaster recovery.
This is particularly worth documenting, as topology flexibility is one of Garage's strengths.
We should explain how someone can add several nodes to the cluster, or remove one or more nodes from it.
In particular, we have some interesting things to say about removing a whole region at once.
An example of someone wishing storage systems were more flexible: https://twitter.com/dave_universetf/status/1489483755630645248
The failure of a disk (or its hot removal and replacement with a blank disk) should also be covered (related: #218).
Hi, I'm very interested in learning about the flexible topology features. In the meantime, could you point me to the relevant code so I can understand how it works in Garage?
Hi @Chosto, the core of this feature is implemented in src/rpc/ring.rs and src/rpc/layout.rs. The two are separated mostly for historical reasons; nowadays the ring is just a copy of the layout data in a slightly different format that makes it easier to use.
Data in Garage is divided into 256 partitions (like slices of a cake: the entirety of the data in your cluster is the whole cake, which is also called the "ring" in technical terms). Each partition has three copies: for each partition, Garage builds a list of three nodes that each store one copy. We call this the assignation of the partition to nodes. The assignation of all partitions to three nodes each is what we call the layout; in the code, there is also a copy of it in the ring data structure.
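To make the vocabulary above concrete, here is a minimal sketch of what a layout could look like as a data structure. It is not the actual code from src/rpc/layout.rs: the `Layout` type, the integer `NodeId`, the round-robin constructor and the choice of the first hash byte to select a partition are all assumptions made for illustration.

```rust
/// Number of partitions the data is divided into.
const N_PARTITIONS: usize = 256;
/// Number of copies kept for each partition.
const REPLICATION: usize = 3;

/// Hypothetical node identifier (an assumption: plain integers for brevity).
type NodeId = u32;

/// Illustrative layout: for each partition, the three nodes holding a copy.
struct Layout {
    assignation: Vec<[NodeId; REPLICATION]>,
}

impl Layout {
    /// Naive round-robin assignation that ignores every placement constraint.
    /// It only guarantees that the three copies land on distinct nodes.
    fn round_robin(nodes: &[NodeId]) -> Layout {
        assert!(nodes.len() >= REPLICATION, "need at least three nodes");
        let assignation = (0..N_PARTITIONS)
            .map(|p| {
                let mut copies: [NodeId; REPLICATION] = [0; REPLICATION];
                for (i, slot) in copies.iter_mut().enumerate() {
                    *slot = nodes[(p + i) % nodes.len()];
                }
                copies
            })
            .collect();
        Layout { assignation }
    }

    /// Assumption: the partition is selected by the first byte of the hash,
    /// which yields exactly 256 possible partitions.
    fn partition_of(hash: &[u8; 32]) -> usize {
        hash[0] as usize
    }

    /// Nodes that store a copy of the item with the given hash.
    fn nodes_of(&self, hash: &[u8; 32]) -> &[NodeId; REPLICATION] {
        &self.assignation[Self::partition_of(hash)]
    }
}
```

With this sketch, `Layout::round_robin(&[10, 20, 30, 40])` builds 256 assignations, and `nodes_of` returns the three nodes to contact for a given hash.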
Garage decides on the assignation of each partition by trying to satisfy the following constraints:
The code in src/rpc/layout.rs is able to compute such an assignation. It is also able to compute a new assignation from an old one when nodes are added or removed (this is where flexible topologies come in). When updating an old assignation, the code tries to minimize the number of partition copies that are moved between nodes, so as to minimize the amount of data to be transferred in order to rebalance the dataset.
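The idea of updating an assignation while moving as few copies as possible can be sketched with a naive greedy strategy. This is not the algorithm used in src/rpc/layout.rs: the function `update_assignation` and its types are hypothetical, it only handles node removal, and it ignores placement constraints. Adding a node would additionally require moving some copies onto it to balance the load.

```rust
use std::collections::HashMap;

/// Hypothetical node identifier (an assumption: plain integers for brevity).
type NodeId = u32;

/// Greedy sketch: every copy on a node that is still in the cluster stays put;
/// only copies on removed nodes are re-homed, each on the least-loaded node
/// that does not already hold a copy of that partition.
/// Returns the new assignation and the number of copies that moved.
fn update_assignation(old: &[[NodeId; 3]], nodes: &[NodeId]) -> (Vec<[NodeId; 3]>, usize) {
    assert!(nodes.len() >= 3, "need at least three distinct nodes");

    // Number of partition copies currently held by each remaining node.
    let mut load: HashMap<NodeId, usize> = nodes.iter().map(|&n| (n, 0)).collect();
    for part in old {
        for n in part {
            if let Some(l) = load.get_mut(n) {
                *l += 1;
            }
        }
    }

    let mut new = old.to_vec();
    let mut moved = 0;
    for part in new.iter_mut() {
        for i in 0..3 {
            if load.contains_key(&part[i]) {
                continue; // this copy stays where it is
            }
            // Least-loaded node not yet holding this partition; ties are
            // broken by node ID so that the result is deterministic.
            let target = nodes
                .iter()
                .copied()
                .filter(|n| !part.contains(n))
                .min_by_key(|n| (load[n], *n))
                .expect("three distinct nodes guarantee a free one");
            part[i] = target;
            *load.get_mut(&target).unwrap() += 1;
            moved += 1;
        }
    }
    (new, moved)
}
```

In this sketch, removing one node from a four-node cluster moves only the copies that the removed node was holding; every other copy stays in place.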