forked from Deuxfleurs/garage
layout doc: write explanations for bizarre scenarios
This commit is contained in:
parent
405aa42b7d
commit
8d07888fa2
1 changed files with 74 additions and 0 deletions
@@ -93,9 +93,23 @@ follow the following recommendations:

## Understanding unexpected layout calculations

When adding, removing or modifying nodes in a cluster layout, unexpected
assignments of partitions to nodes can sometimes occur. These assignments
are in fact normal and logical, given the objectives of the algorithm. Indeed,
**the layout algorithm prioritizes moving as little data as possible between
nodes over achieving an equal distribution of load**. This section presents two
examples and illustrates how one can control Garage's behavior to obtain the
desired results.

### Example 1

In this example, a cluster is originally composed of 3 nodes in 3 different
zones (data centers). The three nodes have equal capacity, so they
are all fully used and each stores a copy of all of the data in the cluster.

Then, a fourth node of the same size is added in the data center `dc1`.
As illustrated by the following, **Garage will by default not store any data on the new node**:

```
$ garage layout show
==== CURRENT CLUSTER LAYOUT ====
@@ -146,8 +160,37 @@ dc3 Tags Partitions Capacity Usable capacity
TOTAL 256 (256 unique) 1000.0 MB 1000.0 MB (100.0%)
```

While unexpected, this is logical for the following reasons:

- storing some data on the new node does not help increase the total quantity
  of data that can be stored on the cluster, as the two other zones (`dc2` and
  `dc3`) still need to store a full copy of everything, and their capacity is
  still the same;

- there is therefore no need to move any data to the new node, as this would be pointless;

- moving data to the new node has a cost, which the algorithm chooses not to pay unless necessary.

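The first point can be checked with a little arithmetic. The sketch below is not Garage's layout algorithm; it only assumes, as in this example, that every zone must hold one full copy of the data, so the amount of data the cluster can store is bounded by the smallest zone. The function name and the dictionary layout are illustrative, not part of Garage.

```python
def usable_capacity_mb(zones):
    """Upper bound on stored data when every zone holds one full copy.

    `zones` maps a zone name to the list of node capacities (in MB) in that zone.
    """
    # Each zone must fit a full copy, so the smallest zone is the bottleneck.
    return min(sum(capacities) for capacities in zones.values())


before = {"dc1": [1000], "dc2": [1000], "dc3": [1000]}
after = {"dc1": [1000, 1000], "dc2": [1000], "dc3": [1000]}

print(usable_capacity_mb(before))  # 1000
print(usable_capacity_mb(after))   # 1000: the new node in dc1 adds nothing
```

Under this assumption, the cluster can only store more data once capacity is added to `dc2` and `dc3` as well.
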
However, this distribution of data may not be what the administrator wanted: if
they added a new node to `dc1`, it might be because the existing node is too
slow, and they wish to divide its load by half. In that case, to force Garage
to distribute the data between the two nodes, they need to assign
only half of the capacity to each node in `dc1` (in our example, 500M instead of 1G).
Garage would then determine that, to be able to store 1G in total, it
needs to store 500M on the old node and 500M on the added one.

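To see why halving the capacities forces both nodes to be used, compare how much room is left over in `dc1` in each configuration. This is a back-of-the-envelope sketch, assuming `dc1` must hold one full copy of 1000 MB; `spare_room_mb` is a hypothetical helper, not part of Garage.

```python
def spare_room_mb(node_capacities_mb, zone_data_mb):
    """Capacity left in a zone once it holds `zone_data_mb` of data."""
    return sum(node_capacities_mb) - zone_data_mb


ZONE_DATA_MB = 1000  # dc1 must hold one full copy of the data

# Two nodes of 1G: 1000 MB of spare room, so one node alone can hold everything
# and nothing has to be moved to the new node.
print(spare_room_mb([1000, 1000], ZONE_DATA_MB))  # 1000

# Two nodes of 500M: no spare room, so both nodes must be filled
# and the old node has to hand over half of its data.
print(spare_room_mb([500, 500], ZONE_DATA_MB))  # 0
```
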
### Example 2

The following example is a slightly different scenario, where `dc1` initially has two
nodes that are each used at 50%, and `dc2` and `dc3` each have one node that is
100% used. All node capacities are the same.

Then, a node from `dc1` is moved into `dc3`. One could expect that the roles of
`dc1` and `dc3` would simply be swapped: the remaining node in `dc1` would be
used at 100%, and the two nodes now in `dc3` would be used at 50%. Instead,
this happens:

```
==== CURRENT CLUSTER LAYOUT ====
ID Tags Zone Capacity Usable capacity
@@ -197,3 +240,34 @@ dc3 Tags Partitions Capacity Usable capacity
a11c7cf18af29737 node4 63 (0 new) 1000.0 MB 246.1 MB (24.6%)
TOTAL 256 (256 unique) 2.0 GB 1000.0 MB (50.0%)
```

As we can see, the node that was moved to `dc3` (node4) is only used at approximately 25%,
whereas the node that was already in `dc3` (node3) is used at 75%.

This can be explained by the following:

- node1 is now the only node remaining in `dc1`, so it has to store all
  of the data in the cluster. Since it was storing only half of it before, it has
  to retrieve the other half from other nodes in the cluster.

- The data that it does not have is entirely stored by the other node that was
  in `dc1` and that is now in `dc3` (node4). There is also a copy of it on node2
  and node3, since both of these nodes have a copy of everything.

- node3 and node4 are the two nodes that are now in an under-utilized
  data center (`dc3`), which means that they are the two candidates from which
  data can be removed to be moved to node1.

- Garage moves data in equal proportions from all possible sources. In this
  case, it transfers 25% of the entire data set from node3 to
  node1 and another 25% from node4 to node1.

This explains why node3 ends with 75% utilization (100% from before minus 25%
that is moved to node1), and node4 ends with 25% (50% from before minus 25%
that is moved to node1).

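The same arithmetic can be written out step by step. This is only a sketch of the reasoning in this example, not of the actual algorithm: it assumes node1 pulls the data it is missing in equal shares from node3 and node4, and it ignores the randomness in the real computation. The last lines additionally assume that the usable capacity reported by `garage layout show` is proportional to the partition count, which is consistent with the output above.

```python
# Utilization before the move, as a fraction of the full data set.
usage = {"node1": 0.50, "node3": 1.00, "node4": 0.50}

# node1 becomes the only node in dc1 and must hold a full copy.
missing = 1.00 - usage["node1"]

# node3 and node4 (both now in dc3) are the candidate sources;
# each one gives up an equal share of what node1 is missing.
sources = ["node3", "node4"]
share = missing / len(sources)

usage["node1"] += missing
for node in sources:
    usage[node] -= share

print(usage)  # {'node1': 1.0, 'node3': 0.75, 'node4': 0.25}

# node4 holds 63 of 256 partitions in the output above.
node4_mb = 63 / 256 * 1000.0
print(round(node4_mb, 1))  # 246.1
```

The idealized 25% and the reported 24.6% differ only because partitions are assigned in whole units and with some randomness.
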
This illustrates another principle of the layout computation: **if there is a
choice in moving data out of some nodes, then all links between pairs of nodes
are used in equal proportions**. This is only approximately true: the
algorithm relies on randomness to achieve this, so there might be some small
fluctuations, as we see above.