Zone-aware data migration #483

Open
opened 2023-01-21 17:15:18 +00:00 by jpds · 0 comments
Contributor

I recently set up a `garage` cluster with a single node in each of two zones and, of course, a replication count of 2. These two nodes have a small data pipe between them.

I then added a single node to one of the zones (shown below as `ccc`), and garage presented this new layout:

```
==== NEW CLUSTER LAYOUT AFTER APPLYING CHANGES ====
ID                  Zone  Capacity
cccccccccccccccc    zone1 16
bbbbbbbbbbbbbbbb    zone2 14
aaaaaaaaaaaaaaaa    zone1 14

Target number of partitions per node:

cccccccccccccccc    186
bbbbbbbbbbbbbbbb    162
aaaaaaaaaaaaaaaa    162

New number of partitions per node:

cccccccccccccccc    137 (73% of 186)
bbbbbbbbbbbbbbbb    256 (158% of 162)
aaaaaaaaaaaaaaaa    119 (73% of 162)

Number of partitions that move:
    137 [aaaaaaaaaaaaaaaa ...] -> [cccccccccccccccc ...]
```

As an admin, I thought: "OK, it knows that the data is on `aaa` and will move it to `ccc` in the same zone; they have a full gigabit link, so that should take no time."

However, looking at the Prometheus metrics, garage instead decided to take half of the data from `aaa` and the other half from `bbb`, which took longer due to the smaller pipe between the zones. (With a replication count of 2, every partition moving to `ccc` already has a replica on both `aaa` and `bbb`, so either node is a valid source; garage just does not seem to prefer the same-zone one.)

I think garage should prioritize moving the data from within the same zone when a replica is available there.
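One possible shape for this, as a minimal sketch rather than a description of garage's actual resync code (the `NodeInfo` type and `prioritized_sources` function are hypothetical names introduced for illustration): when a partition has to be fetched, order the candidate source replicas so that nodes in the destination's zone are tried first, and cross-zone replicas are used only as a fallback.

```rust
/// Hypothetical description of a node in the cluster layout.
struct NodeInfo {
    id: String,
    zone: String,
}

/// Order candidate source replicas for a partition so that nodes in the
/// same zone as the destination come first; cross-zone replicas are kept
/// only as fallbacks. This is a sketch of the requested behaviour, not
/// garage's actual resync code.
fn prioritized_sources(destination: &NodeInfo, mut candidates: Vec<NodeInfo>) -> Vec<NodeInfo> {
    // `sort_by_key` is stable, so the existing ordering is preserved
    // within the "same zone" and "other zone" groups.
    candidates.sort_by_key(|src| src.zone != destination.zone);
    candidates
}

fn main() {
    let dest = NodeInfo { id: "ccc".into(), zone: "zone1".into() };
    let replicas = vec![
        NodeInfo { id: "bbb".into(), zone: "zone2".into() },
        NodeInfo { id: "aaa".into(), zone: "zone1".into() },
    ];
    for src in prioritized_sources(&dest, replicas) {
        println!("try {} ({})", src.id, src.zone);
    }
    // Prints:
    //   try aaa (zone1)
    //   try bbb (zone2)
}
```

Because the sort is stable, ties within each group keep whatever ordering the layout already produced, so a change like this would only affect which replica is preferred, not which replicas are considered valid.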

lx added the kind/wrong-behavior label 2023-01-24 10:21:07 +00:00
Reference: Deuxfleurs/garage#483