Cannot remove a node when it is involved in an unfinished intermediate layout change #809

Open
opened 2024-04-17 18:02:45 +00:00 by yuka · 1 comment
Contributor
$ garage layout history
==== LAYOUT HISTORY ====
Version  Status    Storage nodes  Gateway nodes
#24      current   3              0
#23      draining  4              0
#22      draining  3              0

==== UPDATE TRACKERS ====
Several layout versions are currently live in the version, and data is being migrated.
This is the internal data that Garage stores to know which nodes have what data.

Node              Ack  Sync  Sync_ack
...  #24  #22   #22
...  #24  #24   #22
...  #24  #22   #22
...  #24  #22   #22

If some nodes are not catching up to the latest layout version in the update trackers,
it might be because they are offline or unable to complete a sync successfully.
You may force progress using `garage layout skip-dead-nodes --version 24`

In version 23 I added a node, which I then removed again in version 24 without waiting for it to sync.
Now the cluster is stuck in a state where the block sync queues are empty but the node cannot be removed, not even with `garage layout skip-dead-nodes --version 24 --allow-missing-data`:

$ garage layout skip-dead-nodes --version 23 
Error: Internal error: Nothing was done, try passing the `--allow-missing-data` flag to force progress even when not enough nodes can complete a metadata sync.

$ garage layout skip-dead-nodes --version 23 --allow-missing-data
Error: Internal error: Sorry, there is nothing I can do for you. Please wait patiently. If you ask for help, please send the output of the `garage layout history` command.

$ garage layout skip-dead-nodes --version 24
Error: Internal error: Nothing was done, try passing the `--allow-missing-data` flag to force progress even when not enough nodes can complete a metadata sync.

$ garage layout skip-dead-nodes --version 24 --allow-missing-data
Error: Internal error: Sorry, there is nothing I can do for you. Please wait patiently. If you ask for help, please send the output of the `garage layout history` command.

$ 
Author
Contributor

It turns out the nodes (not just the newly added one) were still busy syncing metadata. Bringing the new node back online and waiting long enough fixed it. I am not sure whether waiting alone, without bringing the new node back online, would also have been enough.
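
For anyone who ends up in the same situation, here is a rough sketch of how the update trackers table from `garage layout history` could be checked while waiting. It only parses the text format pasted in this issue; the helper names (`parse_trackers`, `behind`) are made up for illustration and are not part of Garage, and the node IDs are the elided `...` placeholders from the report, so rows are identified by position. Reading "tracker below the current version" as "not caught up yet" follows the hint printed by the CLI itself and is otherwise an assumption.

```python
import re
from collections import namedtuple

# Tracker rows copied from the `garage layout history` output in this issue.
# Node IDs were elided as "..." in the report, so rows are identified by index.
SAMPLE = """\
==== UPDATE TRACKERS ====
Node              Ack  Sync  Sync_ack
...  #24  #22   #22
...  #24  #24   #22
...  #24  #22   #22
...  #24  #22   #22

If some nodes are not catching up to the latest layout version in the update trackers,
"""

Tracker = namedtuple("Tracker", "node ack sync sync_ack")

# One row: a node identifier followed by three "#<version>" columns.
ROW = re.compile(r"^(\S+)\s+#(\d+)\s+#(\d+)\s+#(\d+)\s*$")


def parse_trackers(text):
    """Return the rows of the update trackers table as Tracker tuples."""
    rows = []
    in_table = False
    for line in text.splitlines():
        if line.startswith("Node") and "Sync_ack" in line:
            in_table = True  # header found, rows follow
            continue
        if in_table:
            match = ROW.match(line)
            if match is None:
                break  # first non-row line ends the table
            node, ack, sync, sync_ack = match.groups()
            rows.append(Tracker(node, int(ack), int(sync), int(sync_ack)))
    return rows


def behind(rows, version, field):
    """Indices of rows whose `field` tracker is below `version`."""
    return [i for i, row in enumerate(rows) if getattr(row, field) < version]


rows = parse_trackers(SAMPLE)
print("rows:", len(rows))
print("sync behind #24:", behind(rows, 24, "sync"))
print("sync_ack behind #24:", behind(rows, 24, "sync_ack"))
```

In practice the sample text would be replaced by the real command output, and the check repeated until no row is reported as behind.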

lx added the kind wrong-behavior label 2024-05-24 17:10:50 +00:00
lx added this to the v1.1 milestone 2024-05-24 17:10:53 +00:00
Reference: Deuxfleurs/garage#809