New node failing to sync after layout change #841

Closed
opened 2024-07-16 13:17:16 +00:00 by anuragbhatia · 7 comments

One node failed. I added a fresh node to replace it, and it joined the cluster successfully from one of the existing nodes. But when I start the data sync, I see the following errors in the logs on the new node:

2024-07-16T13:10:59.443373Z  WARN garage_rpc::layout::version: Ring not yet ready, read/writes will be lost!
2024-07-16T13:10:59.931371Z ERROR garage_rpc::layout::history: Cannot receive new layout version 3, version 2 is missing
2024-07-16T13:10:59.931400Z ERROR garage_rpc::layout::history: Cannot receive new layout version 4, version 3 is missing
2024-07-16T13:10:59.931405Z ERROR garage_rpc::layout::history: Cannot receive new layout version 5, version 4 is missing
2024-07-16T13:10:59.931408Z ERROR garage_rpc::layout::history: Cannot receive new layout version 6, version 5 is missing
2024-07-16T13:10:59.931411Z ERROR garage_rpc::layout::history: Cannot receive new layout version 7, version 6 is missing
2024-07-16T13:10:59.931414Z ERROR garage_rpc::layout::history: Cannot receive new layout version 8, version 7 is missing
2024-07-16T13:10:59.931416Z ERROR garage_rpc::layout::history: Cannot receive new layout version 9, version 8 is missing
2024-07-16T13:11:00.254106Z  WARN garage_rpc::layout::version: Ring not yet ready, read/writes will be lost!

This node also shows no layout at all, but it does show the other nodes in the output of the status command.

All nodes are running dxflrs/garage:v1.0.0 docker image.

Does anyone have ideas about what can be done in this case?
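For anyone triaging similar logs, here is a minimal Python sketch that pulls the missing layout versions out of the error lines quoted above. It relies only on the message format visible in this report; the function name `missing_layout_versions` is hypothetical and not part of Garage. The lowest number it returns is the first version the node never received, which is probably the one to look at first.

```python
import re

# Matches the error text exactly as it appears in the logs quoted above.
MISSING_RE = re.compile(
    r"Cannot receive new layout version (\d+), version (\d+) is missing"
)


def missing_layout_versions(log_text: str) -> list[int]:
    """Return the sorted, de-duplicated layout versions reported as missing.

    Hypothetical helper for log triage; it only parses text and does not
    talk to a Garage node.
    """
    missing = {int(m.group(2)) for m in MISSING_RE.finditer(log_text)}
    return sorted(missing)


sample = """\
2024-07-16T13:10:59.931371Z ERROR garage_rpc::layout::history: Cannot receive new layout version 3, version 2 is missing
2024-07-16T13:10:59.931400Z ERROR garage_rpc::layout::history: Cannot receive new layout version 4, version 3 is missing
"""

versions = missing_layout_versions(sample)
print(versions)  # [2, 3]
```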

Author

Just realised that the following error very often appears before the layout version errors:

2024-07-16T13:54:09.921838Z ERROR garage_net::error: Error: ServerConn::run: Handshake error: performing handshake: failed opening client secret box
2024-07-16T13:54:09.922976Z ERROR garage_net::error: Error: ServerConn::run: Handshake error: performing handshake: failed opening client secret box
2024-07-16T13:54:09.923081Z ERROR garage_net::error: Error: ServerConn::run: Handshake error: performing handshake: failed opening client secret box
maximilien added the kind wrong-behavior label 2024-07-25 19:22:39 +00:00
Owner

Could be related to #809

Owner

This is a bug in the 1.0 release. When joining a new node to the cluster, the new node might take a while to get the layout history from other nodes. A quick solution is to copy the cluster_layout file in the data folder from an existing node into the new node, so it gets the layout history.
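A rough Python sketch of the workaround described above, for the case where both nodes' folders are reachable from one machine (for example via mounted Docker volumes). Only the file name `cluster_layout` comes from the comment. The function name and directory arguments are hypothetical. Stopping the new node before copying and keeping a backup of any existing file are precautions assumed here, not steps confirmed in this thread.

```python
import shutil
from pathlib import Path

# File name as given in the comment above.
LAYOUT_FILE = "cluster_layout"


def copy_cluster_layout(src_dir: str, dst_dir: str) -> Path:
    """Copy cluster_layout from an existing node's folder to the new node's.

    Both directory arguments are hypothetical: pass the folder that holds
    cluster_layout on each node. Assumption: the new node is stopped first.
    """
    src = Path(src_dir) / LAYOUT_FILE
    dst = Path(dst_dir) / LAYOUT_FILE
    if not src.is_file():
        raise FileNotFoundError(f"{src} not found on the existing node")
    if dst.exists():
        # Keep the new node's previous file so the change can be undone.
        shutil.copy2(dst, dst.with_name(LAYOUT_FILE + ".bak"))
    shutil.copy2(src, dst)
    return dst


# Example call with placeholder paths (adjust to your own setup):
# copy_cluster_layout("/mnt/existing-node", "/mnt/new-node")
```

After the copy, start the new node again and check whether the "version N is missing" errors stop.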

quentin added kind usability and removed kind wrong-behavior labels 2024-08-07 09:37:35 +00:00
quentin added the action triage-required label 2024-08-07 09:45:45 +00:00
Owner

I don't know whether this issue is a full duplicate of #809, or whether some of this information should be transferred to #809 (for example, because it makes #809 more critical). For now, I am adding the triage-required flag as a reminder to get back to it later.

quentin added the scope layout label 2024-08-07 10:54:33 +00:00
quentin changed title from New node failing to sync to New node failing to sync after layout change 2024-08-07 10:55:21 +00:00

Getting the same error when adding nodes. It is more likely to happen when I commit the layout after each node I add:

garage 2024-08-10T23:45:17.731430Z ERROR garage_rpc::layout::history: Cannot receive new layout version 3, version 2 is missing
garage 2024-08-10T23:45:17.731458Z ERROR garage_rpc::layout::history: Cannot receive new layout version 4, version 3 is missing
garage 2024-08-10T23:45:17.731467Z ERROR garage_rpc::layout::history: Cannot receive new layout version 5, version 4 is missing
garage 2024-08-10T23:45:17.731475Z ERROR garage_rpc::layout::history: Cannot receive new layout version 6, version 5 is missing
garage 2024-08-10T23:45:17.731483Z ERROR garage_rpc::layout::history: Cannot receive new layout version 7, version 6 is missing
garage 2024-08-10T23:45:17.914986Z ERROR garage_rpc::layout::history: Cannot receive new layout version 3, version 2 is missing
garage 2024-08-10T23:45:17.915006Z ERROR garage_rpc::layout::history: Cannot receive new layout version 4, version 3 is missing
garage 2024-08-10T23:45:17.915015Z ERROR garage_rpc::layout::history: Cannot receive new layout version 5, version 4 is missing
garage 2024-08-10T23:45:17.915023Z ERROR garage_rpc::layout::history: Cannot receive new layout version 6, version 5 is missing
garage 2024-08-10T23:45:17.915031Z ERROR garage_rpc::layout::history: Cannot receive new layout version 7, version 6 is missing
garage 2024-08-10T23:45:19.872196Z ERROR garage_rpc::layout::history: Cannot receive new layout version 3, version 2 is missing
garage 2024-08-10T23:45:19.872221Z ERROR garage_rpc::layout::history: Cannot receive new layout version 4, version 3 is missing
garage 2024-08-10T23:45:19.872230Z ERROR garage_rpc::layout::history: Cannot receive new layout version 5, version 4 is missing
garage 2024-08-10T23:45:19.872238Z ERROR garage_rpc::layout::history: Cannot receive new layout version 6, version 5 is missing
garage 2024-08-10T23:45:19.872245Z ERROR garage_rpc::layout::history: Cannot receive new layout version 7, version 6 is missing

I have created a potential fix for this issue here: #854

lx closed this issue 2024-08-24 11:12:40 +00:00

Testing latest code, not seeing the error anymore.

5 participants
Reference: Deuxfleurs/garage#841