Changing IP address of a node leads to a half-connected and broken cluster #652
Reference: Deuxfleurs/garage#652
I had to change the IP address of a node, so I changed both rpc_bind_addr and rpc_public_addr for this node. There's no NAT.
Old config of node A:
New config of node A:
After restarting node A, here is the status on node A, which says it's correctly connected again to node B:
But on node B, it says that node A is still disconnected:
Note how node B still has the previous IP address of node A.
When I look at the logs of node B, it even accepts the connection from node A:
But this is never reflected in the status of node B.
This issue is not transient: I waited about 20 minutes and nothing changed. It also prevents node B from reaching a quorum when it receives queries.
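To make the symptom concrete, here is a toy model in Python of the behavior described above. It is not Garage's actual code. The `Node` and `Peer` classes, the `learn_addr` flag, the quorum value, and the addresses (taken from the documentation range 192.0.2.0/24) are all hypothetical and only illustrate how a node could accept an inbound connection while its peer table keeps the stale address.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    addr: str                # address this node dials to reach the peer
    connected: bool = False  # what a status listing would report


@dataclass
class Node:
    # Peer table keyed by a (hypothetical) node identifier.
    peers: dict = field(default_factory=dict)

    def accept_incoming(self, peer_id: str, addr: str, learn_addr: bool) -> None:
        """Handle an inbound connection from an already-known peer.

        With learn_addr=False the connection is accepted but the peer
        table is left untouched, which mirrors the reported symptom.
        """
        peer = self.peers[peer_id]
        if learn_addr:
            peer.addr = addr
            peer.connected = True

    def can_reach_quorum(self, quorum: int) -> bool:
        # Count this node itself plus every peer it believes is connected.
        return 1 + sum(p.connected for p in self.peers.values()) >= quorum


# Node B still knows node A under its old (placeholder) address.
node_b = Node(peers={"A": Peer(addr="192.0.2.10:3901")})

# Node A reconnects from its new address, but B does not learn it.
node_b.accept_incoming("A", "192.0.2.20:3901", learn_addr=False)
stale_addr = node_b.peers["A"].addr
stale_quorum = node_b.can_reach_quorum(quorum=2)

# If B updated its table on the inbound connection, the status would recover.
node_b.accept_incoming("A", "192.0.2.20:3901", learn_addr=True)
fixed_addr = node_b.peers["A"].addr
fixed_quorum = node_b.can_reach_quorum(quorum=2)
```

In this sketch, the stale case leaves node B listing the old address and unable to count node A toward a quorum, even though the connection from node A was accepted.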
This is using Garage 0.8.4 on Debian.
I think I have hit this issue before, and it is generally fixed by restarting the garage daemon on the other nodes.
PR #724 probably fixes the issue. It will be published with v0.9.2 / v1.0. If the issue is still there, please reopen it.
lx referenced this issue 2024-03-01 14:14:56 +00:00