Garage fails to count to 3? #597
For some reason, Garage cannot reach a quorum despite 2 nodes being available in the region and replicas being available outside the region.
Any suggestions on how to fix this? The failed node is basically fubar (see #595) and I'm trying to reach a good state again, so it will be removed completely.
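A quick way to check what the cluster itself thinks is going on (assuming the `garage` CLI is available on one of the healthy hosts; this is just a diagnostic sketch, not from the original report):

```sh
# List the nodes Garage currently sees, with their status, zone and capacity
garage status

# Show the active cluster layout and any staged changes
garage layout show
```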
Presumably, this is an issue with the fact that `capital` and `cantor` are on IPv4, and thus have no way to communicate with any of the other nodes, which have IPv6 addresses registered. You would have to re-add them to the deployment with their IPv6 addresses.
Are you using replication mode 2 or 3? Your logs look like you are using replication mode 2, which would explain why a single unavailable node breaks your cluster. If that's the case, try setting it to 2-dangerous to restore write capability to your cluster.
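For reference, the relevant setting lives in `garage.toml` (in Garage versions that use `replication_mode`; a sketch only, adjust to your deployment):

```toml
# Allow operations to proceed with a single available copy while the cluster is degraded.
# Switch back to "2" once the failed node has been removed or replaced.
replication_mode = "2-dangerous"
```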
Also, if your plan is to eventually remove the failed node from the cluster, you can remove it from the layout now to rebuild copies of all your data; there is no particular reason to wait.
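Roughly, removing it from the layout looks like this (`<node_id>` is the failed node's ID as printed by `garage status`; sketch only):

```sh
# Stage the removal of the failed node, review the staged layout, then apply it
garage layout remove <node_id>
garage layout show
garage layout apply --version <N>
```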