When one node is running behind NAT, the cluster gets confused with wrong rpc_public_addr, not respecting setting file #558
Reference: Deuxfleurs/garage#558
Yes, running behind NAT should be avoided, and something like netmaker may be the best way to go, but I wanted to give it a quick try.
One node is behind NAT (its internet-facing IP address is, for example, 202.10.2.1) and uses frp as a reverse proxy; the frp server's IP is, for example, 56.18.2.1.
Every other node uses RPC port 3901 on its own public IP address.
The node behind NAT has rpc_public_addr set to 56.18.2.1:4901.
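For reference, the relevant part of garage.toml on the node behind NAT would look roughly like the fragment below. This is a sketch only: the rpc_public_addr value is the one given in this report, while the rpc_bind_addr value is an assumption (the report does not state it) based on the other nodes using port 3901.

```toml
# garage.toml on the node behind NAT (sketch, not the reporter's actual file)

# Assumed local listen address; not stated in the report.
rpc_bind_addr = "[::]:3901"

# Address other nodes should use to reach this node:
# the frp server's IP and the forwarded port, as given in the report.
rpc_public_addr = "56.18.2.1:4901"
```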
At the beginning everything seemed fine, and data was being balanced onto this node.
After half a day, one node dropped out of all cluster communication and its role became unassigned; rebooting that VPS fixed it.
One node reports the rpc_public_addr of the node behind NAT as 202.10.2.1:3901 instead of the 56.18.2.1:4901 I set in garage.toml. It does not respect the garage.toml setting.
Another node reports the rpc_public_addr of the node behind NAT as 127.0.0.1:3901, which does not respect the garage.toml setting either.
It looks like, when communication with the other nodes becomes unstable or difficult, the node behind NAT advertises itself with either its internet-facing address or 127.0.0.1, on port 3901, and ignores what I set in garage.toml. I would expect it to stick with the configuration file and not advertise itself on an IP I did not set.
To get the node working again, the workaround is to run garage node connect in both directions so that the other node learns the new address of this node.
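The workaround above can be sketched as the two commands below. The node IDs and the peer's public IP are placeholders in angle brackets (hypothetical, not from the report) and must be substituted before running; the full identifier of each node can be obtained with `garage node id` on that node.

```shell
# On a peer node: reconnect to the node behind NAT via the frp address.
garage node connect <nat_node_id>@56.18.2.1:4901

# On the node behind NAT: reconnect to that peer at its public address.
garage node connect <peer_node_id>@<peer_public_ip>:3901
```

Afterwards, `garage status` on either side should show which address each node is currently known under.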
However, the node picking up the internet gateway IP instead of the address it was configured with might be a bug.
lx referenced this issue 2024-03-01 14:14:56 +00:00