When one node is running behind NAT, the cluster gets confused with wrong rpc_public_addr, not respecting setting file #558

Closed
opened 2023-05-04 04:28:47 +00:00 by tradingpost3 · 1 comment

Yes, running behind NAT should be avoided, and maybe netmaker is the best way to go, but I just gave it a quick try.

One node (whose own internet IP address is, for example, 202.10.2.1) is behind NAT and uses frp as a reverse proxy; the frp server's IP is, for example, 56.18.2.1.
Every other node uses RPC port 3901 on its own public IP address.
The node behind NAT has rpc_public_addr set to 56.18.2.1:4901.
At the beginning everything seemed to be fine, and data was being balanced onto this node.
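
For reference, the relevant part of garage.toml on the node behind NAT would look roughly like the fragment below. This is only a sketch of the setup described above: the rpc_public_addr value is the one from this report, while the rpc_bind_addr value is an assumption (the local listen address is not stated here, only that the other nodes use port 3901), and the frp side is assumed to forward 56.18.2.1:4901 to the node's local RPC port.

```toml
# garage.toml on the node behind NAT (fragment, not a complete config)

# Assumption: the node listens locally on the default RPC port 3901.
rpc_bind_addr = "[::]:3901"

# From the report: the address other nodes should use to reach this node,
# i.e. the frp server's IP and the forwarded port.
rpc_public_addr = "56.18.2.1:4901"
```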

After half a day, one node dropped out of cluster communication entirely and its role became unassigned; rebooting that VPS fixed it.

One node reports that the rpc_public_addr of the node behind NAT is 202.10.2.1:3901 instead of the 56.18.2.1:4901 that I set in garage.toml. It does not respect the garage.toml setting.

Another node reports that the rpc_public_addr of the node behind NAT is 127.0.0.1:3901, which does not respect the garage.toml setting either.

It looks like, when communication with the other nodes becomes unstable, the node behind NAT advertises itself with either its internet public address or 127.0.0.1 on port 3901 and ignores what I have set in garage.toml. I would expect it to stick to the configuration file and not advertise itself on an IP I did not set.

lx added the Bug label 2023-05-04 06:35:32 +00:00
Author

To get the node working again, the solution is to run garage node connect in both directions so that the other node can learn the new address of this node.

However, the node picking up the internet gateway IP instead of its own specified bind IP might be a bug.

lx added this to the v1.0 milestone 2024-02-16 10:25:11 +00:00
lx closed this issue 2024-02-19 11:44:06 +00:00
Reference: Deuxfleurs/garage#558