garage status keeps displaying old IP for itself #761

Closed
opened 2024-03-07 09:49:24 +00:00 by vk · 3 comments

Hello,

I have changed the network addressing of all the nodes in my setup, from `10.3.x.x/24` to `10.66.x.x/24`.
All `*_bind_addr`, `*_public_addr` and `bootstrap_peers` were updated and the nodes restarted. There is not a single reference to `10.3.x.x` remaining in any node's `garage.toml`.
The network interface that was assigned the `10.3.x.x` address was removed from the system.
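
For reference, the networking-related part of each node's `garage.toml` now looks roughly like this (the addresses and node IDs below are placeholders rather than my exact values, and unrelated fields are omitted):

```toml
# Sketch of the fields that were updated during the renumbering
# (placeholder values, not my real node IDs or exact addresses).
rpc_bind_addr = "10.66.2.3:3901"
rpc_public_addr = "10.66.2.3:3901"
bootstrap_peers = [
	"<full_node_id>@10.66.4.3:3901",
	"<full_node_id>@10.66.1.3:3901",
	"<full_node_id>@10.66.3.3:3901",
]
# The other *_bind_addr fields (under [s3_api], [s3_web], [admin]) were
# updated to the new 10.66.x.x addresses in the same way.
```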

Everything works correctly (all nodes are interconnected and work flawlessly); however, `garage status` on each node keeps displaying the old `10.3.x.x` address for itself, while correctly displaying the new `10.66.x.x` addresses for the other nodes.

On the other hand, `garage node id` correctly displays its new `10.66.x.x` address.

```
[e9aa12f5xxxxxxxx ~]$ garage --help
garage cargo:0.9.0 [features: k2v, sled, lmdb, sqlite, metrics, bundled-libs]
...

[e9aa12f5xxxxxxxx ~]$ garage status
==== HEALTHY NODES ====
ID                Hostname                           Address         Tags          Zone     Capacity   DataAvail
20cc099cxxxxxxxx  xxxxxxxxxxxxxxxxxxxxxxxxxxxx       10.66.4.3:3901  [xxxxx]       XXX      4.0 TB     12.0 TB (99.7%)
e9aa12f5xxxxxxxx  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx  10.3.1.1:3901   [xxxxxxxxxx]  XXX      1000.0 GB  1.3 TB (99.2%)
77f0e939xxxxxxxx  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx   10.66.1.3:3901  [xxxxxxxxx]   XXXXXXX  4.0 TB     10.8 TB (99.7%)
73720c1dxxxxxxxx  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx  10.66.3.3:3901  [xxxxxxxxxx]  XXX      2.0 TB     3.6 TB (99.5%)

[e9aa12f5xxxxxxxx ~]$ garage node id
e9aa12f5xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx@10.66.2.3:3901
...

[e9aa12f5xxxxxxxx ~]$ netstat
Active Internet connections
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)
tcp4       0      0 10.66.2.3.3901         10.66.4.3.15237        ESTABLISHED
tcp4       0      0 10.66.2.3.3901         10.66.4.3.58639        ESTABLISHED
tcp4       0      0 10.66.2.3.3901         10.66.1.3.37319        ESTABLISHED
tcp4       0      0 10.66.2.3.3901         10.66.1.3.54914        ESTABLISHED
tcp4       0      0 10.66.2.3.3901         10.66.3.3.14538        ESTABLISHED
tcp4       0      0 10.66.2.3.3901         10.66.3.3.32042        ESTABLISHED
tcp4       0      0 10.66.2.3.43773        10.66.4.3.3901         ESTABLISHED
tcp4       0      0 10.66.2.3.25446        10.66.3.3.3901         ESTABLISHED
tcp4       0      0 10.66.2.3.13243        10.66.3.3.3901         ESTABLISHED
tcp4       0      0 10.66.2.3.20198        10.66.4.3.3901         ESTABLISHED
tcp4       0      0 10.66.2.3.42552        10.66.1.3.3901         ESTABLISHED
tcp4       0      0 10.66.2.3.27362        10.66.1.3.3901         ESTABLISHED
...
```

The same occurs with `garage status` on all other nodes: remote node addresses are displayed correctly, while the node itself lists its own old `10.3.x.x` address.

While this seems to be a very minor issue, I'd like to fix it.
I couldn't find anything related to this issue in the documentation, nor here in the Gitea issues.

Any ideas on how to fix this?

Thanks,

Owner

Hello, sorry for not answering this earlier.

Does the issue persist after restarting the Garage daemons?

I'm not sure exactly where the issue comes from, but you should check here:

- the peering manager, which is responsible for establishing connections to all nodes transitively and retrying connections when they drop: https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/main/src/net/peering.rs
- the membership manager, which is responsible for adding new nodes from external sources (bootstrap peers, old saved peers list, discovery) and is also in the pipeline that processes the values that are shown in `garage status`: https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/main/src/rpc/system.rs

If the issue disappears after restarting the nodes, I think it's most likely due to how the peering manager learns the IP for the local node. I think it learns it once, from remote nodes (or from the config file), and doesn't ever change it after, unless the daemon is restarted.
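
To make that suspicion concrete, here is a minimal self-contained sketch of the kind of learn-once behavior I have in mind. This is not Garage's actual code; all names are made up for illustration.

```rust
use std::net::SocketAddr;

/// Hypothetical stand-in for the peering manager's knowledge of the local node.
struct LocalNodeInfo {
    /// Address advertised to peers; `None` until it has been learned once.
    advertised_addr: Option<SocketAddr>,
}

impl LocalNodeInfo {
    fn new() -> Self {
        Self { advertised_addr: None }
    }

    /// Learn the local address from the config file or from what remote peers
    /// report. Only the first call has any effect: a later change (e.g. after
    /// renumbering the network) is ignored until the daemon restarts and the
    /// cached value is rebuilt from scratch.
    fn learn_address(&mut self, addr: SocketAddr) {
        if self.advertised_addr.is_none() {
            self.advertised_addr = Some(addr);
        }
    }
}

fn main() {
    let mut local = LocalNodeInfo::new();

    // First learned at startup, e.g. from the saved peers list or from a
    // remote node that still remembers the old address.
    local.learn_address("10.3.1.1:3901".parse().unwrap());
    // The new address from the updated config is offered later, but ignored.
    local.learn_address("10.66.2.3:3901".parse().unwrap());

    // Still prints the old 10.3.x.x address, matching what `garage status` shows.
    println!("{:?}", local.advertised_addr);
}
```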

lx added this to the v1.0 milestone 2024-03-21 09:54:29 +00:00
lx closed this issue 2024-03-21 10:26:37 +00:00
Author

Hello,

> Does the issue persist after restarting the Garage daemons?

Actually, it seems to disappear after the node has been running for some time (i.e. all nodes are displayed with up-to-date addresses), then returns after a node restart, and disappears again after a while (I couldn't say for sure, but the old IP lingers for around a day).

Owner

@vk, I would be interested if you could check if the undesirable behavior disappears in v1.0.0. In theory it should be fixed.
