Fix ping timeout and interval #4

Merged
lx merged 8 commits from fix-ping into main 2022-09-02 12:22:57 +00:00
Owner

We had issues before where ping timeouts were causing disconnections between nodes too easily. This was due to the combination of two problems:

  • a ping timeout that was too low (5 seconds)
  • a bug where, if a ping wasn't answered within one iteration of the main loop (1s), several pings were sent in rapid succession instead of waiting for PING_INTERVAL to elapse

This PR raises the ping timeout to 10 seconds, ensures that no ping is re-sent less than 15 seconds after the previous one, and sets the number of consecutive failures before disconnection to 4 instead of 3. In an mknet simulation with very limited bandwidth, this allowed for more concurrent traffic before timeouts caused a disconnection. Disconnections still happened in the simulation, because kernel packet buffers allowed traffic to reach an RTT of up to 24 seconds, which still repeatedly triggers the 10-second ping timeout.
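
To illustrate the intended behaviour, here is a minimal sketch of a per-peer ping state machine using the values above. It is not the code from this PR: `PeerPing`, `Action`, `tick`, `on_pong`, `PING_TIMEOUT` and `FAILED_PING_THRESHOLD` are hypothetical names chosen for the example; only `PING_INTERVAL` and the numeric values come from the description.

```rust
use std::time::{Duration, Instant};

// Values from the PR description; all names except PING_INTERVAL are assumptions.
const PING_INTERVAL: Duration = Duration::from_secs(15);
const PING_TIMEOUT: Duration = Duration::from_secs(10);
const FAILED_PING_THRESHOLD: u32 = 4;

/// What the main loop should do for this peer on the current iteration.
#[derive(Debug, PartialEq)]
enum Action {
    Wait,
    SendPing,
    Disconnect,
}

/// Hypothetical per-peer ping state (not netapp's actual type).
#[derive(Debug)]
struct PeerPing {
    last_sent: Option<Instant>,
    in_flight: bool,
    failed_pings: u32,
}

impl PeerPing {
    fn new() -> Self {
        PeerPing { last_sent: None, in_flight: false, failed_pings: 0 }
    }

    /// Called on every iteration of the main loop (every 1s in the description).
    fn tick(&mut self, now: Instant) -> Action {
        if let Some(sent) = self.last_sent {
            let elapsed = now.saturating_duration_since(sent);
            // An unanswered ping counts as one failure once the timeout passes.
            if self.in_flight && elapsed >= PING_TIMEOUT {
                self.in_flight = false;
                self.failed_pings += 1;
                if self.failed_pings >= FAILED_PING_THRESHOLD {
                    return Action::Disconnect;
                }
            }
            // Never re-send before PING_INTERVAL has elapsed since the last ping,
            // whether or not that ping was answered.
            if elapsed < PING_INTERVAL {
                return Action::Wait;
            }
        }
        self.last_sent = Some(now);
        self.in_flight = true;
        Action::SendPing
    }

    /// Called when the peer answers the outstanding ping.
    fn on_pong(&mut self) {
        self.in_flight = false;
        self.failed_pings = 0;
    }
}
```

In this sketch the decision to send depends only on the time of the last send, so a slow or missing reply cannot cause a burst of pings on successive loop iterations.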

Question: in a real-world network, taking into account the rate at which TCP throttles sends on a slow connection, what is the expected maximum RTT on a saturated link? This should guide us to set the correct ping timeout in the definitive version.
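
As a rough way to reason about this, the delay a ping sees when queued behind buffered data is approximately the queued bytes divided by the link throughput. The sketch below only does that arithmetic; the function names are hypothetical, and the example figures in the comment (3 MB queued on a 1 Mbit/s link) are illustrative values picked to land on 24 seconds, not measurements from the simulation.

```rust
/// Approximate time (in seconds) needed to drain `queued_bytes` over a link
/// of `link_bits_per_sec`. A ping queued behind that data waits about this long.
fn queueing_delay_secs(queued_bytes: u64, link_bits_per_sec: u64) -> f64 {
    (queued_bytes as f64 * 8.0) / link_bits_per_sec as f64
}

/// Whether a ping queued behind that much data would exceed `timeout_secs`.
fn ping_would_time_out(queued_bytes: u64, link_bits_per_sec: u64, timeout_secs: f64) -> bool {
    queueing_delay_secs(queued_bytes, link_bits_per_sec) > timeout_secs
}

// Illustrative only: 3_000_000 bytes queued on a 1_000_000 bit/s link
// take 24 s to drain, which is longer than a 10 s ping timeout.
```

This ignores propagation delay, retransmissions and TCP congestion control, so it gives an order of magnitude for the buffering delay and does not answer the question above.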

lx added 5 commits 2022-08-31 14:49:31 +00:00
Add dump of sending queue
Some checks reported errors
continuous-integration/drone/push Build was killed
700f783956
add debug_name in proto to differenciate messages
Some checks reported errors
continuous-integration/drone/push Build was killed
01db3c4319
Better messages in proto.rs
Some checks reported errors
continuous-integration/drone/push Build was killed
984ba65e65
Be more lenient on pings
Some checks reported errors
continuous-integration/drone/push Build was killed
continuous-integration/drone/pr Build was killed
7703659742
lx added 1 commit 2022-08-31 15:04:53 +00:00
SVR -> SRV
Some checks reported errors
continuous-integration/drone/push Build was killed
continuous-integration/drone/pr Build was killed
d75146fb81
lx added 1 commit 2022-09-02 12:02:12 +00:00
Merge branch 'main' into fix-ping
Some checks reported errors
continuous-integration/drone/push Build was killed
continuous-integration/drone/pr Build was killed
c865cc9f9c
lx added 1 commit 2022-09-02 12:21:53 +00:00
Bump to v0.4.5
Some checks reported errors
continuous-integration/drone/push Build was killed
continuous-integration/drone/pr Build was killed
continuous-integration/drone/tag Build was killed
ca25331d73
Author
Owner

Publishing this as v0.4.5

lx merged commit a82700c5a2 into main 2022-09-02 12:22:57 +00:00
Reference: lx/netapp#4