Fix ping timeout and interval #4

Merged
lx merged 8 commits from fix-ping into main 2022-09-02 12:22:57 +00:00
Owner

We had issues before where ping timeouts were causing disconnections between nodes too easily. This was due to the combination of two problems:

  • a ping timeout that was too low (5 seconds)
  • a bug where several pings were sent in rapid succession, instead of waiting for PING_INTERVAL to elapse, whenever a ping was not answered within one iteration (1s) of the main loop

This PR raises the ping timeout to 10 seconds, ensures that no ping is re-sent less than 15 seconds after the previous one, and sets the number of consecutive failures before disconnection to 4 instead of 3. In an mknet simulation with very limited bandwidth, this allowed more concurrent traffic before timeouts caused a disconnection. Disconnections still happened in the simulation, because kernel packet buffers allowed traffic to have an RTT of up to 24 seconds, which still repeatedly triggers the 10-second ping timeout.
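
The intended behaviour can be sketched as a small state machine driven by the 1-second main loop. This is only an illustration under stated assumptions, not the code in this PR: `PingState`, `Action`, `PING_TIMEOUT` and `FAILED_PING_THRESHOLD` are hypothetical names; only `PING_INTERVAL` and the numeric values come from the description above.

```rust
use std::time::{Duration, Instant};

// Values from the PR description; all names except PING_INTERVAL are hypothetical.
const PING_INTERVAL: Duration = Duration::from_secs(15);
const PING_TIMEOUT: Duration = Duration::from_secs(10);
const FAILED_PING_THRESHOLD: u32 = 4;

#[derive(Debug, PartialEq, Clone, Copy)]
enum Action {
    Wait,
    SendPing,
    Disconnect,
}

struct PingState {
    last_sent: Option<Instant>, // when the most recent ping was sent
    in_flight: bool,            // true while that ping is unanswered
    failures: u32,              // consecutive timed-out pings
}

impl PingState {
    fn new() -> Self {
        PingState { last_sent: None, in_flight: false, failures: 0 }
    }

    /// Called on every iteration of the main loop (assumed to run about once per second).
    fn tick(&mut self, now: Instant) -> Action {
        if let Some(sent) = self.last_sent {
            let elapsed = now.duration_since(sent);
            // Count a failure once the outstanding ping exceeds the timeout.
            if self.in_flight && elapsed >= PING_TIMEOUT {
                self.in_flight = false;
                self.failures += 1;
                if self.failures >= FAILED_PING_THRESHOLD {
                    return Action::Disconnect;
                }
            }
            // Never re-send before PING_INTERVAL has elapsed since the last ping,
            // whether or not that ping has been answered.
            if elapsed < PING_INTERVAL {
                return Action::Wait;
            }
        }
        self.last_sent = Some(now);
        self.in_flight = true;
        Action::SendPing
    }

    /// Called when a reply to the outstanding ping arrives.
    fn pong(&mut self) {
        self.in_flight = false;
        self.failures = 0;
    }
}
```

Keeping the send decision tied to the time of the last ping, rather than to whether a reply has arrived, is what prevents the rapid re-sending described above. With these values, a peer that never answers is dropped 55 seconds after the first ping (pings at 0, 15, 30 and 45 seconds, the last one timing out 10 seconds later).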

Question: in a real-world network, taking into account the rate at which TCP throttles sends on a slow connection, what is the expected maximum RTT on a saturated link? This should guide us to set the correct ping timeout in the definitive version.
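
One rough way to reason about this question: on a saturated link, the queueing delay is approximately the amount of data sitting in buffers divided by the link rate. The sketch below only illustrates that arithmetic; the function name and the example figures are hypothetical and are not measurements from the mknet simulation.

```rust
/// Approximate queueing delay, in seconds, for data already buffered
/// ahead of a packet on a link of the given rate.
fn queueing_delay_secs(buffered_bytes: u64, link_bytes_per_sec: u64) -> f64 {
    buffered_bytes as f64 / link_bytes_per_sec as f64
}
```

For example, 3 MB of buffered data on a 1 Mbit/s link (125,000 bytes per second) gives `queueing_delay_secs(3_000_000, 125_000)`, which is 24 seconds. This does not answer the question for real networks, where buffer sizes and TCP congestion control vary, but it shows which quantities bound the RTT.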

lx added 5 commits 2022-08-31 14:49:31 +00:00
continuous-integration/drone/push Build was killed Details
700f783956
Add dump of sending queue
continuous-integration/drone/push Build was killed Details
01db3c4319
add debug_name in proto to differenciate messages
continuous-integration/drone/push Build was killed Details
984ba65e65
Better messages in proto.rs
continuous-integration/drone/push Build was killed Details
continuous-integration/drone/pr Build was killed Details
7703659742
Be more lenient on pings
lx added 1 commit 2022-08-31 15:04:53 +00:00
continuous-integration/drone/push Build was killed Details
continuous-integration/drone/pr Build was killed Details
d75146fb81
SVR -> SRV
lx added 1 commit 2022-09-02 12:02:12 +00:00
continuous-integration/drone/push Build was killed Details
continuous-integration/drone/pr Build was killed Details
c865cc9f9c
Merge branch 'main' into fix-ping
lx added 1 commit 2022-09-02 12:21:53 +00:00
continuous-integration/drone/push Build was killed Details
continuous-integration/drone/pr Build was killed Details
continuous-integration/drone/tag Build was killed Details
ca25331d73
Bump to v0.4.5
Author
Owner

Publishing this as v0.4.5

lx merged commit a82700c5a2 into main 2022-09-02 12:22:57 +00:00
Reference: lx/netapp#4