netapp.rs: Set SO_REUSEPORT on listen_addr with TcpSocket. #9

Closed
jpds wants to merge 11 commits from jpds/netapp:so-reuseport into main
Contributor
No description provided.
jpds added 1 commit 2023-05-21 13:25:16 +00:00
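For context, a minimal sketch (not the actual diff, which is in the commit above) of what setting SO_REUSEPORT on the listen socket with Tokio's `TcpSocket` generally looks like; the `bind_with_reuseport` name, `listen_addr` parameter and backlog value are illustrative:

```rust
// Illustrative sketch only: build the listen socket manually so that
// SO_REUSEPORT can be set before bind(), instead of calling
// TcpListener::bind(listen_addr) directly. Must be called inside a Tokio runtime.
use std::net::SocketAddr;
use tokio::net::{TcpListener, TcpSocket};

fn bind_with_reuseport(listen_addr: SocketAddr) -> std::io::Result<TcpListener> {
    let socket = match listen_addr {
        SocketAddr::V4(_) => TcpSocket::new_v4()?,
        SocketAddr::V6(_) => TcpSocket::new_v6()?,
    };
    // SO_REUSEPORT lets several sockets bind the same address/port; the kernel
    // then distributes new incoming connections between them.
    socket.set_reuseport(true)?;
    socket.bind(listen_addr)?;
    socket.listen(1024)
}
```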
Owner

Thanks for the patch, but can you explain what this patch does and why it is needed? This won't enable several Garage processes to run on the same port, as there can be only one Garage process running (for each node id) due to exclusive locking on the metadata database, and it might even cause confusion/weird bugs if it somehow allows different Garage processes on the same port that do not handle incoming connections in the same way (e.g. different versions or different node ids). SO_REUSEPORT was intentionally designed for better load-balancing between several server processes, but that's not something we are doing with Garage.

Author
Contributor

> SO_REUSEPORT was intentionally designed for better load-balancing between several server processes

The documentation is a bit unclear, but I believe it also provides load-balancing across threads for TCP: https://man7.org/linux/man-pages/man7/socket.7.html

When enabled, it uses a hashing scheme to distribute load for different connections: https://kb.isc.org/docs/bind-option-reuseport

Caddy also enables this, and similarly has a single process but multiple threads on different CPUs (the `kqread` goes to a different CPU ID with every `curl` request I do):

```
USER   PID %CPU %MEM     VSZ     RSS TT  STAT STARTED     TIME COMMAND          UID  PPID C PRI NI MWCHAN
caddy 11477  0.0  0.5  753372   84868  6- SJ   21:21    0:53.57 /usr/local/bin/c   0     1 2  20  0 uwait
caddy 11477  0.0  0.5  753372   84868  6- SJ   21:21    1:02.68 /usr/local/bin/c   0     1 0  20  0 uwait
caddy 11477  0.0  0.5  753372   84868  6- IJ   21:21    0:50.91 /usr/local/bin/c   0     1 1  21  0 uwait
caddy 11477  0.0  0.5  753372   84868  6- IJ   21:21    0:00.00 /usr/local/bin/c   0     1 2  52  0 uwait
caddy 11477  0.0  0.5  753372   84868  6- IJ   21:21    0:00.00 /usr/local/bin/c   0     1 1  52  0 uwait
caddy 11477  0.0  0.5  753372   84868  6- SJ   21:21    0:43.11 /usr/local/bin/c   0     1 3  20  0 uwait
caddy 11477  0.0  0.5  753372   84868  6- SJ   21:21    0:59.88 /usr/local/bin/c   0     1 2  20  0 uwait
caddy 11477  0.0  0.5  753372   84868  6- IJ   21:21    0:00.00 /usr/local/bin/c   0     1 3  52  0 uwait
caddy 11477  0.0  0.5  753372   84868  6- SJ   21:21    0:56.70 /usr/local/bin/c   0     1 3  20  0 kqread
caddy 11477  0.0  0.5  753372   84868  6- IJ   21:21    0:29.17 /usr/local/bin/c   0     1 2  27  0 uwait
caddy 11477  0.0  0.5  753372   84868  6- IJ   21:21    0:20.17 /usr/local/bin/c   0     1 2  23  0 uwait
caddy 11477  0.0  0.5  753372   84868  6- SJ   21:21    0:19.36 /usr/local/bin/c   0     1 1  20  0 uwait
```
Owner

But that's only relevant for many short-lived connections such as HTTP requests. Netapp uses a small number of long-lived TCP connections, so this wouldn't help much. Also, I think even with SO_REUSEPORT it wouldn't change anything for the Tokio listener because we're still using a single socket and `accept()` calls are still made in a single thread, unless special code is added to do effective load-balancing across threads.

If you find any convincing evidence that this changes something performance-wise for Netapp or Garage, feel free to ping me here again, but for now I'm closing this PR. Thanks for the proposal though.
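For illustration only, the kind of "special code" mentioned above would look roughly like this: several workers, each binding its own SO_REUSEPORT socket to the same address and running its own accept loop, so the kernel can hash connections across them. Netapp does not do this; the address, port and worker count below are made up:

```rust
// Purely illustrative sketch: per-worker SO_REUSEPORT listeners, which is what
// would be required for the kernel to spread accept() load across threads.
use std::net::SocketAddr;
use tokio::net::TcpSocket;

async fn accept_loop(listen_addr: SocketAddr, worker_id: usize) -> std::io::Result<()> {
    let socket = TcpSocket::new_v4()?;
    socket.set_reuseport(true)?;
    socket.bind(listen_addr)?;
    let listener = socket.listen(1024)?;
    loop {
        // Each worker has its own listener bound to the same address; the
        // kernel distributes new connections between the listeners.
        let (_stream, peer) = listener.accept().await?;
        println!("worker {worker_id}: connection from {peer}");
        // ... handle the connection here ...
    }
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    // Made-up address and worker count, for illustration only.
    let addr: SocketAddr = "0.0.0.0:3901".parse().unwrap();
    let workers: Vec<_> = (0..4).map(|id| tokio::spawn(accept_loop(addr, id))).collect();
    for w in workers {
        w.await.unwrap()?;
    }
    Ok(())
}
```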

lx closed this pull request 2023-05-24 13:35:56 +00:00
Some checks reported errors
continuous-integration/drone/pr Build was killed

Pull request closed

Reference: lx/netapp#9