Use SRPT scheduling to improve liveness when sending messages between peers #361
Currently (as far as I understand the code and previous discussions), messages between Garage peers are scheduled according to the following strategy:

1. messages with a higher priority are sent first;
2. messages with the same priority are sent in a Round-Robin fashion.

Round-Robin is not ideal in the second case: when there are many concurrent messages, they all experience a high completion time.
A better strategy for messages with identical priorities is SRPT: Shortest Remaining Processing Time. It is designed to minimize the number of objects in the system (under certain hypotheses), and it also minimizes the average completion time.

For Garage, it would look like the following: among messages of the same priority, always send the next chunk of the message with the least data remaining to be transmitted.
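As a rough illustration (hypothetical types and field names, not Garage's or netapp's actual internals), same-priority messages could be kept in an ordered map keyed by their remaining size, so that the next chunk always comes from the message closest to completion:

```rust
use std::collections::BTreeMap;

/// Hypothetical sketch of an SRPT queue. Keys are (priority, remaining_bytes, id):
/// BTreeMap iteration order makes the first entry the highest-priority message
/// (assuming lower numeric priority means more urgent) with the least data left.
struct SrptScheduler {
    queue: BTreeMap<(u8, u64, u64), Vec<u8>>,
    next_id: u64,
}

impl SrptScheduler {
    fn new() -> Self {
        Self { queue: BTreeMap::new(), next_id: 0 }
    }

    fn enqueue(&mut self, priority: u8, data: Vec<u8>) {
        let key = (priority, data.len() as u64, self.next_id);
        self.next_id += 1;
        self.queue.insert(key, data);
    }

    /// Pop one chunk (up to `chunk_size` bytes) from the message SRPT selects,
    /// and re-queue whatever is left under its updated remaining size.
    fn next_chunk(&mut self, chunk_size: usize) -> Option<Vec<u8>> {
        let (&key, _) = self.queue.iter().next()?;
        let mut data = self.queue.remove(&key).unwrap();
        let rest = data.split_off(chunk_size.min(data.len()));
        if !rest.is_empty() {
            // The remaining size only shrinks, so the message keeps (or improves)
            // its place in the SRPT order.
            self.queue.insert((key.0, rest.len() as u64, key.2), rest);
        }
        Some(data)
    }
}
```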
Here are some papers on the subject (in different theoretical settings, but the ideas should apply in a similar way):
Question: We are planning to move Netapp to a streaming-based API, in which clients and servers could use async streams to send and receive data. This means that Netapp would not necessarily know how much data remains to be sent on each stream it is currently processing, as that data may not have been produced yet. Do we know what the optimal scheduling policy is when the remaining size is unknown?
The netapp streaming PR that was merged recently (#343) prevents the use of SRPT, because we don't know the length of streams ahead of time. Instead, it uses LAS scheduling (Least Attained Service). TODO: benchmark this.
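For illustration, a minimal LAS-style scheduler (hypothetical names, not the actual netapp implementation) orders streams by the number of bytes they have already been served; the least-served stream is always picked next, which favours short streams without knowing their length in advance:

```rust
use std::collections::BTreeMap;

/// Placeholder for whatever produces the next chunk of an async stream.
struct StreamHandle;

/// Hypothetical sketch of LAS (Least Attained Service) scheduling.
struct LasScheduler {
    // Key: (priority, bytes_already_sent, id), assuming lower numeric
    // priority means more urgent. The first entry is the least-served
    // stream of the most urgent priority.
    queue: BTreeMap<(u8, u64, u64), StreamHandle>,
    next_id: u64,
}

impl LasScheduler {
    fn new() -> Self {
        Self { queue: BTreeMap::new(), next_id: 0 }
    }

    fn enqueue(&mut self, priority: u8, stream: StreamHandle) {
        self.queue.insert((priority, 0, self.next_id), stream);
        self.next_id += 1;
    }

    /// Serve one chunk from the least-served stream, then re-queue it with its
    /// updated "attained service" counter so that newer streams can overtake it.
    fn serve_one_chunk(&mut self, chunk_len: u64) -> Option<()> {
        let (&(prio, sent, id), _) = self.queue.iter().next()?;
        let stream = self.queue.remove(&(prio, sent, id)).unwrap();
        // ... here we would poll `stream` for up to `chunk_len` bytes and send them;
        // a finished stream would simply be dropped instead of re-queued ...
        self.queue.insert((prio, sent + chunk_len, id), stream);
        Some(())
    }
}
```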
As far as I know, LAS is a good idea for a long-tailed distribution of flow sizes (roughly speaking: many small flows and a few very big flows), when the actual flow size is not known in advance.
Do you have an idea of the typical distribution of stream sizes in Garage? If there are many streams of the same size, I have the feeling that LAS would be similar to Round-Robin.
After discussing with LX, it seems that in practice we always know the length of a stream ahead of time. Either the user is pushing an object of unknown size, in which case we need to buffer it to compute its hash and choose a server; or the user is fetching an object, in which case we already have it in our system and can look up its size.
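If the total size is indeed always known before sending, one way to expose it to the scheduler (a hypothetical sketch, not the actual netapp API) would be to pair each stream with an explicit size hint, from which the remaining size needed by SRPT can be derived:

```rust
/// Hypothetical wrapper pairing an async stream with its known total size,
/// so the scheduler can compute an SRPT key even though the data itself is
/// produced lazily.
struct SizedStream<S> {
    /// Total number of bytes this stream will produce, known before sending.
    total_len: u64,
    /// Bytes already sent on this stream.
    sent: u64,
    /// The underlying stream producing the data.
    inner: S,
}

impl<S> SizedStream<S> {
    /// Remaining bytes to send: the quantity SRPT orders by.
    fn remaining(&self) -> u64 {
        self.total_len.saturating_sub(self.sent)
    }
}
```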