Make the number of partitions configurable #290

Open
opened 2022-04-09 20:05:33 +00:00 by quentin · 1 comment
Owner

Garage partitions are similar to Ceph's placement groups (PG).

PGs in Ceph are configurable. Ceph used to provide a complex formula to compute an ideal number of placement groups; it now autoscales them according to the load.

For some large or very specific deployments, more than 256 partitions could be required. Also, the Ceph developers have documented extensively the impact of having more or fewer placement groups.
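
For context, with the current 256 partitions the partition of a key is simply the top 8 bits of its hash. Below is a minimal Rust sketch of how a configurable, power-of-two partition count could generalize this, assuming partitions keep being derived from the top bits of the key hash; the names (`PARTITION_BITS`, `partition_of`) are illustrative, not Garage's actual API.

```rust
/// Number of bits of the key hash used to select a partition.
/// 8 bits -> 256 partitions (the value currently hardcoded in Garage).
const PARTITION_BITS: usize = 8;

/// A partition index; a u16 suffices for up to 2^16 partitions.
type Partition = u16;

/// Map a 256-bit key hash to its partition by keeping the top bits.
fn partition_of(hash: &[u8; 32], partition_bits: usize) -> Partition {
    assert!((1..=16).contains(&partition_bits), "u16 partition index");
    // Read the first two bytes big-endian, then keep the top `partition_bits`.
    let top = u16::from_be_bytes([hash[0], hash[1]]);
    top >> (16 - partition_bits)
}

fn main() {
    let hash = [0xABu8; 32];
    // With 8 bits this is exactly the first byte of the hash: 0xAB = 171.
    assert_eq!(partition_of(&hash, PARTITION_BITS), 0xAB);
    // With 10 bits, the same hash is mapped into 1024 partitions instead.
    println!("partition with 10 bits: {}", partition_of(&hash, 10));
}
```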

Refs

- https://openstack.by/biblioteka/ceph/ceph.com/jewel/docs.ceph.com/docs/jewel/rados/operations/placement-groups.1.html
- https://docs.ceph.com/en/latest/rados/operations/placement-groups/
quentin added the
Improvement
label 2022-04-09 20:05:33 +00:00
quentin added the
Low priority
label 2022-04-11 09:46:45 +00:00
Owner

There is only a single line on why Ceph actually needs the number of placement groups to stay small:

> Data durability and even distribution among all OSDs call for more placement groups but their number should be reduced to the minimum to save CPU and memory

In Garage there isn't much cost associated with having more partitions. Mostly:

  • more messages exchanged when synchronizing Garage tables, which is probably the main reason not to increase the partition count too much; note however that when nodes are added to the cluster, these messages get spread over the nodes, so each individual node bears less load
  • a slightly bigger cluster layout; the format is quite concise, however, so this shouldn't be a big issue (it uses three bytes per partition, as node IDs are abbreviated to a 1-byte number; see the size estimate below)

Note that in Garage's current state, it is impossible to change the number of partitions in a cluster, as it would imply rewriting the entire database and we don't have code to do this.
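
To make the layout-size point concrete, here is a back-of-the-envelope estimate (a hypothetical helper, not Garage's actual layout code): three bytes per partition corresponds to one abbreviated 1-byte node ID per replica at replication factor 3.

```rust
/// Hypothetical size estimate for the partition assignment table:
/// one abbreviated 1-byte node ID per replica, per partition.
fn layout_table_bytes(partitions: usize, replication_factor: usize) -> usize {
    partitions * replication_factor
}

fn main() {
    // Current Garage: 256 partitions at replication factor 3 -> 768 bytes.
    println!("256 partitions:  {} bytes", layout_table_bytes(256, 3));
    // Even 4096 partitions would only need 12 KiB for the assignment table.
    println!("4096 partitions: {} bytes", layout_table_bytes(4096, 3));
}
```

Even at 16 times the current partition count the table stays around 12 KiB, consistent with the point above that table synchronization messages, not layout size, are the main cost.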

Reference: Deuxfleurs/garage#290