Question about metadata directory #885

Closed
opened 2024-10-04 16:28:38 +00:00 by kirick · 3 comments

I have a question about how Garage nodes store metadata. In the docs, the `metadata_dir` parameter describes the metadata directory as follows:

> This contains the node identifier, the network configuration and the peer list, the list of buckets and keys as well as the index of all objects, object version and object blocks.

It is unclear to me what "index of *all objects*" means. Does it mean "all objects **on the node**" or "all objects **in the whole cluster**"? This concerns me a lot, because if every node stores the index of all objects in the cluster, it would be impossible to expand the cluster by adding new servers, so I hope this is not the case.

Owner

To clarify: the actual data is not written to this folder (in almost all cases; there are some optimizations for small objects). Only references are kept here, while the actual data is stored in chunks in the data folder. I don't have the complete rundown of what is duplicated and what is not, but some metadata is: since there is no leader in a Garage cluster, all nodes need a reasonable view of the cluster state (e.g. the list of buckets) to be able to serve requests. This usually does not hinder scaling or adding new nodes.
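The split between metadata (references) and data (chunks) can be pictured with a small Rust sketch, Rust being the language Garage itself is written in. Everything here is hypothetical: the `ObjectData` type, `INLINE_THRESHOLD`, `CHUNK_SIZE` and the use of `DefaultHasher` as a stand-in for a real content hash are assumptions for illustration, not Garage's actual types or values.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical size at or below which an object is kept inline in metadata.
const INLINE_THRESHOLD: usize = 3 * 1024;
/// Hypothetical chunk size for larger objects.
const CHUNK_SIZE: usize = 1024 * 1024;

/// What the metadata directory would hold for one object version (sketch).
#[derive(Debug)]
enum ObjectData {
    /// Small-object optimization: the bytes live in the metadata entry itself.
    Inline(Vec<u8>),
    /// Everything else: only references to chunks stored in the data directory.
    Chunks(Vec<u64>),
}

/// Stand-in for a content hash; a real system would use a cryptographic hash.
fn chunk_ref(chunk: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    chunk.hash(&mut h);
    h.finish()
}

/// Decide what goes into the metadata entry for an uploaded body.
fn describe(body: &[u8]) -> ObjectData {
    if body.len() <= INLINE_THRESHOLD {
        ObjectData::Inline(body.to_vec())
    } else {
        ObjectData::Chunks(body.chunks(CHUNK_SIZE).map(chunk_ref).collect())
    }
}

fn main() {
    let samples = [("small", b"hello".to_vec()), ("large", vec![0u8; 5 * CHUNK_SIZE / 2])];
    for (name, body) in samples {
        match describe(&body) {
            ObjectData::Inline(bytes) => println!("{name}: inline, {} bytes", bytes.len()),
            ObjectData::Chunks(refs) => println!("{name}: {} chunk references", refs.len()),
        }
    }
}
```

The point of the sketch is only that a large object costs a handful of small references in the metadata directory, independent of how many bytes sit in the data folder.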

Owner

To give you an example, in a cluster with a couple of TiB of data, the metadata folder is usually around 10 GiB or less.
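Taking "a couple of TiB" as 2 TiB (an assumption made only for the arithmetic), that figure puts the metadata at roughly half a percent of the stored data, or less:

$$\frac{10\ \text{GiB}}{2\ \text{TiB}} = \frac{10\ \text{GiB}}{2048\ \text{GiB}} \approx 0.0049 \approx 0.5\%$$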

Owner

Concerning the "index of all objects":

  • For each bucket, there are three servers out of your entire Garage cluster that store the list of objects in this bucket. These three servers are chosen randomly for each bucket, so if you add many nodes and use many buckets, this will be spread evenly in the cluster.
  • For each object, details on this object (including the list of chunks, etc.) are spread evenly in the metadata directory of all cluster nodes. Every metadata entry is stored on three randomly chosen Garage nodes.
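The placement described in these two points can be sketched in Rust. This is not Garage's code: Garage places metadata through its cluster layout, whereas this sketch swaps in rendezvous (highest-random-weight) hashing, with `DefaultHasher` as the hash, to show how "three pseudo-randomly chosen nodes per entry" spreads the load. The node names, `REPLICAS` and `replicas_for` are hypothetical.

```rust
use std::cmp::Reverse;
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Replication factor, matching the "three servers" described above.
const REPLICAS: usize = 3;

/// Deterministic pseudo-random score for a (key, node) pair.
fn score(key: &str, node: &str) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    node.hash(&mut h);
    h.finish()
}

/// Rendezvous hashing: the REPLICAS nodes with the highest score for `key`
/// store the metadata entry (a bucket's object list, or one object's details).
fn replicas_for<'a>(key: &str, nodes: &[&'a str]) -> Vec<&'a str> {
    let mut ranked: Vec<&'a str> = nodes.to_vec();
    ranked.sort_by_key(|n| Reverse(score(key, n)));
    ranked.truncate(REPLICAS);
    ranked
}

fn main() {
    let nodes = ["node-a", "node-b", "node-c", "node-d", "node-e", "node-f"];
    let mut load: HashMap<&str, usize> = HashMap::new();
    for i in 0..60_000 {
        // Each entry lands on 3 of the 6 nodes.
        for n in replicas_for(&format!("object-{i}"), &nodes) {
            *load.entry(n).or_insert(0) += 1;
        }
    }
    // Each node should hold about 60_000 * 3 / 6 = 30_000 entries.
    let mut names: Vec<&str> = load.keys().copied().collect();
    names.sort();
    for n in names {
        println!("{n}: {}", load[n]);
    }
}
```

With N nodes and three replicas, each node holds roughly 3/N of the metadata entries, so adding servers dilutes the per-node share instead of duplicating the full index everywhere. Rendezvous hashing also has the property that adding a node only moves entries onto the new node, never between existing ones.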
lx closed this issue 2024-11-19 09:28:32 +00:00
Reference: Deuxfleurs/garage#885