Buckets with different replication factor #838

Open
opened 2024-07-08 20:27:02 +00:00 by morsik · 4 comments

Hi,

I have some data I would like to keep in the object store that does not need to be writable at all times, but it would be great if it stayed readable at all times. That implies replication_factor = 2 would be enough for this use case (if 1 replica fails, the data becomes read-only: no writes, but reads still work). Currently it's impossible to set such a policy per bucket, only per Garage deployment.

Please add a feature that allows specifying different replication factors for different buckets, so users can choose how important their data is ;)

Thanks!
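
The availability reasoning in the request above can be sketched as quorum arithmetic. This is a minimal sketch, assuming a scheme where writes need a strict majority of replicas and reads need just enough replicas to overlap any successful write; the function names are hypothetical and nothing here is taken from Garage's source code.

```python
def quorums(replication_factor: int) -> tuple[int, int]:
    """Return (read_quorum, write_quorum).

    Assumption: writes need a strict majority of the N replicas, and reads
    need enough replicas to overlap any write, i.e. R + W > N.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be >= 1")
    write_quorum = replication_factor // 2 + 1
    read_quorum = replication_factor - write_quorum + 1
    return read_quorum, write_quorum


def availability(replication_factor: int, failed_replicas: int) -> dict[str, bool]:
    """Report whether reads and writes can still reach their quorum."""
    alive = replication_factor - failed_replicas
    read_quorum, write_quorum = quorums(replication_factor)
    return {"read": alive >= read_quorum, "write": alive >= write_quorum}


if __name__ == "__main__":
    for factor in (1, 2, 3):
        print(factor, quorums(factor), availability(factor, failed_replicas=1))
```

Under these assumptions, a factor of 2 with one failed replica keeps reads available and blocks writes, which is the read-only behaviour described in the request, while a factor of 3 tolerates one failure for both.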


An additional thought/use case:
This would improve storage efficiency for large buckets that are fully reproducible and therefore need no replicas, e.g. repository mirrors.
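
To put a number on the storage argument, here is a small sketch of the arithmetic. It assumes raw disk usage is simply the logical size multiplied by the replication factor (metadata and block overhead are ignored), and the 10 TiB mirror is a made-up figure for illustration.

```python
TIB = 1024 ** 4  # bytes in one tebibyte


def raw_storage_bytes(logical_bytes: int, replication_factor: int) -> int:
    """Raw disk usage, assuming every object is stored `replication_factor` times."""
    return logical_bytes * replication_factor


def bytes_saved(logical_bytes: int, cluster_factor: int, bucket_factor: int) -> int:
    """Raw bytes saved if one bucket used `bucket_factor` instead of the cluster-wide factor."""
    return (raw_storage_bytes(logical_bytes, cluster_factor)
            - raw_storage_bytes(logical_bytes, bucket_factor))


if __name__ == "__main__":
    mirror_size = 10 * TIB  # hypothetical repository mirror
    saved = bytes_saved(mirror_size, cluster_factor=3, bucket_factor=1)
    print(f"saved: {saved / TIB:.0f} TiB")
```

For that hypothetical mirror, dropping from three copies to one would free two thirds of the raw space it occupies.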

maximilien added the kind/improvement and kind/ideas labels 2024-07-24 21:46:23 +00:00
quentin added the scope/s3-api and prio/low labels and removed the kind/improvement label 2024-08-07 09:50:59 +00:00
quentin added the action/discussion-needed label 2024-08-07 09:52:17 +00:00
Owner

Do you plan on configuring this with S3 Storage Class?
Do you have any idea of the steps involved to implement this feature in Garage?
How could we split this feature request into small pull requests?
What could be the drawbacks of implementing this?

quentin changed title from [Feature request] Buckets with different replication factor to Buckets with different replication factor 2024-08-07 09:56:37 +00:00
Author

@quentin to be honest, I'm not sure if you're asking me those questions. I don't know how Garage works internally or how it splits the data, so I can't say what work would have to be done here :(

Regarding configuring this feature - from my perspective I don't care whether it uses an S3 Storage Class compatible API or is configured via Garage's own tooling (the way API keys are configured through Garage itself instead of an AWS-compatible API).


Replication factor is not in the S3 API, right? If implemented, the `bucket create` command might be a good place to have a flag for this:

```
$ kubectl -n garage-hdd exec deployment/garage-cube -- ./garage bucket create --help
Defaulted container "garage" out of: garage, config-init (init)
garage-bucket-create 615698df7d2d3867fd3d9cbdf8d8849a9de358bf
Create bucket

USAGE:
    garage bucket create <name>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

ARGS:
    <name>    Bucket name
```

Alternatively, it could be done by deploying multiple Garage clusters.

@quentin does Garage have an opinion yet on how this should be done? What we've been doing is running multiple Garage clusters as needed. In our case, we're dividing up storage using whole disks, but it seems the design of Garage might be amenable to shared storage.

I think the overall use case of needing different replication values for different data sets is valid and probably common (we have this use case as well).

IMO the project should cover this in the docs and design intent, and if the intent of Garage is that it should be done by creating multiple clusters, then ideally Garage would have good support for sharing underlying disks/filesystems with other processes (which I think it already does).
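
As a rough illustration of the multiple-clusters workaround described above, a client could pick an endpoint based on how many copies a data set needs. This is only a sketch: the cluster names, endpoints, and the `pick_cluster` helper are all hypothetical and not part of Garage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    endpoint: str            # hypothetical S3 endpoint of one Garage cluster
    replication_factor: int  # cluster-wide setting of that deployment


# Hypothetical deployments, one per replication factor.
CLUSTERS = [
    Cluster("garage-mirrors", "http://garage-mirrors.example:3900", 1),
    Cluster("garage-readonly-ok", "http://garage-ro.example:3900", 2),
    Cluster("garage-critical", "http://garage-critical.example:3900", 3),
]


def pick_cluster(required_factor: int, clusters: list[Cluster] = CLUSTERS) -> Cluster:
    """Return the cluster with the smallest replication factor that still meets the requirement."""
    candidates = [c for c in clusters if c.replication_factor >= required_factor]
    if not candidates:
        raise LookupError(f"no cluster offers replication factor >= {required_factor}")
    return min(candidates, key=lambda c: c.replication_factor)


if __name__ == "__main__":
    print(pick_cluster(1).endpoint)  # reproducible data such as mirrors
    print(pick_cluster(2).endpoint)  # data that may become read-only on failure
```

The trade-off is that each data set's durability is decided by which cluster it lives in, so moving a bucket to a different replication level means copying it between clusters.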

Reference: Deuxfleurs/garage#838