Question: best approach for low latency #893
Reference: Deuxfleurs/garage#893
Hey,
I migrated from Minio over to Garage some time ago and I am very happy with it, you did awesome work on this project!
This is only a question, not an issue.
I am designing a new application that will use Garage as its file storage backend, eventually holding millions of objects.

The main goal is the lowest possible latency for reads. I can split the data logically in a way that would let me use probably 3 or 4 different buckets and it would still make sense, but I need to change object names from time to time and would therefore need to use `copy internal` to do this.

The question is: what would be the best approach from the start, performance- and latency-wise?

- use `copy internal` everywhere, or
- use prefixes for the files instead of a flat structure where all objects sit in the bucket under a unique ID?

The question here is whether Garage can internally do a more efficient and faster lookup if I split the unique object ID into a prefix-based layout. I am asking this upfront because migrating to another approach later, after benchmarking, when I finally have all these files on the storage, could be a lot of work.
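For context, S3 has no native rename: a "rename" is a server-side copy to the new key followed by a delete of the old one. A minimal sketch of that pattern, assuming `copy internal` refers to the S3 `CopyObject` server-side copy and assuming boto3 as the client (the endpoint, bucket, and key names below are placeholders, not part of this thread):

```python
def rename_object(s3, bucket: str, old_key: str, new_key: str) -> None:
    """'Rename' an S3 object: server-side copy to the new key, then
    delete the old key. The copy happens inside the cluster, so the
    object data is not re-uploaded by the client."""
    s3.copy_object(
        Bucket=bucket,
        Key=new_key,
        CopySource={"Bucket": bucket, "Key": old_key},
    )
    s3.delete_object(Bucket=bucket, Key=old_key)

if __name__ == "__main__":
    import boto3  # placeholder client setup for a local Garage endpoint
    s3 = boto3.client("s3", endpoint_url="http://localhost:3900")
    rename_object(s3, "my-bucket", "old/name.bin", "new/name.bin")
```

Note the operation is not atomic: between the copy and the delete, both keys exist.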
Thanks!
Edit:

The `copy internal` is not an issue anymore; it works fine between different buckets, so there is no reason not to partition the data into different buckets.

The only question left is whether I would get a latency / speed advantage from a path prefix vs. no prefix.
I don't think there will be: there is no specific prefix-based sharding depending on the object path; data placement depends on the hash of the actual data. The only thing that might make a difference is the block size you configure on the cluster, as well as the size of the objects.
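For reference, the knob mentioned here is the `block_size` parameter in `garage.toml`; objects are split into blocks of at most this size before being placed by hash. A hedged fragment (the value is only an illustration, not a recommendation from this thread):

```toml
# garage.toml (fragment) -- maximum size of a data block, in bytes.
# Larger blocks mean fewer blocks (and fewer lookups) per object;
# smaller blocks spread transfers more finely across nodes.
block_size = 1048576  # 1 MiB, shown purely as an example
```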
Thank you!
I was able to run some first, simple benchmarks and did not notice any real difference between the two approaches so far.

A big vote for using path prefixes, though, is that they make debugging and resuming longer-running jobs easier (if the main application restarts in between, for instance), because I can limit `ListObjects` to a prefix, which is very helpful in these situations.

So what I am doing now is sharding the data between different buckets where that is no issue (logically, all data is "the same", but I can do some rough grouping) and using path prefixes for easier maintenance. The performance is great in any case and I can super easily serve small files from S3 directly, which is awesome.
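As a sketch of that resume pattern, assuming boto3 (the client setup, bucket, and prefix below are placeholders), listing only the keys under one prefix with proper pagination could look like:

```python
def list_keys_under_prefix(s3, bucket: str, prefix: str):
    """Yield every object key under `prefix`, following ListObjectsV2
    pagination so buckets with more than 1000 matching keys work too."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]

if __name__ == "__main__":
    import boto3  # placeholder client setup for a local Garage endpoint
    s3 = boto3.client("s3", endpoint_url="http://localhost:3900")
    for key in list_keys_under_prefix(s3, "my-bucket", "jobs/2024/"):
        print(key)
```

A restarted job can then diff the listed keys against its own state and skip the ones already processed.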
So I guess this is answered.