Document built-in caching behavior (or absence thereof) #874
Reference: Deuxfleurs/garage#874
Hi, I looked at the docs/website and found some references to caching website requests here:
https://garagehq.deuxfleurs.fr/documentation/cookbook/reverse-proxy/
But I haven't found anything (yet) about how caching works internally within Garage. For example, if we have a 3x-replicated bucket and request a file from a Garage node that does not have a copy of the object, will that node cache the object it retrieves, or will it pull it over the network every time? The docs are fairly clear that the object will be served from the node itself if available, and pulled from another node otherwise, but they don't say anything about caching.
The features page (https://garagehq.deuxfleurs.fr/documentation/reference-manual/features/) doesn't mention caching either.
When I read the goals and use cases (https://garagehq.deuxfleurs.fr/documentation/design/goals/), one of the main goals is to operate well on geo-distributed clusters with disparate link types. I believe that in order to operate well in that kind of cluster, caching of object reads would be highly desirable, as it would eliminate significant transfer over potentially slower, high-latency links, particularly in use cases where reads are much more common than writes.
I think that the docs should have more detail on the caching behavior, and possibly recommendations on adding caching layers if that is the intent.
If the project owners also see a valid argument for building caching behavior into Garage (if it doesn't already exist), then it would be good to create a ticket to implement that as well.
Title changed from "Document built-in caching behavior" to "Document built-in caching behavior (or lack thereof)", then to "Document built-in caching behavior (or absence thereof)".

I agree that caching might be useful in some scenarios. Our use case is that we have nodes in 3 zones and 3 copies of all data, so each zone contains a whole copy of everything. This means that when a node needs to look for an object, even if it doesn't have a copy itself, there is a copy in the same zone, so we do not need to traverse high-latency links. I agree, however, that this is not the only scenario in which Garage can be deployed.
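The zone-aware read preference described above can be sketched roughly as follows. This is a hypothetical illustration, not Garage's actual replica-selection code; the node/zone structure and the `pick_replicas` helper are invented for the example:

```python
# Hypothetical sketch (not Garage's actual code): when every zone holds a
# full copy of the data, a node can prefer replicas in its own zone and
# avoid traversing high-latency inter-zone links for reads.

def pick_replicas(replicas, local_node_id, local_zone):
    """Order candidate replicas: local node first, then same-zone, then remote."""
    def cost(node):
        if node["id"] == local_node_id:
            return 0  # data is already on this node
        if node["zone"] == local_zone:
            return 1  # one intra-zone hop, typically cheap
        return 2      # inter-zone transfer: the case where caching would help
    return sorted(replicas, key=cost)

replicas = [
    {"id": "n1", "zone": "zone-a"},
    {"id": "n2", "zone": "zone-b"},
    {"id": "n3", "zone": "zone-c"},
]
order = pick_replicas(replicas, "n1", "zone-a")
```

With one replica per zone (as in the 3-zone deployment above), the local zone always ranks first, so block reads never cross zones; in deployments without a full copy per zone, the cost-2 case is exactly where a cache would pay off.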
Caching data blocks is the first kind of caching that we could add, and it is already tracked in #179. It would be relatively straightforward to implement: data blocks are content-addressed, so there is no cache invalidation to handle. It would also address the bulk of the issue, as it could significantly reduce traffic for large, frequently requested objects.
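Why content addressing makes this easy can be shown with a minimal sketch. This is illustrative only, not Garage code (Garage's actual hash function and block store differ; SHA-256 is used here just for the example): since a block's identifier is the hash of its content, a cached entry can never become stale, so a plain LRU cache with no invalidation logic suffices:

```python
# Illustrative sketch, not Garage code: a block is addressed by the hash of
# its content, so the same hash always maps to the same bytes. A cached
# block can therefore never become stale, and no invalidation is needed.

import hashlib
from collections import OrderedDict

class BlockCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block hash -> block bytes

    def get(self, block_hash, fetch_remote):
        if block_hash in self.blocks:
            self.blocks.move_to_end(block_hash)  # mark as recently used
            return self.blocks[block_hash]
        data = fetch_remote(block_hash)          # e.g. an RPC to a remote node
        # Content addressing also lets us verify the block before caching it.
        assert hashlib.sha256(data).hexdigest() == block_hash
        self.blocks[block_hash] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)      # evict least recently used
        return data
```

A second `get` for the same hash is served locally without any network transfer, which is exactly the saving for large, frequently requested objects.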
I think we do not want to go further and try to cache object metadata, as this would impact Garage's consistency properties. This means that when fetching an object from Garage, there will always be at least one inter-zone RPC call before the Garage daemon can give its answer, so there is an incompressible latency there. If we cached metadata instead, we would risk returning old versions of objects, which is not acceptable for the S3 API, and probably not for the Web endpoint either, as it is frequently used as a public access point for data programmatically added by external applications (such as media files).
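The incompressible inter-zone round trip can be made concrete with a small back-of-envelope sketch. This is a hedged illustration (the quorum value and zone layout are example assumptions, not a statement of Garage's exact internals): with one replica per zone and a read quorum larger than the number of local replicas, at least one contact must leave the local zone:

```python
# Hedged sketch: with replicas spread one per zone and a read quorum of 2,
# any metadata read must contact at least one node outside the local zone.
# That remote contact is the incompressible inter-zone latency; skipping it
# is what would risk serving stale object versions.

def min_remote_contacts(replica_zones, local_zone, quorum):
    """Minimum number of out-of-zone nodes a quorum read must contact."""
    local = sum(1 for zone in replica_zones if zone == local_zone)
    return max(0, quorum - local)

# One replica per zone, quorum 2: one inter-zone call is unavoidable.
min_remote_contacts(["zone-a", "zone-b", "zone-c"], "zone-a", 2)  # -> 1
```

Only if the quorum could be satisfied entirely in-zone (e.g. multiple replicas per zone) would the result be 0, and that is precisely the configuration trade-off a metadata cache cannot safely shortcut.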