Ensure increasing version timestamps when writing new object versions #543
Reference: Deuxfleurs/garage#543
Jepsen testing proved that S3 PutObject/GetObject is not linearizable in Garage. That's not an issue, as it was never intended to be. This PR does not fix this.
However, there are some properties that objects are supposed to respect, in particular that writing a new object version does in fact overwrite older versions. More precisely, in the following sequence of events: WRITE(X=A); WRITE(X=B); READ(X), the final READ(X) must return B.
Jepsen testing showed that in current Garage versions, this was not always the case, due to the fact that the generated timestamps are not always in increasing order:
the version returned by a GetObject is the one with the highest internal timestamp
if the clocks on two nodes diverge, a node B might perform a write strictly after a node A, but according to its local clock that write happens before A's
previously, there was no check that the timestamp was strictly higher than the timestamp of the versions that have been inserted before
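The failure mode described in the three points above can be reproduced with a small, self-contained sketch. It is not Garage code: `Version`, `naive_write` and `read_latest` are hypothetical names, and the clock readings are made-up values chosen so that the second writer's clock lags behind the first.

```rust
/// One stored version of an object (hypothetical model, not Garage's types).
#[derive(Debug, Clone, PartialEq)]
struct Version {
    timestamp_ms: u64,
    value: &'static str,
}

/// Read rule from the description: the version with the highest timestamp wins.
fn read_latest(versions: &[Version]) -> Option<&Version> {
    versions.iter().max_by_key(|v| v.timestamp_ms)
}

/// Naive write: the timestamp comes from the writer's local clock only,
/// with no check against versions that already exist.
fn naive_write(versions: &mut Vec<Version>, local_clock_ms: u64, value: &'static str) {
    versions.push(Version { timestamp_ms: local_clock_ms, value });
}

fn main() {
    let mut versions = Vec::new();
    // Node A's clock reads 1_000 when it writes X=A.
    naive_write(&mut versions, 1_000, "A");
    // Node B writes X=B later in real time, but its lagging clock reads 900.
    naive_write(&mut versions, 900, "B");
    // The read picks the highest timestamp, so it returns A instead of B.
    let latest = read_latest(&versions).unwrap();
    println!("READ(X) = {}", latest.value);
}
```

In this model the write of B is effectively lost from the reader's point of view, even though it completed after the write of A.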
This PR adds the following fixes:
In PutObject, the previous version is fetched when generating the timestamp, to ensure that the new timestamp is larger than the timestamp of any version that was there before. This adds one RPC round trip to PutObject, but Garage tries to do that RPC in parallel with receiving the object's data, to minimize the added latency: on large objects, the difference should be invisible.
Similar logic is also added to CreateMultipartUpload
In DeleteObject, a delete marker is always added, with an increasing timestamp as well
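The timestamp rule behind these fixes can be sketched as follows. This is a simplified model under stated assumptions, not the code of this PR: `Data`, `Version`, `next_timestamp`, `write` and `read_latest` are hypothetical names, the list of existing versions is passed in directly instead of being fetched over RPC, and the clock readings are made up.

```rust
/// Payload of a version: either a value or a delete marker (hypothetical model).
#[derive(Debug, Clone, PartialEq)]
enum Data {
    Value(&'static str),
    DeleteMarker,
}

#[derive(Debug, Clone, PartialEq)]
struct Version {
    timestamp_ms: u64,
    data: Data,
}

/// Pick a timestamp that is at least the local clock and strictly larger than
/// the timestamp of every existing version (barring saturation at u64::MAX).
fn next_timestamp(local_clock_ms: u64, existing: &[Version]) -> u64 {
    match existing.iter().map(|v| v.timestamp_ms).max() {
        Some(prev) => local_clock_ms.max(prev.saturating_add(1)),
        None => local_clock_ms,
    }
}

/// Insert a new version (value or delete marker) using the rule above.
fn write(versions: &mut Vec<Version>, local_clock_ms: u64, data: Data) -> u64 {
    let ts = next_timestamp(local_clock_ms, versions.as_slice());
    versions.push(Version { timestamp_ms: ts, data });
    ts
}

/// Read rule: the version with the highest timestamp wins.
fn read_latest(versions: &[Version]) -> Option<&Version> {
    versions.iter().max_by_key(|v| v.timestamp_ms)
}

fn main() {
    let mut versions = Vec::new();
    // Node A writes with a clock reading of 1_000.
    write(&mut versions, 1_000, Data::Value("A"));
    // Node B's clock lags (900), so the timestamp is bumped past A's.
    let ts_b = write(&mut versions, 900, Data::Value("B"));
    println!("B stored at {}, latest = {:?}", ts_b, read_latest(&versions));
    // A delete from a lagging node also gets an increasing timestamp.
    let ts_del = write(&mut versions, 950, Data::DeleteMarker);
    println!("delete stored at {}, latest = {:?}", ts_del, read_latest(&versions));
}
```

With this rule, a write or delete issued from a node whose clock lags still ends up as the latest version, provided the writer can see the previously inserted versions.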
Jepsen testing showed that this improved the situation.
d98bf45be6 to d39c5c6984
d39c5c6984 to 03490d41d5
ad4a793262 to cb5199aed0
cb5199aed0 to 3d6ed63824
Changed title from "WIP: Ensure increasing version timestamps in PutObject" to "WIP: Ensure increasing version timestamps when writing new object versions"
Changed title from "WIP: Ensure increasing version timestamps when writing new object versions" to "Ensure increasing version timestamps when writing new object versions"