Ensure increasing version timestamps when writing new object versions #543
Reference: Deuxfleurs/garage#543
Jepsen testing proved that S3 PutObject/GetObject is not linearizable in Garage. That's not an issue, as it was never intended to be. This PR does not fix this.
However, there are some properties that objects are supposed to respect, in particular that writing a new object version does in fact overwrite older versions. More precisely, in the sequence of events WRITE(X=A); WRITE(X=B); READ(X), the final READ(X) must return B.
Jepsen testing showed that in current Garage versions this was not always the case, because the generated timestamps are not always in increasing order:
- the version returned by a GetObject is the one with the highest internal timestamp
- if the clocks on two nodes diverge, a node B might do a write strictly after a node A, but according to its local time that write happens before A's
- previously, there was no check that the new timestamp was strictly higher than the timestamps of the versions inserted before
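The failure mode described above can be reproduced in a few lines. The following is a minimal single-process sketch, not Garage's actual code: the `Version` type and the functions `read_latest`, `write_unchecked` and `skewed_write_scenario` are hypothetical names introduced for illustration, and the clock values are made up.

```rust
/// One stored version of an object (hypothetical, simplified type).
#[derive(Debug, Clone, PartialEq)]
struct Version {
    /// Milliseconds, taken from the writing node's local clock.
    timestamp: u64,
    value: &'static str,
}

/// Read rule: return the version with the highest timestamp.
fn read_latest(versions: &[Version]) -> Option<&Version> {
    versions.iter().max_by_key(|v| v.timestamp)
}

/// Old behaviour: the timestamp is the writer's local clock, with no check
/// against the versions that already exist.
fn write_unchecked(versions: &mut Vec<Version>, local_clock_ms: u64, value: &'static str) {
    versions.push(Version { timestamp: local_clock_ms, value });
}

/// Replays WRITE(X=A) on node A, then WRITE(X=B) on node B whose clock lags
/// behind, then READ(X). Returns the value that the read observes.
fn skewed_write_scenario() -> &'static str {
    let mut versions = Vec::new();
    // Node A writes first; its clock reads 1000 ms.
    write_unchecked(&mut versions, 1_000, "A");
    // Node B writes later in real time, but its clock reads only 990 ms.
    write_unchecked(&mut versions, 990, "B");
    read_latest(&versions).map(|v| v.value).unwrap_or("")
}
```

In this sketch `skewed_write_scenario()` returns `"A"`: the later write of B is hidden behind the older version because B's timestamp is lower.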
This PR adds the following fixes:
- In PutObject, the previous version is fetched when generating the timestamp, to ensure that the new timestamp is larger than that of any version that was there before. This adds one RPC round trip to PutObject, but Garage tries to do that RPC in parallel with receiving the object's data to minimize the added latency: on large objects, the difference should be invisible.
- Similar logic is also added to CreateMultipartUpload.
- In DeleteObject, a delete marker is now always added, also with an increasing timestamp.
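The idea behind these fixes can be sketched as follows. This is an illustrative single-process model under stated assumptions, not the code in this PR: `Data`, `Version`, `next_timestamp`, `put`, `delete` and `read_latest` are hypothetical names, the existing versions are passed in directly instead of being fetched through an RPC, and the parallel fetch during the upload is not modelled.

```rust
use std::cmp::max;

/// Payload of a version: either a value or a delete marker (hypothetical types).
#[derive(Debug, Clone, PartialEq)]
enum Data {
    Value(&'static str),
    DeleteMarker,
}

#[derive(Debug, Clone, PartialEq)]
struct Version {
    timestamp: u64,
    data: Data,
}

/// Pick a timestamp that is at least the local clock and strictly greater
/// than the timestamp of every version that already exists.
fn next_timestamp(local_clock_ms: u64, existing: &[Version]) -> u64 {
    let floor = existing
        .iter()
        .map(|v| v.timestamp)
        .max()
        .map(|t| t + 1)
        .unwrap_or(0);
    max(local_clock_ms, floor)
}

/// Write a new value; returns the timestamp that was assigned.
fn put(versions: &mut Vec<Version>, local_clock_ms: u64, value: &'static str) -> u64 {
    let ts = next_timestamp(local_clock_ms, versions.as_slice());
    versions.push(Version { timestamp: ts, data: Data::Value(value) });
    ts
}

/// Delete by always adding a delete marker with an increasing timestamp.
fn delete(versions: &mut Vec<Version>, local_clock_ms: u64) -> u64 {
    let ts = next_timestamp(local_clock_ms, versions.as_slice());
    versions.push(Version { timestamp: ts, data: Data::DeleteMarker });
    ts
}

/// Read rule: the version with the highest timestamp wins; a delete marker
/// (or no version at all) reads as "not found".
fn read_latest(versions: &[Version]) -> Option<&'static str> {
    match versions.iter().max_by_key(|v| v.timestamp) {
        Some(Version { data: Data::Value(v), .. }) => Some(*v),
        _ => None,
    }
}
```

With this rule, a `put` of A at local clock 1000 followed by a `put` of B from a node whose clock reads 990 assigns B the timestamp 1001, so a subsequent read returns B. A later `delete` from a node whose clock reads 980 gets timestamp 1002, so its marker also wins over both earlier versions.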
Jepsen testing showed that this improved the situation.
d98bf45be6 to d39c5c6984
d39c5c6984 to 03490d41d5
ad4a793262 to cb5199aed0
cb5199aed0 to 3d6ed63824
Title changed from "WIP: Ensure increasing version timestamps in PutObject" to "WIP: Ensure increasing version timestamps when writing new object versions"
Title changed from "WIP: Ensure increasing version timestamps when writing new object versions" to "Ensure increasing version timestamps when writing new object versions"