Ensure increasing version timestamps when writing new object versions #543

Merged
lx merged 6 commits from increasing-timestamps into main 2023-10-24 10:07:17 +00:00
Owner

Jepsen testing showed that S3 PutObject/GetObject is not linearizable in Garage. That's not an issue in itself, as linearizability was never a design goal. This PR does not change that.

However, there are some properties that objects are supposed to respect, in particular that writing a new object version does in fact overwrite older versions. More precisely, in the following sequence of events: WRITE(X=A); WRITE(X=B); READ(X), the final READ(X) must return B.

Jepsen testing showed that in current Garage versions this was not always the case, because the generated timestamps are not always in increasing order:

  • the version returned by a GetObject is the one with the highest internal timestamp

  • if the clocks on two nodes are diverging, a node B might do a write strictly after a node A, but according to its local clock that write happens before A's

  • previously, there was no check that the timestamp was strictly higher than the timestamp of the versions that have been inserted before
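
The failure mode described in the three points above can be sketched in a few lines of Rust. This is only an illustration, not Garage's actual code: the `Version` struct, `read_latest`, and `naive_write` are hypothetical names, and the local clock is passed in as a plain number of milliseconds so that clock skew can be simulated.

```rust
// Illustrative model only: these types and functions are hypothetical
// and do not come from the Garage code base.

/// One stored version of an object, tagged with the writer's timestamp.
#[derive(Debug, Clone, PartialEq)]
struct Version {
    timestamp: u64,
    value: &'static str,
}

/// Read path: return the version with the highest timestamp.
fn read_latest(versions: &[Version]) -> Option<&Version> {
    versions.iter().max_by_key(|v| v.timestamp)
}

/// Naive write path: stamp the new version with the writer's local clock,
/// without looking at the versions that already exist.
fn naive_write(versions: &mut Vec<Version>, local_clock_msec: u64, value: &'static str) {
    versions.push(Version {
        timestamp: local_clock_msec,
        value,
    });
}
```

If node A writes with a local clock of 1000 ms and node B writes afterwards with a lagging clock of 900 ms, `read_latest` still returns A's version, which is exactly the stale read observed in the Jepsen tests.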

This PR adds the following fixes:

  • In PutObject, the previous version is fetched when generating the timestamp, to ensure that the new timestamp is larger than the timestamp of any version that was there before. This adds one RPC round trip (RTT) to PutObject, but Garage tries to do that RPC in parallel with receiving the object's data to minimize the added latency: on large objects, the difference should be invisible.

  • Similar logic is also added to CreateMultipartUpload

  • In DeleteObject, a delete marker is always added, with an increasing timestamp as well
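
The timestamp rule shared by these fixes can be sketched as follows. This is a minimal sketch under stated assumptions, not the code merged in this PR: `next_timestamp` is used here as an illustrative name, the existing versions are reduced to a slice of their timestamps, and the local clock is passed in as a parameter instead of being read from the system.

```rust
// Minimal sketch; the signature and parameter names are assumptions
// made for illustration, not the function as it exists in Garage.

/// Pick a timestamp for a new version (or delete marker) that is strictly
/// greater than the timestamp of every version already known for the object.
///
/// `local_clock_msec` is the writer's local time in milliseconds.
/// `existing_timestamps` holds the timestamps of the versions fetched
/// before the write.
fn next_timestamp(local_clock_msec: u64, existing_timestamps: &[u64]) -> u64 {
    match existing_timestamps.iter().max() {
        // Versions exist: never go at or below the newest one,
        // even if the local clock is lagging.
        Some(&newest) => std::cmp::max(newest.saturating_add(1), local_clock_msec),
        // No previous version: the local clock is all there is to go on.
        None => local_clock_msec,
    }
}
```

With a lagging clock (900 ms) and an existing version at 1000 ms, the sketch yields 1001, so the later write wins on read. The guarantee only covers versions the writer managed to fetch beforehand, so it orders sequential writes and does not make concurrent writes linearizable.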

Jepsen testing showed that this improved the situation.

lx force-pushed increasing-timestamps from d98bf45be6 to d39c5c6984 2023-04-18 16:05:29 +00:00 Compare
lx added the
Correctness
label 2023-04-19 10:19:06 +00:00
lx force-pushed increasing-timestamps from d39c5c6984 to 03490d41d5 2023-04-19 10:19:23 +00:00 Compare
lx changed target branch from main to next 2023-06-13 12:53:36 +00:00
lx changed target branch from next to main 2023-06-13 13:16:04 +00:00
lx force-pushed increasing-timestamps from ad4a793262 to cb5199aed0 2023-06-13 13:30:51 +00:00 Compare
lx added this to the v1.0 milestone 2023-06-14 12:23:52 +00:00
lx force-pushed increasing-timestamps from cb5199aed0 to 3d6ed63824 2023-10-18 14:38:24 +00:00 Compare
lx added 1 commit 2023-10-18 14:38:33 +00:00
continuous-integration/drone/push Build is passing Details
continuous-integration/drone/pr Build is passing Details
continuous-integration/drone Build is passing Details
d146cdd5b6
cargo fmt
lx added 2 commits 2023-10-20 11:37:48 +00:00
continuous-integration/drone/push Build is passing Details
continuous-integration/drone/pr Build is passing Details
8686cfd0b1
s3 api: also ensure increasing timestamps for create_multipart_upload
lx changed title from WIP: Ensure increasing version timestamps in PutObject to WIP: Ensure increasing version timestamps when writing new object versions 2023-10-20 11:37:59 +00:00
lx added 1 commit 2023-10-20 11:56:50 +00:00
continuous-integration/drone/push Build is passing Details
continuous-integration/drone/pr Build is passing Details
continuous-integration/drone Build is passing Details
c82d91c6bc
DeleteObject: always insert a deletion marker with a bigger timestamp than everything before
lx changed title from WIP: Ensure increasing version timestamps when writing new object versions to Ensure increasing version timestamps when writing new object versions 2023-10-24 10:05:59 +00:00
lx merged commit 75d5d08ee1 into main 2023-10-24 10:07:17 +00:00