Correctness of table GC in asynchronous networks #151
Reference: Deuxfleurs/garage#151
PR #135 mostly fixes the CRDT-GC issue by implementing a 24-hour delay before anything is garbage collected in a table. This works under the assumption that rebalances that follow data shuffling terminate in less than 24 hours.
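To make the time-based rule concrete, here is a minimal sketch of what a delay check of this kind could look like. It is not the code from PR #135: `GcCandidate`, `GC_DELAY` and `is_collectable` are hypothetical names chosen for illustration, and only the 24-hour figure comes from the description above.

```rust
use std::time::{Duration, SystemTime};

/// Delay before an entry may be garbage collected (24 hours, as described above).
/// The constant name is hypothetical.
const GC_DELAY: Duration = Duration::from_secs(24 * 60 * 60);

/// Hypothetical record of a table entry that has been queued for GC.
struct GcCandidate {
    /// Wall-clock time at which the entry became eligible for deletion.
    deleted_at: SystemTime,
}

/// Returns true once at least `GC_DELAY` has elapsed since `deleted_at`.
/// If the clock appears to have gone backwards, the entry is conservatively kept.
fn is_collectable(candidate: &GcCandidate, now: SystemTime) -> bool {
    match now.duration_since(candidate.deleted_at) {
        Ok(age) => age >= GC_DELAY,
        Err(_) => false,
    }
}
```

The check depends only on local wall-clock time, which is exactly why it is safe only if everything that needs to propagate does so within the delay.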
However, in distributed systems, it is generally considered bad practice to assume that information propagates within a bounded time interval: doing so amounts to a synchrony assumption, meaning that we are assuming a computing model with much stronger properties than an asynchronous one. To maximize the applicability of Garage, we would like to remove this assumption and implement a system in which time plays no role. To do this, we would need a way to safely disable the GC while data is being shuffled around, and to safely detect that the shuffling has terminated so that the GC can be resumed. This adds some complexity to the protocol and hasn't been tackled yet.
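One possible shape for such a time-free gate is sketched below, purely as an illustration and not as a design that has been agreed on or implemented. It assumes that every shuffle is tagged with a monotonically increasing version and that each node explicitly acknowledges when it has finished syncing for a given version. `ShuffleTracker`, `NodeId`, `begin_shuffle`, `ack_synced` and `gc_allowed` are all hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical node identifier.
type NodeId = u64;

/// Hypothetical tracker: GC is gated on explicit acknowledgements, not on elapsed time.
struct ShuffleTracker {
    /// Version of the most recent shuffle that has been started.
    current_version: u64,
    /// For each known node, the highest version for which it reported its sync as complete.
    synced: HashMap<NodeId, u64>,
}

impl ShuffleTracker {
    /// Starts with no shuffle in progress: every node is considered synced at version 0.
    fn new(nodes: &[NodeId]) -> Self {
        ShuffleTracker {
            current_version: 0,
            synced: nodes.iter().map(|&n| (n, 0)).collect(),
        }
    }

    /// Called when data starts being shuffled around; GC becomes disabled.
    fn begin_shuffle(&mut self) {
        self.current_version += 1;
    }

    /// Records that `node` has finished syncing for `version`.
    /// Acknowledgements from unknown nodes are ignored; stale ones never lower the stored value.
    fn ack_synced(&mut self, node: NodeId, version: u64) {
        if let Some(v) = self.synced.get_mut(&node) {
            *v = (*v).max(version);
        }
    }

    /// GC may run only when every known node has acknowledged the current version.
    fn gc_allowed(&self) -> bool {
        self.synced.values().all(|&v| v >= self.current_version)
    }
}
```

This only models one node's local view. A real protocol would also have to disseminate the acknowledgements reliably and cope with a new shuffle starting while acknowledgements for the previous one are still in flight, which is where the complexity mentioned above comes from.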