Upgrade without downtime #436
Currently, in the documentation, our suggested upgrade procedure is to take all nodes offline for the duration of the upgrade. This is not a viable solution for production clusters that need to be continuously serving traffic. This issue covers the two aspects of solving this problem:
Concerning point 1, here are the current constraints:
Note that there is technically no need for the whole cluster to be taken down at once for the upgrade. Here is what we could easily relax in the current migration procedure:
For the upgrade itself, and without further tools, i.e. with Garage in its current state (v0.8.0), the following two upgrade procedures are the best we can achieve:
- Using dangerous mode (this weakens the guarantees on writes during the upgrade, as they only need to be validated by a single node, but should provide zero downtime if the switchover is done properly): switch the cluster to replication mode 3-dangerous, then restart nodes on the new version one by one or in small groups, ensuring that at least one node is always available for each partition. For instance, in a 3-zone layout, this means upgrading all the nodes of one zone before moving on to the next zone. In a 4-zone layout, this can be achieved by upgrading the nodes of two selected zones together, and then the nodes of the two remaining zones together. Once all nodes are done, put the cluster back in standard replication mode 3. Note that changing the replication mode of the cluster itself already requires restarting all nodes one after the other. (This cannot be done in a layout with 5 or more zones.) See the sketch after this list for the zone-by-zone ordering.
- Without dangerous mode: simply restart all nodes on the new version at the same time. This is not zero-downtime, but can probably be done with less than a minute of downtime.
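To make the ordering of the first procedure concrete, here is a minimal sketch of a zone-by-zone rolling restart. Everything in it is hypothetical: the node inventory, the zone assignment, and the `restart_node` helper (shown here as an SSH/systemd call) would need to be replaced by whatever tooling actually manages your deployment.

```python
# Hypothetical sketch of the zone-by-zone restart ordering described above.
# Node names, zone mapping, and the restart command are placeholders.
import subprocess
from collections import defaultdict

# Assumed inventory: node hostname -> zone, matching the cluster layout.
NODES = {
    "node1": "zone-a", "node2": "zone-a",
    "node3": "zone-b", "node4": "zone-b",
    "node5": "zone-c", "node6": "zone-c",
}

def restart_node(host: str) -> None:
    # Placeholder: restart the Garage service on `host` with the new binary.
    subprocess.run(["ssh", host, "sudo", "systemctl", "restart", "garage"], check=True)

def rolling_upgrade_by_zone(nodes: dict[str, str]) -> None:
    zones = defaultdict(list)
    for host, zone in nodes.items():
        zones[zone].append(host)
    # Upgrade one zone at a time so that, under 3-dangerous replication,
    # the remaining zones keep every partition available.
    for zone, hosts in zones.items():
        print(f"Upgrading zone {zone}: {hosts}")
        for host in hosts:
            restart_node(host)
        # In a real procedure, wait here until all nodes of the zone are
        # back up and healthy before moving on to the next zone.

if __name__ == "__main__":
    rolling_upgrade_by_zone(NODES)
```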
The following features could be provided by Garage to ease the process:
All upgrade procedures should be tested thoroughly and documented as completely as possible.
Here is another possibility: wait until all nodes have been upgraded before switching to the new version.
This is similar to your (hard) mode, but leveraging both the old and new server binaries.
The specific handover mechanism can be implemented in multiple ways. I suggest investigating handing the listening socket's file descriptor over a unix domain socket, or alternatively using pidfd_getfd. Another option is to simply stop listening on the old server and start listening on the new instance, though that would incur more downtime. The old instance could also "reverse-proxy" traffic to the new instance while waiting for the transition to be ready, so that the handover happens faster.
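As an illustration of the first mechanism (handing the listening socket over a unix domain socket with SCM_RIGHTS), here is a minimal, Garage-agnostic sketch. The socket path and function names are made up for the example; it only shows the file-descriptor passing itself, not the coordination around it. Requires Python 3.9+ on a Unix system.

```python
# Minimal sketch of passing a listening socket between processes over a
# unix domain socket (SCM_RIGHTS). Not Garage code; illustration only.
import socket

HANDOVER_PATH = "/tmp/garage-handover.sock"  # hypothetical path

def old_instance_send(listener: socket.socket) -> None:
    """Old server: hand its listening socket's fd to the new instance."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as chan:
        chan.connect(HANDOVER_PATH)
        # send_fds transmits the fd itself; the kernel duplicates it into
        # the receiving process, so both can hold it during the transition.
        socket.send_fds(chan, [b"listener"], [listener.fileno()])

def new_instance_receive() -> socket.socket:
    """New server: receive the fd and keep accepting without closing the port."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
        srv.bind(HANDOVER_PATH)  # assumes the path does not already exist
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            _, fds, _, _ = socket.recv_fds(conn, 1024, 1)
    # Rebuild a socket object around the inherited listening fd.
    return socket.socket(fileno=fds[0])
```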
New connections would be handled by the new server instance, while the old instance would do its best to serve the existing ones.
I don't know much about the S3 protocol, so I am not sure whether connections are long-running, whether they can be interrupted, etc. (What should happen if a big file is being transferred? Can it be interrupted, or marked as "in use" in the new instance, etc.?)
While writing the above, I came across a few interesting posts:
We have tested and validated an upgrade procedure with minimal downtime, where nodes are restarted simultaneously. This can be very fast if well coordinated. There is no plan to improve further by adding complex logic into Garage itself.
I know the issue is closed, but this is actually a no-go for any production system that needs high uptime. Garage should at least be able to support rolling updates.
Or even a way to do blue/green deployments, where for a short time version 1 and version 2 run against the same "database".
Something like the suggested "restart all nodes simultaneously" procedure is fine for cache storage, but not for a prod system.
The only way I can think of making this work would be to maintain a service in front of Garage and two Garage clusters, where the front service records everything that needs to be repaired on the other cluster while one of them is down.
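For what it's worth, here is a very rough sketch of what such a front service could look like, purely to illustrate the idea. The cluster names, the repair queue, and the `_read`/`_write` placeholders are all hypothetical; a real implementation would need persistent state, error handling, and actual S3 calls against each cluster.

```python
# Rough sketch of a dual-write front service: writes go to whichever cluster
# is up, and keys written while the other cluster is down are queued so they
# can be repaired (re-copied) once it comes back. Illustration only.
from dataclasses import dataclass, field

@dataclass
class FrontService:
    primary_up: bool = True
    secondary_up: bool = True
    repair_queue: list[tuple[str, str]] = field(default_factory=list)  # (cluster, key)

    def put(self, key: str, data: bytes) -> None:
        if self.primary_up:
            self._write("primary", key, data)
        else:
            self.repair_queue.append(("primary", key))
        if self.secondary_up:
            self._write("secondary", key, data)
        else:
            self.repair_queue.append(("secondary", key))

    def repair(self) -> None:
        # Once both clusters are up again, replay missed keys from the
        # cluster that stayed up to the one that was being upgraded.
        for cluster, key in self.repair_queue:
            source = "primary" if cluster == "secondary" else "secondary"
            self._write(cluster, key, self._read(source, key))
        self.repair_queue.clear()

    def _write(self, cluster: str, key: str, data: bytes) -> None:
        ...  # placeholder for an S3 PUT against the given cluster

    def _read(self, cluster: str, key: str) -> bytes:
        ...  # placeholder for an S3 GET against the given cluster
```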