[next-0.10] Add migration guide for v1.0
All checks were successful
ci/woodpecker/push/debug Pipeline was successful
ci/woodpecker/pr/debug Pipeline was successful
ci/woodpecker/cron/release/3 Pipeline was successful
ci/woodpecker/cron/release/2 Pipeline was successful
ci/woodpecker/cron/debug Pipeline was successful
ci/woodpecker/cron/release/4 Pipeline was successful
ci/woodpecker/cron/release/1 Pipeline was successful
ci/woodpecker/cron/publish Pipeline was successful
parent afad62939e
commit 554437254e
1 changed file with 77 additions and 0 deletions
doc/book/working-documents/migration-1.md (new file, +77)
@@ -0,0 +1,77 @@
+++
title = "Migrating from 0.9 to 1.0"
weight = 11
+++

**This guide explains how to migrate to 1.0 if you have an existing 0.9 cluster.
We don't recommend trying to migrate to 1.0 directly from 0.8 or older.**

This migration procedure has been tested on several clusters without issue.
However, it is still a *critical procedure*, and problems may arise.
**Make sure to back up all your data before attempting it!**

You might also want to read our [general documentation on upgrading Garage](@/documentation/operations/upgrading.md).

## Changes introduced in v1.0

The following are **breaking changes** in Garage v1.0 that require your attention when migrating:

- The Sled metadata db engine has been **removed**. If your cluster was still
  using Sled, you will need to **use a Garage v0.9.x binary** to convert the
  database using the `garage convert-db` subcommand. See
  [here](@/documentation/reference-manual/configuration/#db_engine) for the
  details of the procedure.
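
For illustration only, a Sled-to-LMDB conversion might look like the sketch below. The flag names, binary path and database paths are assumptions to verify against `garage convert-db --help` on your v0.9.x binary and the documentation linked above.

```bash
# Hypothetical sketch: convert a Sled metadata db to LMDB with a v0.9.x binary
# before upgrading. Flag names and paths are assumptions; verify them with
# `garage convert-db --help` and the db_engine documentation linked above.
/usr/local/bin/garage-v0.9 convert-db \
    -a sled -i /var/lib/garage/meta/db \
    -b lmdb -o /var/lib/garage/meta/db.lmdb

# Afterwards, point the configuration at the new engine, e.g. in /etc/garage.toml:
#   db_engine = "lmdb"
```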

The following syntax changes have been made to the configuration file:

- The `replication_mode` parameter has been split into two parameters:
  [`replication_factor`](@/documentation/reference-manual/configuration/#replication_factor)
  and
  [`consistency_mode`](@/documentation/reference-manual/configuration/#consistency_mode).
  The old syntax using `replication_mode` is still supported for legacy
  reasons and can still be used (see the example after this list).

- The parameters `sled_cache_capacity` and `sled_flush_every_ms` have been removed.
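
As an illustration of the new syntax, here is a sketch of how a cluster that previously used `replication_mode = "3"` might be expressed with the two new parameters; check the configuration reference linked above for the exact set of accepted values.

```toml
# Old syntax (still accepted for backwards compatibility):
# replication_mode = "3"

# Sketch of the equivalent new syntax:
replication_factor = 3
consistency_mode = "consistent"
```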

## Migration procedure

The migration to Garage v1.0 can be done with almost no downtime,
by restarting all nodes at once in the new version.

The migration steps are as follows (a condensed command sketch is given after the list):

1. Do a `garage repair --all-nodes --yes tables`, then check the logs and verify
   that all data seems to be synced correctly between nodes. If you have time,
   run the additional `garage repair` procedures (`blocks`, `versions`,
   `block_refs`, etc.).

2. Ensure you have a snapshot of your Garage installation that you can restore
   to in case the upgrade goes wrong:

   - If you are running Garage v0.9.4 or later, use the `garage meta snapshot
     --all` command to take a backup snapshot of the metadata directories of
     your nodes, and save a copy of the following files from the metadata
     directories of your nodes: `cluster_layout`, `data_layout`, `node_key`,
     `node_key.pub`.

   - If you are running a filesystem such as ZFS or BTRFS that supports
     snapshotting, you can create a filesystem-level snapshot to be used as a
     restoration point if needed.

   - In other cases, make a backup using the old procedure: turn off each node
     individually; back up its metadata folder (for instance, use the following
     command if your metadata directory is `/var/lib/garage/meta`:
     `cd /var/lib/garage ; tar -acf meta-v0.9.tar.zst meta/`); then turn it back
     on. This allows you to take a backup of all nodes without impacting global
     cluster availability. You can do all nodes of a single zone at once, as
     this does not impact the availability of Garage.

3. Prepare your updated binaries and configuration files for Garage v1.0.

4. Shut down all v0.9 nodes simultaneously, and restart them all simultaneously
   in v1.0. Use your favorite deployment tool (Ansible, Kubernetes, Nomad) to
   achieve this as fast as possible. Garage v1.0 should be in a working state
   as soon as enough nodes have started.

5. Monitor your cluster in the following hours to see if it works well under
   your production load.
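
For reference, a condensed and purely illustrative command sketch for steps 1, 2 and 4 is given below. The service name, backup destination, binary path and the mechanism used to run commands on all nodes (Ansible, Kubernetes, Nomad, plain SSH...) are assumptions that will differ between installations.

```bash
#!/usr/bin/env bash
# Hypothetical command sketch for the migration steps above.
# Paths, service names and the v1.0 binary location are placeholders.
set -euo pipefail

# Step 1: check metadata consistency while still running v0.9.
garage repair --all-nodes --yes tables
# Optionally, if you have time:
garage repair --all-nodes --yes blocks
garage repair --all-nodes --yes versions
garage repair --all-nodes --yes block_refs

# Step 2: snapshot metadata (Garage >= v0.9.4) and keep a copy of the layout
# and key files from each node's metadata directory.
garage meta snapshot --all
mkdir -p /root/garage-backup-v0.9                      # example destination
for f in cluster_layout data_layout node_key node_key.pub; do
    cp "/var/lib/garage/meta/$f" /root/garage-backup-v0.9/
done

# Step 4: on every node, at (nearly) the same time: stop the v0.9 service,
# install the v1.0 binary, start the service again.
systemctl stop garage
install -m 0755 ./garage-v1.0 /usr/local/bin/garage    # placeholder binary path
systemctl start garage
```

In practice, the step 4 portion would be driven by your deployment tool so that every node restarts within the same short window.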