forked from Deuxfleurs/garage
Compare commits: hotfix/1.0...main (22 commits)
| Author | SHA1 | Date |
|---|---|---|
| | d38509ef4b | |
| | 39b37833c5 | |
| | a2c1de646b | |
| | 15847a636a | |
| | 123d3e1f04 | |
| | a6e4b96ca9 | |
| | b442b0e35e | |
| | 33c2086d9e | |
| | 5ad1e55ccf | |
| | 1779fd40c0 | |
| | ff093ddbb8 | |
| | 90e3c2af91 | |
| | b47706809c | |
| | 126e0f47a3 | |
| | 738bb2f09c | |
| | 7dd7cb5759 | |
| | 8b663d8c5b | |
| | c051db8204 | |
| | 50669b3e76 | |
| | e5838b4837 | |
| | 87dfaf2eb9 | |
| | 554437254e | |

13 changed files with 106 additions and 21 deletions

@@ -300,7 +300,7 @@ Since `v0.8.0`, Garage can use alternative storage backends as follows:
 | [Sled](https://sled.rs) (old default, removed since `v1.0`) | `"sled"` | `<metadata_dir>/db/` |
 
 Sled was supported until Garage v0.9.x, and was removed in Garage v1.0.
-You can still use an older binary of Garage (e.g. v0.9.3) to migrate
+You can still use an older binary of Garage (e.g. v0.9.4) to migrate
 old Sled metadata databases to another engine.
 
 Performance characteristics of the different DB engines are as follows:

@@ -390,10 +390,12 @@ if geographical replication is used.
 
 If this value is set, Garage will automatically take a snapshot of the metadata
 DB file at a regular interval and save it in the metadata directory.
-This can allow to recover from situations where the metadata DB file is corrupted,
-for instance after an unclean shutdown.
-See [this page](@/documentation/operations/recovering.md#corrupted_meta) for details.
 This parameter can take any duration string that can be parsed by
 the [`parse_duration`](https://docs.rs/parse_duration/latest/parse_duration/#syntax) crate.
 
+Snapshots can allow to recover from situations where the metadata DB file is
+corrupted, for instance after an unclean shutdown. See [this
+page](@/documentation/operations/recovering.md#corrupted_meta) for details.
+Garage keeps only the two most recent snapshots of the metadata DB and deletes
+older ones automatically.
 

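The retention rule added in this hunk (only the two most recent metadata snapshots are kept, older ones are deleted) can be illustrated with a short sketch. This is not Garage's implementation: the function name `snapshots_to_delete` and the assumption that snapshot names sort chronologically (for instance timestamp-based names) are hypothetical and only serve the illustration.

```rust
/// Given snapshot names that sort chronologically (an assumption of this
/// sketch), return the names that fall outside the `keep` most recent ones.
fn snapshots_to_delete(mut names: Vec<String>, keep: usize) -> Vec<String> {
    // Sort so that the oldest snapshot comes first and the newest last.
    names.sort();
    // Everything except the last `keep` entries is eligible for deletion.
    let n_delete = names.len().saturating_sub(keep);
    names.truncate(n_delete);
    names
}

fn main() {
    // Hypothetical snapshot names; the real naming scheme is not shown in the diff.
    let names = vec![
        "2024-03-01T00:00:00Z".to_string(),
        "2024-03-03T00:00:00Z".to_string(),
        "2024-03-02T00:00:00Z".to_string(),
    ];
    // Keep the two most recent snapshots, as the documentation describes.
    let old = snapshots_to_delete(names, 2);
    println!("{:?}", old); // prints ["2024-03-01T00:00:00Z"]
}
```
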
@@ -412,7 +414,7 @@ month, with a random delay to avoid all nodes running at the same time. When
 it scrubs the data directory, Garage will read all of the data files stored on
 disk to check their integrity, and will rebuild any data files that it finds
 corrupted, using the remaining valid copies stored on other nodes.
-See [this page](@/documentation/operations/durability-repair.md#scrub) for details.
+See [this page](@/documentation/operations/durability-repairs.md#scrub) for details.
 
 Set the `disable_scrub` configuration value to `true` if you don't need Garage
 to scrub the data directory, for instance if you are already scrubbing at the

doc/book/working-documents/migration-1.md (Normal file, 77 changed lines)
@@ -0,0 +1,77 @@
++++
+title = "Migrating from 0.9 to 1.0"
+weight = 11
++++
+
+**This guide explains how to migrate to 1.0 if you have an existing 0.9 cluster.
+We don't recommend trying to migrate to 1.0 directly from 0.8 or older.**
+
+This migration procedure has been tested on several clusters without issues.
+However, it is still a *critical procedure* that might cause issues.
+**Make sure to back up all your data before attempting it!**
+
+You might also want to read our [general documentation on upgrading Garage](@/documentation/operations/upgrading.md).
+
+## Changes introduced in v1.0
+
+The following are **breaking changes** in Garage v1.0 that require your attention when migrating:
+
+- The Sled metadata db engine has been **removed**. If your cluster was still
+  using Sled, you will need to **use a Garage v0.9.x binary** to convert the
+  database using the `garage convert-db` subcommand. See
+  [here](@/documentation/reference-manual/configuration.md#db_engine) for the
+  details of the procedure.
+
+The following syntax changes have been made to the configuration file:
+
+- The `replication_mode` parameter has been split into two parameters:
+  [`replication_factor`](@/documentation/reference-manual/configuration.md#replication_factor)
+  and
+  [`consistency_mode`](@/documentation/reference-manual/configuration.md#consistency_mode).
+  The old syntax using `replication_mode` is still supported for legacy
+  reasons and can still be used.
+
+- The parameters `sled_cache_capacity` and `sled_flush_every_ms` have been removed.
+
+## Migration procedure
+
+The migration to Garage v1.0 can be done with almost no downtime,
+by restarting all nodes at once in the new version.
+
+The migration steps are as follows:
+
+1. Do a `garage repair --all-nodes --yes tables`, check the logs and check that
+   all data seems to be synced correctly between nodes. If you have time, do
+   additional `garage repair` procedures (`blocks`, `versions`, `block_refs`,
+   etc.)
+
+2. Ensure you have a snapshot of your Garage installation that you can restore
+   to in case the upgrade goes wrong:
+
+   - If you are running Garage v0.9.4 or later, use the `garage meta snapshot
+     --all` to make a backup snapshot of the metadata directories of your nodes
+     for backup purposes, and save a copy of the following files in the
+     metadata directories of your nodes: `cluster_layout`, `data_layout`,
+     `node_key`, `node_key.pub`.
+
+   - If you are running a filesystem such as ZFS or BTRFS that support
+     snapshotting, you can create a filesystem-level snapshot to be used as a
+     restoration point if needed.
+
+   - In other cases, make a backup using the old procedure: turn off each node
+     individually; back up its metadata folder (for instance, use the following
+     command if your metadata directory is `/var/lib/garage/meta`: `cd
+     /var/lib/garage ; tar -acf meta-v0.9.tar.zst meta/`); turn it back on
+     again. This will allow you to take a backup of all nodes without
+     impacting global cluster availability. You can do all nodes of a single
+     zone at once as this does not impact the availability of Garage.
+
+3. Prepare your updated binaries and configuration files for Garage v1.0
+
+4. Shut down all v0.9 nodes simultaneously, and restart them all simultaneously
+   in v1.0. Use your favorite deployment tool (Ansible, Kubernetes, Nomad) to
+   achieve this as fast as possible. Garage v1.0 should be in a working state
+   as soon as enough nodes have started.
+
+5. Monitor your cluster in the following hours to see if it works well under
+   your production load.

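Step 2 of the migration guide above lists three ways to obtain a restoration point. The sketch below encodes that choice as a small decision function. It assumes the three bullets are tried in the order they are listed, which the guide does not state explicitly. The names `BackupMethod`, `parse_version` and `choose_backup` are hypothetical and not part of Garage.

```rust
/// Backup options listed in step 2 of the migration guide.
#[derive(Debug, PartialEq)]
enum BackupMethod {
    /// `garage meta snapshot --all` (Garage v0.9.4 or later).
    MetaSnapshot,
    /// Filesystem-level snapshot (e.g. ZFS or BTRFS).
    FilesystemSnapshot,
    /// Stop the node, archive its metadata directory, start it again.
    OfflineArchive,
}

/// Parse a plain "major.minor.patch" string (optional leading `v`).
/// Returns None for anything else, including pre-release suffixes.
fn parse_version(v: &str) -> Option<(u32, u32, u32)> {
    let mut parts = v.trim_start_matches('v').split('.').map(|p| p.parse::<u32>());
    let major = parts.next()?.ok()?;
    let minor = parts.next()?.ok()?;
    let patch = parts.next()?.ok()?;
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// Pick a backup method, trying the guide's bullets in listed order
/// (an assumption of this sketch).
fn choose_backup(version: &str, fs_has_snapshots: bool) -> Option<BackupMethod> {
    let v = parse_version(version)?;
    Some(if v >= (0, 9, 4) {
        BackupMethod::MetaSnapshot
    } else if fs_has_snapshots {
        BackupMethod::FilesystemSnapshot
    } else {
        BackupMethod::OfflineArchive
    })
}

fn main() {
    // A v0.9.3 node on a filesystem without snapshot support.
    println!("{:?}", choose_backup("0.9.3", false)); // prints Some(OfflineArchive)
}
```

Whichever method applies, the guide still asks for a copy of `cluster_layout`, `data_layout`, `node_key` and `node_key.pub` when using the metadata snapshot route; the sketch does not model that part.
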
@@ -76,6 +76,7 @@
 # import the full shell using `nix develop .#full`
 full = shellWithPackages (with pkgs; [
 rustfmt
+rust-analyzer
 clang
 mold
 # ---- extra packages for dev tasks ----

@@ -15,7 +15,7 @@ type: application
 # This is the chart version. This version number should be incremented each time you make changes
 # to the chart and its templates, including the app version.
 # Versions are expected to follow Semantic Versioning (https://semver.org/)
-version: 0.4.1
+version: 0.5.0
 
 # This is the version number of the application being deployed. This version number should be
 # incremented each time you make changes to the application. Versions are not expected to

@@ -11,6 +11,7 @@ spec:
 {{- if eq .Values.deployment.kind "StatefulSet" }}
 replicas: {{ .Values.deployment.replicaCount }}
 serviceName: {{ include "garage.fullname" . }}
+podManagementPolicy: {{ .Values.deployment.podManagementPolicy }}
 {{- end }}
 template:
 metadata:

@@ -96,6 +96,8 @@ deployment:
 kind: StatefulSet
 # Number of StatefulSet replicas/garage nodes to start
 replicaCount: 3
+# If using statefulset, allow Parallel or OrderedReady (default)
+podManagementPolicy: OrderedReady
 
 image:
 repository: dxflrs/amd64_garage

script/jepsen.garage/Vagrantfile (vendored, 14 changed lines)
@@ -30,11 +30,11 @@ Vagrant.configure("2") do |config|
 config.vm.define "n6" do |config| vm(config, "n6", "192.168.56.26") end
 config.vm.define "n7" do |config| vm(config, "n7", "192.168.56.27") end
 
-config.vm.define "n8" do |config| vm(config, "n8", "192.168.56.28") end
-config.vm.define "n9" do |config| vm(config, "n9", "192.168.56.29") end
-config.vm.define "n10" do |config| vm(config, "n10", "192.168.56.30") end
-config.vm.define "n11" do |config| vm(config, "n11", "192.168.56.31") end
-config.vm.define "n12" do |config| vm(config, "n12", "192.168.56.32") end
-config.vm.define "n13" do |config| vm(config, "n13", "192.168.56.33") end
-config.vm.define "n14" do |config| vm(config, "n14", "192.168.56.34") end
+#config.vm.define "n8" do |config| vm(config, "n8", "192.168.56.28") end
+#config.vm.define "n9" do |config| vm(config, "n9", "192.168.56.29") end
+#config.vm.define "n10" do |config| vm(config, "n10", "192.168.56.30") end
+#config.vm.define "n11" do |config| vm(config, "n11", "192.168.56.31") end
+#config.vm.define "n12" do |config| vm(config, "n12", "192.168.56.32") end
+#config.vm.define "n13" do |config| vm(config, "n13", "192.168.56.33") end
+#config.vm.define "n14" do |config| vm(config, "n14", "192.168.56.34") end
 end

@@ -3,11 +3,10 @@
 set -x
 
 #for ppatch in task3c task3a tsfix2; do
-for ppatch in tsfix2; do
+for ppatch in v093 v1rc1; do
-#for psc in c cp cdp r pr cpr dpr; do
-for psc in cdp r pr cpr dpr; do
 #for ptsk in reg2 set1 set2; do
-for ptsk in set1; do
+for ptsk in reg2 set2; do
+for psc in c cp cdp r pr cpr dpr; do
 for irun in $(seq 10); do
 lein run test --nodes-file nodes.vagrant \
 --time-limit 60 --rate 100 --concurrency 100 --ops-per-key 100 \

@@ -38,7 +38,9 @@
 "tsfix2" "c82d91c6bccf307186332b6c5c6fc0b128b1b2b1"
 "task3a" "707442f5de416fdbed4681a33b739f0a787b7834"
 "task3b" "431b28e0cfdc9cac6c649193cf602108a8b02997"
-"task3c" "0041b013a473e3ae72f50209d8f79db75a72848b"})
+"task3c" "0041b013a473e3ae72f50209d8f79db75a72848b"
+"v093" "v0.9.3"
+"v1rc1" "v1.0.0-rc1"})
 
 (def cli-opts
 "Additional command line options."

@@ -43,7 +43,7 @@
 "rpc_bind_addr = \"0.0.0.0:3901\"\n"
 "rpc_public_addr = \"" node ":3901\"\n"
 "db_engine = \"lmdb\"\n"
-"replication_mode = \"2\"\n"
+"replication_mode = \"3\"\n"
 "data_dir = \"" data-dir "\"\n"
 "metadata_dir = \"" meta-dir "\"\n"
 "[s3_api]\n"

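The Jepsen configuration in this hunk still uses the legacy `replication_mode` key, which the migration guide says has been split into `replication_factor` and `consistency_mode`. The sketch below shows one way such a legacy value could be split. The function `split_replication_mode` is hypothetical, and the value names it handles (`none`, the `-degraded` and `-dangerous` suffixes, and `consistent` as the default) are assumptions that do not appear in this diff; the Garage configuration reference is the authority on the accepted values.

```rust
/// Split a legacy `replication_mode` string into a
/// (replication_factor, consistency_mode) pair.
/// The accepted spellings are assumptions of this sketch.
fn split_replication_mode(mode: &str) -> Option<(u32, &'static str)> {
    // A suffix after '-' selects the consistency mode; no suffix means
    // the default mode, called "consistent" here.
    let (factor, consistency) = match mode.split_once('-') {
        Some((f, "degraded")) => (f, "degraded"),
        Some((f, "dangerous")) => (f, "dangerous"),
        Some(_) => return None,
        None => (mode, "consistent"),
    };
    // "none" is treated as a replication factor of 1.
    let factor = if factor == "none" {
        1
    } else {
        factor.parse::<u32>().ok()?
    };
    if factor == 0 {
        return None;
    }
    Some((factor, consistency))
}

fn main() {
    // The value used by the Jepsen test configuration after this change.
    if let Some((factor, consistency)) = split_replication_mode("3") {
        println!("replication_factor = {}", factor);
        println!("consistency_mode = \"{}\"", consistency);
    }
}
```
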
@@ -11,6 +11,7 @@ in
 {
 # --- Dev shell inherited from flake.nix ---
 devShell = devShells.default;
+devShellFull = devShells.full;
 
 # --- Continuous integration shell ---
 # The shell used for all CI jobs (along with devShell)

@@ -36,7 +36,7 @@ impl std::str::FromStr for Engine {
 match text {
 "lmdb" | "heed" => Ok(Self::Lmdb),
 "sqlite" | "sqlite3" | "rusqlite" => Ok(Self::Sqlite),
-"sled" => Err(Error("Sled is no longer supported as a database engine. Converting your old metadata db can be done using an older Garage binary (e.g. v0.9.3).".into())),
+"sled" => Err(Error("Sled is no longer supported as a database engine. Converting your old metadata db can be done using an older Garage binary (e.g. v0.9.4).".into())),
 kind => Err(Error(
 format!(
 "Invalid DB engine: {} (options are: lmdb, sqlite)",

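For readers who want to try the parsing behaviour shown in this hunk outside the Garage tree, here is a self-contained sketch. The `Engine` and `Error` types below are simplified stand-ins (the real definitions are not part of this diff); the match arms and messages are copied from the hunk.

```rust
use std::str::FromStr;

/// Simplified stand-in for Garage's DB engine enum.
#[derive(Debug, PartialEq)]
enum Engine {
    Lmdb,
    Sqlite,
}

/// Simplified stand-in for the error type used in the hunk.
#[derive(Debug, PartialEq)]
struct Error(String);

impl FromStr for Engine {
    type Err = Error;

    fn from_str(text: &str) -> Result<Engine, Error> {
        match text {
            "lmdb" | "heed" => Ok(Self::Lmdb),
            "sqlite" | "sqlite3" | "rusqlite" => Ok(Self::Sqlite),
            // Sled is recognised only to produce a targeted error message.
            "sled" => Err(Error("Sled is no longer supported as a database engine. Converting your old metadata db can be done using an older Garage binary (e.g. v0.9.4).".into())),
            kind => Err(Error(format!(
                "Invalid DB engine: {} (options are: lmdb, sqlite)",
                kind
            ))),
        }
    }
}

fn main() {
    for name in ["lmdb", "sqlite3", "sled", "rocksdb"] {
        match name.parse::<Engine>() {
            Ok(engine) => println!("{} -> {:?}", name, engine),
            Err(Error(msg)) => println!("{} -> error: {}", name, msg),
        }
    }
}
```
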
@@ -48,7 +48,7 @@ pub enum Command {
 #[structopt(name = "worker", version = garage_version())]
 Worker(WorkerOperation),
 
-/// Low-level debug operations on data blocks
+/// Low-level node-local debug operations on data blocks
 #[structopt(name = "block", version = garage_version())]
 Block(BlockOperation),
 