forked from Deuxfleurs/garage
Compare commits
22 commits
hotfix/1.0...main
| Author | SHA1 | Date |
|---|---|---|
| | d38509ef4b | |
| | 39b37833c5 | |
| | a2c1de646b | |
| | 15847a636a | |
| | 123d3e1f04 | |
| | a6e4b96ca9 | |
| | b442b0e35e | |
| | 33c2086d9e | |
| | 5ad1e55ccf | |
| | 1779fd40c0 | |
| | ff093ddbb8 | |
| | 90e3c2af91 | |
| | b47706809c | |
| | 126e0f47a3 | |
| | 738bb2f09c | |
| | 7dd7cb5759 | |
| | 8b663d8c5b | |
| | c051db8204 | |
| | 50669b3e76 | |
| | e5838b4837 | |
| | 87dfaf2eb9 | |
| | 554437254e | |
13 changed files with 106 additions and 21 deletions
|
@ -300,7 +300,7 @@ Since `v0.8.0`, Garage can use alternative storage backends as follows:
|
||||||
| [Sled](https://sled.rs) (old default, removed since `v1.0`) | `"sled"` | `<metadata_dir>/db/` |
|
| [Sled](https://sled.rs) (old default, removed since `v1.0`) | `"sled"` | `<metadata_dir>/db/` |
|
||||||
|
|
||||||
Sled was supported until Garage v0.9.x, and was removed in Garage v1.0.
|
Sled was supported until Garage v0.9.x, and was removed in Garage v1.0.
|
||||||
You can still use an older binary of Garage (e.g. v0.9.3) to migrate
|
You can still use an older binary of Garage (e.g. v0.9.4) to migrate
|
||||||
old Sled metadata databases to another engine.
|
old Sled metadata databases to another engine.
|
||||||
|
|
||||||
Performance characteristics of the different DB engines are as follows:
|
Performance characteristics of the different DB engines are as follows:
|
||||||
|
@ -390,10 +390,12 @@ if geographical replication is used.
|
||||||
|
|
||||||
If this value is set, Garage will automatically take a snapshot of the metadata
|
If this value is set, Garage will automatically take a snapshot of the metadata
|
||||||
DB file at a regular interval and save it in the metadata directory.
|
DB file at a regular interval and save it in the metadata directory.
|
||||||
This can allow to recover from situations where the metadata DB file is corrupted,
|
This parameter can take any duration string that can be parsed by
|
||||||
for instance after an unclean shutdown.
|
the [`parse_duration`](https://docs.rs/parse_duration/latest/parse_duration/#syntax) crate.
|
||||||
See [this page](@/documentation/operations/recovering.md#corrupted_meta) for details.
|
|
||||||
|
|
||||||
|
Snapshots make it possible to recover from situations where the metadata DB file is
|
||||||
|
corrupted, for instance after an unclean shutdown. See [this
|
||||||
|
page](@/documentation/operations/recovering.md#corrupted_meta) for details.
|
||||||
Garage keeps only the two most recent snapshots of the metadata DB and deletes
|
Garage keeps only the two most recent snapshots of the metadata DB and deletes
|
||||||
older ones automatically.
|
older ones automatically.
|
||||||
|
|
||||||
|
@ -412,7 +414,7 @@ month, with a random delay to avoid all nodes running at the same time. When
|
||||||
it scrubs the data directory, Garage will read all of the data files stored on
|
it scrubs the data directory, Garage will read all of the data files stored on
|
||||||
disk to check their integrity, and will rebuild any data files that it finds
|
disk to check their integrity, and will rebuild any data files that it finds
|
||||||
corrupted, using the remaining valid copies stored on other nodes.
|
corrupted, using the remaining valid copies stored on other nodes.
|
||||||
See [this page](@/documentation/operations/durability-repair.md#scrub) for details.
|
See [this page](@/documentation/operations/durability-repairs.md#scrub) for details.
|
||||||
|
|
||||||
Set the `disable_scrub` configuration value to `true` if you don't need Garage
|
Set the `disable_scrub` configuration value to `true` if you don't need Garage
|
||||||
to scrub the data directory, for instance if you are already scrubbing at the
|
to scrub the data directory, for instance if you are already scrubbing at the
|
||||||
|
|
77
doc/book/working-documents/migration-1.md
Normal file
|
@ -0,0 +1,77 @@
|
||||||
|
+++
|
||||||
|
title = "Migrating from 0.9 to 1.0"
|
||||||
|
weight = 11
|
||||||
|
+++
|
||||||
|
|
||||||
|
**This guide explains how to migrate to 1.0 if you have an existing 0.9 cluster.
|
||||||
|
We don't recommend trying to migrate to 1.0 directly from 0.8 or older.**
|
||||||
|
|
||||||
|
This migration procedure has been tested on several clusters without issues.
|
||||||
|
However, it is still a *critical procedure* that might cause issues.
|
||||||
|
**Make sure to back up all your data before attempting it!**
|
||||||
|
|
||||||
|
You might also want to read our [general documentation on upgrading Garage](@/documentation/operations/upgrading.md).
|
||||||
|
|
||||||
|
## Changes introduced in v1.0
|
||||||
|
|
||||||
|
The following are **breaking changes** in Garage v1.0 that require your attention when migrating:
|
||||||
|
|
||||||
|
- The Sled metadata db engine has been **removed**. If your cluster was still
|
||||||
|
using Sled, you will need to **use a Garage v0.9.x binary** to convert the
|
||||||
|
database using the `garage convert-db` subcommand. See
|
||||||
|
[here](@/documentation/reference-manual/configuration.md#db_engine) for the
|
||||||
|
details of the procedure.
|
||||||
|
|
||||||
|
The following syntax changes have been made to the configuration file:
|
||||||
|
|
||||||
|
- The `replication_mode` parameter has been split into two parameters:
|
||||||
|
[`replication_factor`](@/documentation/reference-manual/configuration.md#replication_factor)
|
||||||
|
and
|
||||||
|
[`consistency_mode`](@/documentation/reference-manual/configuration.md#consistency_mode).
|
||||||
|
The old syntax using `replication_mode` is still supported for backward
|
||||||
|
compatibility.
|
||||||
|
|
||||||
|
- The parameters `sled_cache_capacity` and `sled_flush_every_ms` have been removed.
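
To illustrate the `replication_mode` split described above, here is a minimal Rust sketch of how a legacy value could be translated into the two new parameters. The function name `split_replication_mode`, the `ConsistencyMode` enum and the list of accepted legacy strings are assumptions made for this example; they are not taken from Garage's source code, so check the configuration reference before relying on them.

```rust
/// Consistency levels that the new `consistency_mode` parameter can name.
/// (Hypothetical type, introduced only for this sketch.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ConsistencyMode {
    Consistent,
    Degraded,
    Dangerous,
}

/// Translate a legacy `replication_mode` string into the pair
/// (`replication_factor`, `consistency_mode`).
/// The accepted strings are an assumption based on the values documented
/// for 0.9; unknown values yield `None`.
fn split_replication_mode(legacy: &str) -> Option<(usize, ConsistencyMode)> {
    match legacy {
        "none" | "1" => Some((1, ConsistencyMode::Consistent)),
        "2" => Some((2, ConsistencyMode::Consistent)),
        "2-dangerous" => Some((2, ConsistencyMode::Dangerous)),
        "3" => Some((3, ConsistencyMode::Consistent)),
        "3-degraded" => Some((3, ConsistencyMode::Degraded)),
        "3-dangerous" => Some((3, ConsistencyMode::Dangerous)),
        _ => None,
    }
}

fn main() {
    for legacy in ["2", "3", "3-degraded", "bogus"] {
        match split_replication_mode(legacy) {
            Some((factor, mode)) => println!(
                "replication_mode = {:?} -> replication_factor = {}, consistency_mode = {:?}",
                legacy, factor, mode
            ),
            None => println!("replication_mode = {:?} is not recognized", legacy),
        }
    }
}
```

Under these assumptions, a configuration that previously contained `replication_mode = "3"` would correspond to a replication factor of 3 with the default consistent mode.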
|
||||||
|
|
||||||
|
## Migration procedure
|
||||||
|
|
||||||
|
The migration to Garage v1.0 can be done with almost no downtime,
|
||||||
|
by restarting all nodes at once in the new version.
|
||||||
|
|
||||||
|
The migration steps are as follows:
|
||||||
|
|
||||||
|
1. Run `garage repair --all-nodes --yes tables`, then check the logs to confirm that
|
||||||
|
all data seems to be synced correctly between nodes. If you have time, do
|
||||||
|
additional `garage repair` procedures (`blocks`, `versions`, `block_refs`,
|
||||||
|
etc.)
|
||||||
|
|
||||||
|
2. Ensure you have a snapshot of your Garage installation that you can restore
|
||||||
|
to in case the upgrade goes wrong:
|
||||||
|
|
||||||
|
- If you are running Garage v0.9.4 or later, run `garage meta snapshot
|
||||||
|
--all` to take a snapshot of the metadata directories of your nodes,
|
||||||
|
and save a copy of the following files from the
|
||||||
|
metadata directories of your nodes: `cluster_layout`, `data_layout`,
|
||||||
|
`node_key`, `node_key.pub`.
|
||||||
|
|
||||||
|
- If you are running a filesystem such as ZFS or BTRFS that supports
|
||||||
|
snapshotting, you can create a filesystem-level snapshot to be used as a
|
||||||
|
restoration point if needed.
|
||||||
|
|
||||||
|
- In other cases, make a backup using the old procedure: turn off each node
|
||||||
|
individually; back up its metadata folder (for instance, use the following
|
||||||
|
command if your metadata directory is `/var/lib/garage/meta`: `cd
|
||||||
|
/var/lib/garage ; tar -acf meta-v0.9.tar.zst meta/`); turn it back on
|
||||||
|
again. This will allow you to take a backup of all nodes without
|
||||||
|
impacting global cluster availability. You can do all nodes of a single
|
||||||
|
zone at once as this does not impact the availability of Garage.
|
||||||
|
|
||||||
|
3. Prepare your updated binaries and configuration files for Garage v1.0
|
||||||
|
|
||||||
|
4. Shut down all v0.9 nodes simultaneously, and restart them all simultaneously
|
||||||
|
in v1.0. Use your favorite deployment tool (Ansible, Kubernetes, Nomad) to
|
||||||
|
achieve this as fast as possible. Garage v1.0 should be in a working state
|
||||||
|
as soon as enough nodes have started.
|
||||||
|
|
||||||
|
5. Monitor your cluster in the following hours to see if it works well under
|
||||||
|
your production load.
|
|
@ -76,6 +76,7 @@
|
||||||
# import the full shell using `nix develop .#full`
|
# import the full shell using `nix develop .#full`
|
||||||
full = shellWithPackages (with pkgs; [
|
full = shellWithPackages (with pkgs; [
|
||||||
rustfmt
|
rustfmt
|
||||||
|
rust-analyzer
|
||||||
clang
|
clang
|
||||||
mold
|
mold
|
||||||
# ---- extra packages for dev tasks ----
|
# ---- extra packages for dev tasks ----
|
||||||
|
|
|
@ -15,7 +15,7 @@ type: application
|
||||||
# This is the chart version. This version number should be incremented each time you make changes
|
# This is the chart version. This version number should be incremented each time you make changes
|
||||||
# to the chart and its templates, including the app version.
|
# to the chart and its templates, including the app version.
|
||||||
# Versions are expected to follow Semantic Versioning (https://semver.org/)
|
# Versions are expected to follow Semantic Versioning (https://semver.org/)
|
||||||
version: 0.4.1
|
version: 0.5.0
|
||||||
|
|
||||||
# This is the version number of the application being deployed. This version number should be
|
# This is the version number of the application being deployed. This version number should be
|
||||||
# incremented each time you make changes to the application. Versions are not expected to
|
# incremented each time you make changes to the application. Versions are not expected to
|
||||||
|
|
|
@ -11,6 +11,7 @@ spec:
|
||||||
{{- if eq .Values.deployment.kind "StatefulSet" }}
|
{{- if eq .Values.deployment.kind "StatefulSet" }}
|
||||||
replicas: {{ .Values.deployment.replicaCount }}
|
replicas: {{ .Values.deployment.replicaCount }}
|
||||||
serviceName: {{ include "garage.fullname" . }}
|
serviceName: {{ include "garage.fullname" . }}
|
||||||
|
podManagementPolicy: {{ .Values.deployment.podManagementPolicy }}
|
||||||
{{- end }}
|
{{- end }}
|
||||||
template:
|
template:
|
||||||
metadata:
|
metadata:
|
||||||
|
|
|
@ -96,6 +96,8 @@ deployment:
|
||||||
kind: StatefulSet
|
kind: StatefulSet
|
||||||
# Number of StatefulSet replicas/garage nodes to start
|
# Number of StatefulSet replicas/garage nodes to start
|
||||||
replicaCount: 3
|
replicaCount: 3
|
||||||
|
# If using statefulset, allow Parallel or OrderedReady (default)
|
||||||
|
podManagementPolicy: OrderedReady
|
||||||
|
|
||||||
image:
|
image:
|
||||||
repository: dxflrs/amd64_garage
|
repository: dxflrs/amd64_garage
|
||||||
|
|
14
script/jepsen.garage/Vagrantfile
vendored
|
@ -30,11 +30,11 @@ Vagrant.configure("2") do |config|
|
||||||
config.vm.define "n6" do |config| vm(config, "n6", "192.168.56.26") end
|
config.vm.define "n6" do |config| vm(config, "n6", "192.168.56.26") end
|
||||||
config.vm.define "n7" do |config| vm(config, "n7", "192.168.56.27") end
|
config.vm.define "n7" do |config| vm(config, "n7", "192.168.56.27") end
|
||||||
|
|
||||||
config.vm.define "n8" do |config| vm(config, "n8", "192.168.56.28") end
|
#config.vm.define "n8" do |config| vm(config, "n8", "192.168.56.28") end
|
||||||
config.vm.define "n9" do |config| vm(config, "n9", "192.168.56.29") end
|
#config.vm.define "n9" do |config| vm(config, "n9", "192.168.56.29") end
|
||||||
config.vm.define "n10" do |config| vm(config, "n10", "192.168.56.30") end
|
#config.vm.define "n10" do |config| vm(config, "n10", "192.168.56.30") end
|
||||||
config.vm.define "n11" do |config| vm(config, "n11", "192.168.56.31") end
|
#config.vm.define "n11" do |config| vm(config, "n11", "192.168.56.31") end
|
||||||
config.vm.define "n12" do |config| vm(config, "n12", "192.168.56.32") end
|
#config.vm.define "n12" do |config| vm(config, "n12", "192.168.56.32") end
|
||||||
config.vm.define "n13" do |config| vm(config, "n13", "192.168.56.33") end
|
#config.vm.define "n13" do |config| vm(config, "n13", "192.168.56.33") end
|
||||||
config.vm.define "n14" do |config| vm(config, "n14", "192.168.56.34") end
|
#config.vm.define "n14" do |config| vm(config, "n14", "192.168.56.34") end
|
||||||
end
|
end
|
||||||
|
|
|
@ -3,11 +3,10 @@
|
||||||
set -x
|
set -x
|
||||||
|
|
||||||
#for ppatch in task3c task3a tsfix2; do
|
#for ppatch in task3c task3a tsfix2; do
|
||||||
for ppatch in tsfix2; do
|
for ppatch in v093 v1rc1; do
|
||||||
#for psc in c cp cdp r pr cpr dpr; do
|
#for psc in c cp cdp r pr cpr dpr; do
|
||||||
for psc in cdp r pr cpr dpr; do
|
for ptsk in reg2 set2; do
|
||||||
#for ptsk in reg2 set1 set2; do
|
for psc in c cp cdp r pr cpr dpr; do
|
||||||
for ptsk in set1; do
|
|
||||||
for irun in $(seq 10); do
|
for irun in $(seq 10); do
|
||||||
lein run test --nodes-file nodes.vagrant \
|
lein run test --nodes-file nodes.vagrant \
|
||||||
--time-limit 60 --rate 100 --concurrency 100 --ops-per-key 100 \
|
--time-limit 60 --rate 100 --concurrency 100 --ops-per-key 100 \
|
||||||
|
|
|
@ -38,7 +38,9 @@
|
||||||
"tsfix2" "c82d91c6bccf307186332b6c5c6fc0b128b1b2b1"
|
"tsfix2" "c82d91c6bccf307186332b6c5c6fc0b128b1b2b1"
|
||||||
"task3a" "707442f5de416fdbed4681a33b739f0a787b7834"
|
"task3a" "707442f5de416fdbed4681a33b739f0a787b7834"
|
||||||
"task3b" "431b28e0cfdc9cac6c649193cf602108a8b02997"
|
"task3b" "431b28e0cfdc9cac6c649193cf602108a8b02997"
|
||||||
"task3c" "0041b013a473e3ae72f50209d8f79db75a72848b"})
|
"task3c" "0041b013a473e3ae72f50209d8f79db75a72848b"
|
||||||
|
"v093" "v0.9.3"
|
||||||
|
"v1rc1" "v1.0.0-rc1"})
|
||||||
|
|
||||||
(def cli-opts
|
(def cli-opts
|
||||||
"Additional command line options."
|
"Additional command line options."
|
||||||
|
|
|
@ -43,7 +43,7 @@
|
||||||
"rpc_bind_addr = \"0.0.0.0:3901\"\n"
|
"rpc_bind_addr = \"0.0.0.0:3901\"\n"
|
||||||
"rpc_public_addr = \"" node ":3901\"\n"
|
"rpc_public_addr = \"" node ":3901\"\n"
|
||||||
"db_engine = \"lmdb\"\n"
|
"db_engine = \"lmdb\"\n"
|
||||||
"replication_mode = \"2\"\n"
|
"replication_mode = \"3\"\n"
|
||||||
"data_dir = \"" data-dir "\"\n"
|
"data_dir = \"" data-dir "\"\n"
|
||||||
"metadata_dir = \"" meta-dir "\"\n"
|
"metadata_dir = \"" meta-dir "\"\n"
|
||||||
"[s3_api]\n"
|
"[s3_api]\n"
|
||||||
|
|
|
@ -11,6 +11,7 @@ in
|
||||||
{
|
{
|
||||||
# --- Dev shell inherited from flake.nix ---
|
# --- Dev shell inherited from flake.nix ---
|
||||||
devShell = devShells.default;
|
devShell = devShells.default;
|
||||||
|
devShellFull = devShells.full;
|
||||||
|
|
||||||
# --- Continuous integration shell ---
|
# --- Continuous integration shell ---
|
||||||
# The shell used for all CI jobs (along with devShell)
|
# The shell used for all CI jobs (along with devShell)
|
||||||
|
|
|
@ -36,7 +36,7 @@ impl std::str::FromStr for Engine {
|
||||||
match text {
|
match text {
|
||||||
"lmdb" | "heed" => Ok(Self::Lmdb),
|
"lmdb" | "heed" => Ok(Self::Lmdb),
|
||||||
"sqlite" | "sqlite3" | "rusqlite" => Ok(Self::Sqlite),
|
"sqlite" | "sqlite3" | "rusqlite" => Ok(Self::Sqlite),
|
||||||
"sled" => Err(Error("Sled is no longer supported as a database engine. Converting your old metadata db can be done using an older Garage binary (e.g. v0.9.3).".into())),
|
"sled" => Err(Error("Sled is no longer supported as a database engine. Converting your old metadata db can be done using an older Garage binary (e.g. v0.9.4).".into())),
|
||||||
kind => Err(Error(
|
kind => Err(Error(
|
||||||
format!(
|
format!(
|
||||||
"Invalid DB engine: {} (options are: lmdb, sqlite)",
|
"Invalid DB engine: {} (options are: lmdb, sqlite)",
|
||||||
|
|
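The hunk above only shows part of the `FromStr` implementation. As a rough, self-contained sketch of the same idea, the example below parses an engine name and rejects `sled` with a dedicated message. It uses a plain `String` as the error type, which is an assumption standing in for the crate's own `Error` type, and the two-variant `Engine` enum is reduced to what the hunk shows.

```rust
use std::str::FromStr;

/// Metadata DB engines still accepted after the removal of Sled.
/// (Simplified stand-in for the enum touched by the diff.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Engine {
    Lmdb,
    Sqlite,
}

impl FromStr for Engine {
    // Assumption: a plain String replaces the crate's own error type.
    type Err = String;

    fn from_str(text: &str) -> Result<Self, Self::Err> {
        match text {
            "lmdb" | "heed" => Ok(Engine::Lmdb),
            "sqlite" | "sqlite3" | "rusqlite" => Ok(Engine::Sqlite),
            // Sled gets its own message so that users learn how to migrate.
            "sled" => Err("Sled is no longer supported as a database engine. \
                           Converting your old metadata db can be done using \
                           an older Garage binary (e.g. v0.9.4)."
                .to_string()),
            kind => Err(format!(
                "Invalid DB engine: {} (options are: lmdb, sqlite)",
                kind
            )),
        }
    }
}

fn main() {
    for name in ["lmdb", "sqlite3", "sled", "rocksdb"] {
        match name.parse::<Engine>() {
            Ok(engine) => println!("{} -> {:?}", name, engine),
            Err(message) => println!("{} -> error: {}", name, message),
        }
    }
}
```

Keeping `sled` as a separate match arm, rather than letting it fall through to the generic case, is what lets the error point users to the conversion path.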
|
@ -48,7 +48,7 @@ pub enum Command {
|
||||||
#[structopt(name = "worker", version = garage_version())]
|
#[structopt(name = "worker", version = garage_version())]
|
||||||
Worker(WorkerOperation),
|
Worker(WorkerOperation),
|
||||||
|
|
||||||
/// Low-level debug operations on data blocks
|
/// Low-level node-local debug operations on data blocks
|
||||||
#[structopt(name = "block", version = garage_version())]
|
#[structopt(name = "block", version = garage_version())]
|
||||||
Block(BlockOperation),
|
Block(BlockOperation),
|
||||||
|
|
||||||
|
|