forked from Deuxfleurs/garage
Compare commits: main...hotfix/1.0 (1 commit)
SHA1: 6558c15863
14 changed files with 24 additions and 112 deletions
@@ -300,7 +300,7 @@ Since `v0.8.0`, Garage can use alternative storage backends as follows:
 | [Sled](https://sled.rs) (old default, removed since `v1.0`) | `"sled"` | `<metadata_dir>/db/` |

 Sled was supported until Garage v0.9.x, and was removed in Garage v1.0.
-You can still use an older binary of Garage (e.g. v0.9.4) to migrate
+You can still use an older binary of Garage (e.g. v0.9.3) to migrate
 old Sled metadata databases to another engine.

 Performance characteristics of the different DB engines are as follows:
@@ -390,12 +390,10 @@ if geographical replication is used.

 If this value is set, Garage will automatically take a snapshot of the metadata
 DB file at a regular interval and save it in the metadata directory.
-This parameter can take any duration string that can be parsed by
-the [`parse_duration`](https://docs.rs/parse_duration/latest/parse_duration/#syntax) crate.
-This can allow to recover from situations where the metadata DB file is corrupted,
-for instance after an unclean shutdown.
-See [this page](@/documentation/operations/recovering.md#corrupted_meta) for details.

+Snapshots can allow to recover from situations where the metadata DB file is
+corrupted, for instance after an unclean shutdown. See [this
+page](@/documentation/operations/recovering.md#corrupted_meta) for details.
 Garage keeps only the two most recent snapshots of the metadata DB and deletes
 older ones automatically.

@@ -414,7 +412,7 @@ month, with a random delay to avoid all nodes running at the same time. When
 it scrubs the data directory, Garage will read all of the data files stored on
 disk to check their integrity, and will rebuild any data files that it finds
 corrupted, using the remaining valid copies stored on other nodes.
-See [this page](@/documentation/operations/durability-repairs.md#scrub) for details.
+See [this page](@/documentation/operations/durability-repair.md#scrub) for details.

 Set the `disable_scrub` configuration value to `true` if you don't need Garage
 to scrub the data directory, for instance if you are already scrubbing at the
@@ -1,77 +0,0 @@
-+++
-title = "Migrating from 0.9 to 1.0"
-weight = 11
-+++
-
-**This guide explains how to migrate to 1.0 if you have an existing 0.9 cluster.
-We don't recommend trying to migrate to 1.0 directly from 0.8 or older.**
-
-This migration procedure has been tested on several clusters without issues.
-However, it is still a *critical procedure* that might cause issues.
-**Make sure to back up all your data before attempting it!**
-
-You might also want to read our [general documentation on upgrading Garage](@/documentation/operations/upgrading.md).
-
-## Changes introduced in v1.0
-
-The following are **breaking changes** in Garage v1.0 that require your attention when migrating:
-
-- The Sled metadata db engine has been **removed**. If your cluster was still
-using Sled, you will need to **use a Garage v0.9.x binary** to convert the
-database using the `garage convert-db` subcommand. See
-[here](@/documentation/reference-manual/configuration.md#db_engine) for the
-details of the procedure.
-
-The following syntax changes have been made to the configuration file:
-
-- The `replication_mode` parameter has been split into two parameters:
-[`replication_factor`](@/documentation/reference-manual/configuration.md#replication_factor)
-and
-[`consistency_mode`](@/documentation/reference-manual/configuration.md#consistency_mode).
-The old syntax using `replication_mode` is still supported for legacy
-reasons and can still be used.
-
-- The parameters `sled_cache_capacity` and `sled_flush_every_ms` have been removed.
-
-## Migration procedure
-
-The migration to Garage v1.0 can be done with almost no downtime,
-by restarting all nodes at once in the new version.
-
-The migration steps are as follows:
-
-1. Do a `garage repair --all-nodes --yes tables`, check the logs and check that
-all data seems to be synced correctly between nodes. If you have time, do
-additional `garage repair` procedures (`blocks`, `versions`, `block_refs`,
-etc.)
-
-2. Ensure you have a snapshot of your Garage installation that you can restore
-to in case the upgrade goes wrong:
-
-- If you are running Garage v0.9.4 or later, use the `garage meta snapshot
---all` to make a backup snapshot of the metadata directories of your nodes
-for backup purposes, and save a copy of the following files in the
-metadata directories of your nodes: `cluster_layout`, `data_layout`,
-`node_key`, `node_key.pub`.
-
-- If you are running a filesystem such as ZFS or BTRFS that support
-snapshotting, you can create a filesystem-level snapshot to be used as a
-restoration point if needed.
-
-- In other cases, make a backup using the old procedure: turn off each node
-individually; back up its metadata folder (for instance, use the following
-command if your metadata directory is `/var/lib/garage/meta`: `cd
-/var/lib/garage ; tar -acf meta-v0.9.tar.zst meta/`); turn it back on
-again. This will allow you to take a backup of all nodes without
-impacting global cluster availability. You can do all nodes of a single
-zone at once as this does not impact the availability of Garage.
-
-3. Prepare your updated binaries and configuration files for Garage v1.0
-
-4. Shut down all v0.9 nodes simultaneously, and restart them all simultaneously
-in v1.0. Use your favorite deployment tool (Ansible, Kubernetes, Nomad) to
-achieve this as fast as possible. Garage v1.0 should be in a working state
-as soon as enough nodes have started.
-
-5. Monitor your cluster in the following hours to see if it works well under
-your production load.
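The migration guide in this hunk says that `replication_mode` was split into `replication_factor` and `consistency_mode`, with the old syntax still accepted. A minimal Rust sketch of how such a legacy value could be translated is given below. The helper name `split_replication_mode`, the `ConsistencyMode` enum, and the mapping table itself are assumptions for illustration (the table follows the legacy values documented for Garage 0.9); none of it is taken from this diff.

```rust
/// Consistency levels. The variant names mirror the values that the
/// `consistency_mode` parameter is assumed to accept.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConsistencyMode {
    Consistent,
    Degraded,
    Dangerous,
}

/// Hypothetical helper: translate a legacy `replication_mode` string into a
/// `(replication_factor, consistency_mode)` pair.
/// Returns `None` for values that are not in the assumed legacy table.
pub fn split_replication_mode(legacy: &str) -> Option<(usize, ConsistencyMode)> {
    match legacy {
        "none" | "1" => Some((1, ConsistencyMode::Consistent)),
        "2" => Some((2, ConsistencyMode::Consistent)),
        "2-dangerous" => Some((2, ConsistencyMode::Dangerous)),
        "3" => Some((3, ConsistencyMode::Consistent)),
        "3-degraded" => Some((3, ConsistencyMode::Degraded)),
        "3-dangerous" => Some((3, ConsistencyMode::Dangerous)),
        _ => None,
    }
}
```

Under these assumptions, the two values that appear in the Jepsen configuration hunk of this diff, `replication_mode = "3"` and `replication_mode = "2"`, would correspond to factors 3 and 2 with the default consistent mode.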
@@ -76,7 +76,6 @@
 # import the full shell using `nix develop .#full`
 full = shellWithPackages (with pkgs; [
 rustfmt
-rust-analyzer
 clang
 mold
 # ---- extra packages for dev tasks ----
@@ -15,7 +15,7 @@ type: application
 # This is the chart version. This version number should be incremented each time you make changes
 # to the chart and its templates, including the app version.
 # Versions are expected to follow Semantic Versioning (https://semver.org/)
-version: 0.5.0
+version: 0.4.1

 # This is the version number of the application being deployed. This version number should be
 # incremented each time you make changes to the application. Versions are not expected to
@@ -11,7 +11,6 @@ spec:
 {{- if eq .Values.deployment.kind "StatefulSet" }}
 replicas: {{ .Values.deployment.replicaCount }}
 serviceName: {{ include "garage.fullname" . }}
-podManagementPolicy: {{ .Values.deployment.podManagementPolicy }}
 {{- end }}
 template:
 metadata:
@@ -96,8 +96,6 @@ deployment:
 kind: StatefulSet
 # Number of StatefulSet replicas/garage nodes to start
 replicaCount: 3
-# If using statefulset, allow Parallel or OrderedReady (default)
-podManagementPolicy: OrderedReady

 image:
 repository: dxflrs/amd64_garage
script/jepsen.garage/Vagrantfile (vendored, 14 changed lines)
@@ -30,11 +30,11 @@ Vagrant.configure("2") do |config|
 config.vm.define "n6" do |config| vm(config, "n6", "192.168.56.26") end
 config.vm.define "n7" do |config| vm(config, "n7", "192.168.56.27") end

-#config.vm.define "n8" do |config| vm(config, "n8", "192.168.56.28") end
-#config.vm.define "n9" do |config| vm(config, "n9", "192.168.56.29") end
-#config.vm.define "n10" do |config| vm(config, "n10", "192.168.56.30") end
-#config.vm.define "n11" do |config| vm(config, "n11", "192.168.56.31") end
-#config.vm.define "n12" do |config| vm(config, "n12", "192.168.56.32") end
-#config.vm.define "n13" do |config| vm(config, "n13", "192.168.56.33") end
-#config.vm.define "n14" do |config| vm(config, "n14", "192.168.56.34") end
+config.vm.define "n8" do |config| vm(config, "n8", "192.168.56.28") end
+config.vm.define "n9" do |config| vm(config, "n9", "192.168.56.29") end
+config.vm.define "n10" do |config| vm(config, "n10", "192.168.56.30") end
+config.vm.define "n11" do |config| vm(config, "n11", "192.168.56.31") end
+config.vm.define "n12" do |config| vm(config, "n12", "192.168.56.32") end
+config.vm.define "n13" do |config| vm(config, "n13", "192.168.56.33") end
+config.vm.define "n14" do |config| vm(config, "n14", "192.168.56.34") end
 end
@@ -3,10 +3,11 @@
 set -x

 #for ppatch in task3c task3a tsfix2; do
-for ppatch in v093 v1rc1; do
+for ppatch in tsfix2; do
 #for psc in c cp cdp r pr cpr dpr; do
-for ptsk in reg2 set2; do
-for psc in c cp cdp r pr cpr dpr; do
+for psc in cdp r pr cpr dpr; do
+#for ptsk in reg2 set1 set2; do
+for ptsk in set1; do
 for irun in $(seq 10); do
 lein run test --nodes-file nodes.vagrant \
 --time-limit 60 --rate 100 --concurrency 100 --ops-per-key 100 \
@@ -38,9 +38,7 @@
 "tsfix2" "c82d91c6bccf307186332b6c5c6fc0b128b1b2b1"
 "task3a" "707442f5de416fdbed4681a33b739f0a787b7834"
 "task3b" "431b28e0cfdc9cac6c649193cf602108a8b02997"
-"task3c" "0041b013a473e3ae72f50209d8f79db75a72848b"
-"v093" "v0.9.3"
-"v1rc1" "v1.0.0-rc1"})
+"task3c" "0041b013a473e3ae72f50209d8f79db75a72848b"})

 (def cli-opts
 "Additional command line options."
@@ -43,7 +43,7 @@
 "rpc_bind_addr = \"0.0.0.0:3901\"\n"
 "rpc_public_addr = \"" node ":3901\"\n"
 "db_engine = \"lmdb\"\n"
-"replication_mode = \"3\"\n"
+"replication_mode = \"2\"\n"
 "data_dir = \"" data-dir "\"\n"
 "metadata_dir = \"" meta-dir "\"\n"
 "[s3_api]\n"
@@ -11,7 +11,6 @@ in
 {
 # --- Dev shell inherited from flake.nix ---
 devShell = devShells.default;
-devShellFull = devShells.full;

 # --- Continuous integration shell ---
 # The shell used for all CI jobs (along with devShell)
@@ -36,7 +36,7 @@ impl std::str::FromStr for Engine {
 match text {
 "lmdb" | "heed" => Ok(Self::Lmdb),
 "sqlite" | "sqlite3" | "rusqlite" => Ok(Self::Sqlite),
-"sled" => Err(Error("Sled is no longer supported as a database engine. Converting your old metadata db can be done using an older Garage binary (e.g. v0.9.4).".into())),
+"sled" => Err(Error("Sled is no longer supported as a database engine. Converting your old metadata db can be done using an older Garage binary (e.g. v0.9.3).".into())),
 kind => Err(Error(
 format!(
 "Invalid DB engine: {} (options are: lmdb, sqlite)",
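The hunk above only shows a fragment of the `FromStr` implementation. As a rough, self-contained sketch of the same parsing behaviour, the following Rust snippet reproduces the accepted aliases and the two error paths. The `Engine` variants and the match arms come from the hunk; the `EngineParseError` type and the shortened Sled message are stand-ins assumed for illustration, since the real `Error` type is not visible here.

```rust
use std::str::FromStr;

/// The two engines that remain selectable according to the hunk.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Engine {
    Lmdb,
    Sqlite,
}

/// Stand-in error type (assumption): the diff uses its own `Error` wrapper.
#[derive(Debug, PartialEq, Eq)]
pub struct EngineParseError(pub String);

impl FromStr for Engine {
    type Err = EngineParseError;

    fn from_str(text: &str) -> Result<Self, Self::Err> {
        match text {
            // Several aliases map to the same engine.
            "lmdb" | "heed" => Ok(Self::Lmdb),
            "sqlite" | "sqlite3" | "rusqlite" => Ok(Self::Sqlite),
            // Sled gets a dedicated message instead of the generic one.
            "sled" => Err(EngineParseError(
                "Sled is no longer supported as a database engine.".into(),
            )),
            kind => Err(EngineParseError(format!(
                "Invalid DB engine: {} (options are: lmdb, sqlite)",
                kind
            ))),
        }
    }
}
```

With this sketch, `"heed".parse::<Engine>()` yields `Ok(Engine::Lmdb)`, while `"sled"` and any unknown name produce an error whose message tells the operator what to do next.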
@@ -48,7 +48,7 @@ pub enum Command {
 #[structopt(name = "worker", version = garage_version())]
 Worker(WorkerOperation),

-/// Low-level node-local debug operations on data blocks
+/// Low-level debug operations on data blocks
 #[structopt(name = "block", version = garage_version())]
 Block(BlockOperation),

@@ -43,13 +43,10 @@ impl TableReplication for TableFullReplication {
 }
 fn write_quorum(&self) -> usize {
 let nmembers = self.system.cluster_layout().current().all_nodes().len();
-
-let max_faults = if nmembers > 1 { 1 } else { 0 };
-
-if nmembers > max_faults {
-nmembers - max_faults
-} else {
+if nmembers < 3 {
 1
+} else {
+nmembers.div_euclid(2) + 1
 }
 }

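To make the difference between the two `write_quorum` bodies in this hunk easier to see, the sketch below extracts each one into a standalone function. The function names and the choice to pass `nmembers` as a plain argument (instead of reading it from the cluster layout) are assumptions for illustration; the arithmetic is copied from the hunk.

```rust
/// Variant with a fault budget: tolerate at most one unreachable node,
/// so the quorum is every node except one (and never less than 1).
pub fn write_quorum_max_faults(nmembers: usize) -> usize {
    let max_faults = if nmembers > 1 { 1 } else { 0 };
    if nmembers > max_faults {
        nmembers - max_faults
    } else {
        1
    }
}

/// Majority variant: a single acknowledgement for fewer than three nodes,
/// otherwise a strict majority of the nodes.
pub fn write_quorum_majority(nmembers: usize) -> usize {
    if nmembers < 3 {
        1
    } else {
        nmembers.div_euclid(2) + 1
    }
}
```

The two variants agree for clusters of up to four nodes and diverge from five nodes onward: with five members the fault-budget variant requires four acknowledgements, while the majority variant requires three.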