decrease write quorum

2024-05-15 08:05:18 +02:00
14 changed files with 24 additions and 112 deletions
--- a/doc/book/reference-manual/configuration.md
+++ b/doc/book/reference-manual/configuration.md
@@ -300,7 +300,7 @@ Since `v0.8.0`, Garage can use alternative storage backends as follows:
 | [Sled](https://sled.rs) (old default, removed since `v1.0`) | `"sled"` | `<metadata_dir>/db/` |

 Sled was supported until Garage v0.9.x, and was removed in Garage v1.0.
-You can still use an older binary of Garage (e.g. v0.9.4) to migrate
+You can still use an older binary of Garage (e.g. v0.9.3) to migrate
 old Sled metadata databases to another engine.

 Performance characteristics of the different DB engines are as follows:
@@ -390,12 +390,10 @@ if geographical replication is used.

 If this value is set, Garage will automatically take a snapshot of the metadata
 DB file at a regular interval and save it in the metadata directory.
-This parameter can take any duration string that can be parsed by
-the [`parse_duration`](https://docs.rs/parse_duration/latest/parse_duration/#syntax) crate.
+This can allow to recover from situations where the metadata DB file is corrupted,
+for instance after an unclean shutdown.
+See [this page](@/documentation/operations/recovering.md#corrupted_meta) for details.

-Snapshots can allow to recover from situations where the metadata DB file is
-corrupted, for instance after an unclean shutdown.  See [this
-page](@/documentation/operations/recovering.md#corrupted_meta) for details.
 Garage keeps only the two most recent snapshots of the metadata DB and deletes
 older ones automatically.

@@ -414,7 +412,7 @@ month, with a random delay to avoid all nodes running at the same time.  When
 it scrubs the data directory, Garage will read all of the data files stored on
 disk to check their integrity, and will rebuild any data files that it finds
 corrupted, using the remaining valid copies stored on other nodes.
-See [this page](@/documentation/operations/durability-repairs.md#scrub) for details.
+See [this page](@/documentation/operations/durability-repair.md#scrub) for details.

 Set the `disable_scrub` configuration value to `true` if you don't need Garage
 to scrub the data directory, for instance if you are already scrubbing at the
--- a/doc/book/working-documents/migration-1.md
+++ b/doc/book/working-documents/migration-1.md
@@ -1,77 +0,0 @@
-+++
-title = "Migrating from 0.9 to 1.0"
-weight = 11
-+++
-
-**This guide explains how to migrate to 1.0 if you have an existing 0.9 cluster.
-We don't recommend trying to migrate to 1.0 directly from 0.8 or older.**
-
-This migration procedure has been tested on several clusters without issues.
-However, it is still a *critical procedure* that might cause issues.
-**Make sure to back up all your data before attempting it!**
-
-You might also want to read our [general documentation on upgrading Garage](@/documentation/operations/upgrading.md).
-
-## Changes introduced in v1.0
-
-The following are **breaking changes** in Garage v1.0 that require your attention when migrating:
-
- The Sled metadata db engine has been **removed**. If your cluster was still
-  using Sled, you will need to **use a Garage v0.9.x binary** to convert the
-  database using the `garage convert-db` subcommand. See
-  [here](@/documentation/reference-manual/configuration.md#db_engine) for the
-  details of the procedure.
-
-The following syntax changes have been made to the configuration file:
-
- The `replication_mode` parameter has been split into two parameters:
-  [`replication_factor`](@/documentation/reference-manual/configuration.md#replication_factor)
-  and
-  [`consistency_mode`](@/documentation/reference-manual/configuration.md#consistency_mode).
-  The old syntax using `replication_mode` is still supported for legacy
-  reasons and can still be used.
-
- The parameters `sled_cache_capacity` and `sled_flush_every_ms` have been removed.
-
-## Migration procedure
-
-The migration to Garage v1.0 can be done with almost no downtime,
-by restarting all nodes at once in the new version.
-
-The migration steps are as follows:
-
-1. Do a `garage repair --all-nodes --yes tables`, check the logs and check that
-   all data seems to be synced correctly between nodes. If you have time, do
-   additional `garage repair` procedures (`blocks`, `versions`, `block_refs`,
-   etc.)
-
-2. Ensure you have a snapshot of your Garage installation that you can restore
-   to in case the upgrade goes wrong:
-
-   - If you are running Garage v0.9.4 or later, use the `garage meta snapshot
-     --all` to make a backup snapshot of the metadata directories of your nodes
-     for backup purposes, and save a copy of the following files in the
-     metadata directories of your nodes: `cluster_layout`, `data_layout`,
-     `node_key`, `node_key.pub`.
-
-   - If you are running a filesystem such as ZFS or BTRFS that support
-     snapshotting, you can create a filesystem-level snapshot to be used as a
-     restoration point if needed.
-
-   - In other cases, make a backup using the old procedure: turn off each node
-     individually; back up its metadata folder (for instance, use the following
-     command if your metadata directory is `/var/lib/garage/meta`: `cd
-     /var/lib/garage ; tar -acf meta-v0.9.tar.zst meta/`); turn it back on
-     again.  This will allow you to take a backup of all nodes without
-     impacting global cluster availability.  You can do all nodes of a single
-     zone at once as this does not impact the availability of Garage.
-
-3. Prepare your updated binaries and configuration files for Garage v1.0
-
-4. Shut down all v0.9 nodes simultaneously, and restart them all simultaneously
-   in v1.0.  Use your favorite deployment tool (Ansible, Kubernetes, Nomad) to
-   achieve this as fast as possible.  Garage v1.0 should be in a working state
-   as soon as enough nodes have started.
-
-5. Monitor your cluster in the following hours to see if it works well under
-   your production load.
--- a/flake.nix
+++ b/flake.nix
@@ -76,7 +76,6 @@
            # import the full shell using `nix develop .#full`
            full = shellWithPackages (with pkgs; [
              rustfmt
-              rust-analyzer
              clang
              mold
              # ---- extra packages for dev tasks ----
--- a/script/helm/garage/Chart.yaml
+++ b/script/helm/garage/Chart.yaml
@@ -15,7 +15,7 @@ type: application
 # This is the chart version. This version number should be incremented each time you make changes
 # to the chart and its templates, including the app version.
 # Versions are expected to follow Semantic Versioning (https://semver.org/)
-version: 0.5.0
+version: 0.4.1

 # This is the version number of the application being deployed. This version number should be
 # incremented each time you make changes to the application. Versions are not expected to
--- a/script/helm/garage/templates/workload.yaml
+++ b/script/helm/garage/templates/workload.yaml
@@ -11,7 +11,6 @@ spec:
  {{- if eq .Values.deployment.kind "StatefulSet" }}
  replicas: {{ .Values.deployment.replicaCount }}
  serviceName: {{ include "garage.fullname" . }}
-  podManagementPolicy: {{ .Values.deployment.podManagementPolicy }}
  {{- end }}
  template:
    metadata:
--- a/script/helm/garage/values.yaml
+++ b/script/helm/garage/values.yaml
@@ -96,8 +96,6 @@ deployment:
  kind: StatefulSet
  # Number of StatefulSet replicas/garage nodes to start
  replicaCount: 3
-  # If using statefulset, allow Parallel or OrderedReady (default)
-  podManagementPolicy: OrderedReady

 image:
  repository: dxflrs/amd64_garage
--- a/script/jepsen.garage/Vagrantfile
+++ b/script/jepsen.garage/Vagrantfile
@@ -30,11 +30,11 @@ Vagrant.configure("2") do |config|
  config.vm.define "n6" do |config| vm(config, "n6", "192.168.56.26") end
  config.vm.define "n7" do |config| vm(config, "n7", "192.168.56.27") end

-  #config.vm.define "n8" do |config| vm(config, "n8", "192.168.56.28") end
-  #config.vm.define "n9" do |config| vm(config, "n9", "192.168.56.29") end
-  #config.vm.define "n10" do |config| vm(config, "n10", "192.168.56.30") end
-  #config.vm.define "n11" do |config| vm(config, "n11", "192.168.56.31") end
-  #config.vm.define "n12" do |config| vm(config, "n12", "192.168.56.32") end
-  #config.vm.define "n13" do |config| vm(config, "n13", "192.168.56.33") end
-  #config.vm.define "n14" do |config| vm(config, "n14", "192.168.56.34") end
+  config.vm.define "n8" do |config| vm(config, "n8", "192.168.56.28") end
+  config.vm.define "n9" do |config| vm(config, "n9", "192.168.56.29") end
+  config.vm.define "n10" do |config| vm(config, "n10", "192.168.56.30") end
+  config.vm.define "n11" do |config| vm(config, "n11", "192.168.56.31") end
+  config.vm.define "n12" do |config| vm(config, "n12", "192.168.56.32") end
+  config.vm.define "n13" do |config| vm(config, "n13", "192.168.56.33") end
+  config.vm.define "n14" do |config| vm(config, "n14", "192.168.56.34") end
 end
--- a/script/jepsen.garage/all_tests_1.sh
+++ b/script/jepsen.garage/all_tests_1.sh
@@ -3,10 +3,11 @@
 set -x

 #for ppatch in task3c task3a tsfix2; do
-for ppatch in v093 v1rc1; do
+for ppatch in tsfix2; do
 	#for psc in c cp cdp r pr cpr dpr; do
-	for ptsk in reg2 set2; do
-		for psc in c cp cdp r pr cpr dpr; do
+	for psc in cdp r pr cpr dpr; do
+		#for ptsk in reg2 set1 set2; do
+		for ptsk in set1; do
 			for irun in $(seq 10); do
 				lein run test --nodes-file nodes.vagrant \
 					--time-limit 60 --rate 100  --concurrency 100 --ops-per-key 100 \
--- a/script/jepsen.garage/src/jepsen/garage.clj
+++ b/script/jepsen.garage/src/jepsen/garage.clj
@@ -38,9 +38,7 @@
   "tsfix2" "c82d91c6bccf307186332b6c5c6fc0b128b1b2b1"
   "task3a" "707442f5de416fdbed4681a33b739f0a787b7834"
   "task3b" "431b28e0cfdc9cac6c649193cf602108a8b02997"
-   "task3c" "0041b013a473e3ae72f50209d8f79db75a72848b"
-   "v093" "v0.9.3"
-   "v1rc1" "v1.0.0-rc1"})
+   "task3c" "0041b013a473e3ae72f50209d8f79db75a72848b"})

 (def cli-opts
  "Additional command line options."
--- a/script/jepsen.garage/src/jepsen/garage/daemon.clj
+++ b/script/jepsen.garage/src/jepsen/garage/daemon.clj
@@ -43,7 +43,7 @@
             "rpc_bind_addr = \"0.0.0.0:3901\"\n"
             "rpc_public_addr = \"" node ":3901\"\n"
             "db_engine = \"lmdb\"\n"
-             "replication_mode = \"3\"\n"
+             "replication_mode = \"2\"\n"
             "data_dir = \"" data-dir "\"\n"
             "metadata_dir = \"" meta-dir "\"\n"
             "[s3_api]\n"
--- a/shell.nix
+++ b/shell.nix
@@ -11,7 +11,6 @@ in
 {
  # --- Dev shell inherited from flake.nix ---
  devShell = devShells.default;
-  devShellFull = devShells.full;

  # --- Continuous integration shell ---
  # The shell used for all CI jobs (along with devShell)
--- a/src/db/open.rs
+++ b/src/db/open.rs
@@ -36,7 +36,7 @@ impl std::str::FromStr for Engine {
 		match text {
 			"lmdb" | "heed" => Ok(Self::Lmdb),
 			"sqlite" | "sqlite3" | "rusqlite" => Ok(Self::Sqlite),
-			"sled" => Err(Error("Sled is no longer supported as a database engine. Converting your old metadata db can be done using an older Garage binary (e.g. v0.9.4).".into())),
+			"sled" => Err(Error("Sled is no longer supported as a database engine. Converting your old metadata db can be done using an older Garage binary (e.g. v0.9.3).".into())),
 			kind => Err(Error(
 				format!(
 					"Invalid DB engine: {} (options are: lmdb, sqlite)",
--- a/src/garage/cli/structs.rs
+++ b/src/garage/cli/structs.rs
@@ -48,7 +48,7 @@ pub enum Command {
 	#[structopt(name = "worker", version = garage_version())]
 	Worker(WorkerOperation),

-	/// Low-level node-local debug operations on data blocks
+	/// Low-level debug operations on data blocks
 	#[structopt(name = "block", version = garage_version())]
 	Block(BlockOperation),

--- a/src/table/replication/fullcopy.rs
+++ b/src/table/replication/fullcopy.rs
@@ -43,13 +43,10 @@ impl TableReplication for TableFullReplication {
 	}
 	fn write_quorum(&self) -> usize {
 		let nmembers = self.system.cluster_layout().current().all_nodes().len();
-
-		let max_faults = if nmembers > 1 { 1 } else { 0 };
-
-		if nmembers > max_faults {
-			nmembers - max_faults
-		} else {
+		if nmembers < 3 {
 			1
+		} else {
+			nmembers.div_euclid(2) + 1
 		}
 	}
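
Note on the `write_quorum` hunk in `src/table/replication/fullcopy.rs`: the removed lines require acknowledgement from all nodes but one, while the added lines require a strict majority once the cluster has at least three nodes. The following is a minimal standalone sketch of the two rules for comparison. The function names `write_quorum_before` and `write_quorum_after` are hypothetical, and `nmembers` is passed in directly instead of being read from the cluster layout as the real code does.

```rust
/// Rule from the removed (`-`) lines: tolerate at most one faulty node.
/// Hypothetical standalone version; the real method reads `nmembers`
/// from the current cluster layout.
pub fn write_quorum_before(nmembers: usize) -> usize {
    let max_faults = if nmembers > 1 { 1 } else { 0 };
    if nmembers > max_faults {
        nmembers - max_faults
    } else {
        // Only reached for an empty layout (nmembers == 0).
        1
    }
}

/// Rule from the added (`+`) lines: a single acknowledgement below
/// three nodes, otherwise a strict majority of the nodes.
pub fn write_quorum_after(nmembers: usize) -> usize {
    if nmembers < 3 {
        1
    } else {
        nmembers.div_euclid(2) + 1
    }
}
```

Under these definitions the two rules agree for clusters of up to four nodes and diverge from five nodes onward: with five nodes the removed rule asks for 4 acknowledgements and the added rule asks for 3.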