Add documentation on durability and repair procedures (fix #219)
All checks were successful
continuous-integration/drone/push Build is passing
parent 3aadba724d
commit 9233661967
3 changed files with 116 additions and 2 deletions
114
doc/book/cookbook/durability-repairs.md
Normal file
@ -0,0 +1,114 @@
+++
title = "Durability & Repairs"
weight = 50
+++

To ensure the best durability of your data and to fix any inconsistencies that may
arise in a distributed system, Garage provides a series of repair operations.
This guide explains what each of them does and when it should be applied.

# General syntax of repair operations

Repair operations described below are of the form `garage repair <repair_name>`.
These repairs will not launch without the `--yes` flag, which should
be added as follows: `garage repair --yes <repair_name>`.
By default these repair procedures will only run on the Garage node your CLI is
connecting to. To run on all nodes, add the `-a` flag as follows:
`garage repair -a --yes <repair_name>`.
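
If you script these repairs, the syntax above can be captured in a small helper. The following is a minimal sketch: the function name `repair_command` is hypothetical, and it only assembles the argument list described above without running anything.

```python
from typing import List


def repair_command(repair_name: str, all_nodes: bool = False) -> List[str]:
    """Build the argument list for a `garage repair` invocation.

    Hypothetical helper: it only mirrors the syntax described in this guide.
    """
    argv = ["garage", "repair"]
    if all_nodes:
        # `-a` runs the repair on all nodes instead of only the node
        # the CLI is connecting to.
        argv.append("-a")
    # Repairs do not launch without `--yes`.
    argv += ["--yes", repair_name]
    return argv


print(" ".join(repair_command("blocks")))
print(" ".join(repair_command("tables", all_nodes=True)))
```

The resulting list can be passed to `subprocess.run` on a machine where the `garage` CLI is installed and configured.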

# Data block operations

## Data store scrub

Scrubbing the data store means examining each individual data block to check that
its content is correct, by verifying its hash. Any block found to be corrupted
(e.g. by bitrot or by accidental manipulation of the data store) will be
restored from another node that holds a valid copy.
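
To illustrate the idea of verifying a block by its hash, here is a minimal sketch. It assumes blocks are identified by a hash of their content; the choice of BLAKE2b with a 32-byte digest and the function names are illustrative assumptions, not a description of Garage's internals.

```python
import hashlib


def block_hash(data: bytes) -> str:
    # Assumption for illustration only: BLAKE2b with a 32-byte digest.
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def is_corrupted(expected_hash: str, data: bytes) -> bool:
    """A block is considered corrupted when its content no longer
    matches the hash it is stored under."""
    return block_hash(data) != expected_hash


good = b"some block content"
stored_under = block_hash(good)

# Simulate bitrot by flipping one bit of the first byte.
rotten = bytes([good[0] ^ 0x01]) + good[1:]

print(is_corrupted(stored_under, good))
print(is_corrupted(stored_under, rotten))
```

In this sketch a corrupted block is only detected; restoring it from another node that holds a valid copy is left out.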

A scrub is run automatically by Garage every 30 days. It can also be launched
manually using `garage repair scrub start`.

To view the status of an ongoing scrub, first find the task ID of the scrub worker
using `garage worker list`. Then, run `garage worker info <scrub_task_id>` to
view detailed runtime statistics of the scrub. To gather cluster-wide information,
this command has to be run on each individual node.

A scrub is a very disk-intensive operation that might slow down your cluster.
You may pause an ongoing scrub using `garage repair scrub pause`, but note that
the scrub will resume automatically 24 hours later, as Garage will not let your
cluster run without a regular scrub. If the scrub procedure is too intensive
for your servers and is slowing down your workload, the recommended solution
is to increase the "scrub tranquility" using `garage repair scrub set-tranquility`.
A higher tranquility value makes Garage take longer pauses between two block
verifications. As a consequence, scrubbing the entire data store will also take longer.
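
The trade-off behind tranquility can be sketched as a throttled loop. This is only an illustration under a stated assumption: the pause is taken to be proportional to the time spent on the previous verification, which this guide does not specify. The names `pause_for` and `scrub` are hypothetical.

```python
import time
from typing import Callable, Iterable


def pause_for(work_seconds: float, tranquility: int) -> float:
    # Assumption: pause length = tranquility x time spent working.
    # A tranquility of 0 means no pause at all.
    return work_seconds * tranquility


def scrub(
    blocks: Iterable[bytes],
    verify: Callable[[bytes], bool],
    tranquility: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Verify every block, pausing between verifications.

    Returns the number of blocks that failed verification.
    """
    corrupted = 0
    for block in blocks:
        start = time.monotonic()
        if not verify(block):
            corrupted += 1
        elapsed = time.monotonic() - start
        # Higher tranquility: longer pauses, less disk pressure, slower scrub.
        sleep(pause_for(elapsed, tranquility))
    return corrupted
```

With this model, a tranquility of 4 means the loop is idle roughly four fifths of the time, so a full pass takes about five times longer than an unthrottled one.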

## Block check and resync

In some cases, nodes hold a reference to a block but do not actually have the block
stored on disk. Conversely, they may also hold blocks on disk that are no longer
referenced. To fix both cases, a block repair may be run with `garage repair blocks`.
This will scan the entire block reference counter table to check that the blocks
exist on disk, and will scan the entire disk store to check that stored blocks
are referenced.
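
The two scans described above amount to comparing two sets. The sketch below models the reference counter table as a dictionary and the disk store as a set of block hashes; both representations and the name `check_blocks` are simplifying assumptions made for illustration.

```python
from typing import Dict, Set, Tuple


def check_blocks(
    ref_counts: Dict[str, int], on_disk: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (missing, unreferenced).

    missing:      referenced blocks that are not on disk (to be fetched)
    unreferenced: blocks on disk that nothing references any more
    """
    referenced = {h for h, count in ref_counts.items() if count > 0}
    missing = referenced - on_disk
    unreferenced = on_disk - referenced
    return missing, unreferenced


missing, unreferenced = check_blocks(
    ref_counts={"aa": 2, "bb": 1, "cc": 0},
    on_disk={"aa", "cc", "dd"},
)
print(sorted(missing), sorted(unreferenced))
```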

It is recommended to run this procedure when changing your cluster layout,
after the metadata tables have finished synchronizing between nodes
(usually a few hours after `garage layout apply`).

## Inspecting lost blocks

In extremely rare situations, data blocks may be unavailable from the entire cluster.
This means that even using `garage repair blocks`, some nodes may be unable
to fetch data blocks for which they hold a reference.

These errors are stored on each node in a list of "block resync errors", i.e.
blocks for which the last resync operation failed.
This list can be inspected using `garage block list-errors`.
These errors usually fall into one of the following categories:

1. a block is still referenced but the object was deleted; this is a case
   of metadata reference inconsistency (see below for the fix)
2. a block is referenced by a non-deleted object, but could not be fetched due
   to a transient error such as a network failure
3. a block is referenced by a non-deleted object, but could not be fetched due
   to a permanent error, such as there being no valid copy of the block on the
   entire cluster

To distinguish case 1 from cases 2 and 3, you may use the
`garage block info` command to see which objects hold a reference to each block.

In the second case (transient errors), Garage will try to fetch the block again
after a certain time, so the error should disappear naturally. You can also
ask Garage to try to fetch the block immediately using `garage block retry-now`
if you have fixed the transient issue.

If you are confident that you are in the third scenario and that your data block
is definitely lost, then you have no choice but to declare your S3 objects
unrecoverable and to delete them properly from the data store. This can be done
using the `garage block purge` command.
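
The triage described in this section can be summarized as a small decision function. This is a sketch for operators writing their own runbooks: the names `Action` and `suggested_action` are hypothetical, and whether a block is definitely lost remains a human judgment that the code simply takes as input.

```python
from enum import Enum


class Action(Enum):
    # Case 1: only deleted objects reference the block.
    FIX_REFERENCES = "run the metadata table reference fixes"
    # Case 2: transient error, Garage retries by itself.
    RETRY = "wait, or use `garage block retry-now` once the issue is fixed"
    # Case 3: no valid copy exists anywhere in the cluster.
    PURGE = "use `garage block purge`"


def suggested_action(live_references: int, confirmed_lost: bool = False) -> Action:
    """Map a block resync error to the procedure described in this guide.

    live_references: number of non-deleted objects referencing the block,
                     as read by the operator from `garage block info`.
    confirmed_lost:  True only if the operator is confident that no valid
                     copy of the block exists on the entire cluster.
    """
    if live_references == 0:
        return Action.FIX_REFERENCES
    if confirmed_lost:
        return Action.PURGE
    return Action.RETRY


print(suggested_action(0).value)
print(suggested_action(3).value)
print(suggested_action(3, confirmed_lost=True).value)
```

Defaulting `confirmed_lost` to `False` keeps the destructive path opt-in, since purging declares the affected objects unrecoverable.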

# Metadata operations

## Metadata table resync

Garage automatically resyncs all entries stored in the metadata tables every hour,
to ensure that all nodes have the most up-to-date version of all the information
they should be holding.
The resync procedure is based on a Merkle tree, which makes it possible to
efficiently find differences between nodes.
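
To show why a Merkle tree makes this comparison efficient, here is a generic sketch of the technique. It is not Garage's implementation: the bucket layout, the use of SHA-256 and the function names are all assumptions. Each node hashes its table entries per bucket, then hashes pairs of hashes up to a single root; two nodes only need to descend into subtrees whose hashes differ.

```python
import hashlib
from typing import Dict, List

Tree = List[List[bytes]]  # levels of hashes, leaves first, root last


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(buckets: List[Dict[str, str]]) -> Tree:
    """Build a binary Merkle tree; len(buckets) must be a power of two."""
    level = [_h(repr(sorted(b.items())).encode()) for b in buckets]
    levels = [level]
    while len(level) > 1:
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_buckets(a: Tree, b: Tree) -> List[int]:
    """Return indices of buckets that differ between two same-shaped trees."""
    todo = [(len(a) - 1, 0)]  # start at the root
    result = []
    while todo:
        depth, idx = todo.pop()
        if a[depth][idx] == b[depth][idx]:
            continue  # identical subtree: nothing below needs comparing
        if depth == 0:
            result.append(idx)
        else:
            todo += [(depth - 1, 2 * idx), (depth - 1, 2 * idx + 1)]
    return sorted(result)


node_a = [{"k1": "v1"}, {"k2": "v2"}, {"k3": "v3"}, {}]
node_b = [{"k1": "v1"}, {"k2": "v2"}, {"k3": "stale"}, {}]
print(differing_buckets(build_tree(node_a), build_tree(node_b)))
```

When the two roots are equal, the comparison stops after a single hash check, which is what keeps an hourly resync cheap when nodes are already in agreement.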

In some special cases, e.g. before an upgrade, you might want to run a table
resync manually. This can be done using `garage repair tables`.

## Metadata table reference fixes

In some very rare cases where nodes are unavailable, some references between objects
may be broken. For instance, if an object is deleted, the underlying versions or data
blocks may still be held by Garage. If you suspect that such corruption has occurred
in your cluster, you can run one of the following repair procedures:

- `garage repair versions`: checks that all versions belong to a non-deleted object, and purges any orphan version
- `garage repair block_refs`: checks that all block references belong to a non-deleted object version, and purges any orphan block reference (this will then allow the blocks to be garbage-collected)
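
The two procedures above form a chain: objects own versions, and versions own block references. The following sketch models that chain with plain Python collections; the data layout and the function names are assumptions made for illustration only.

```python
from typing import Dict, List, Set, Tuple


def repair_versions(live_objects: Set[str], versions: Dict[str, str]) -> Dict[str, str]:
    """Keep only versions whose owning object is not deleted.

    `versions` maps a version ID to the name of the object it belongs to.
    """
    return {v: obj for v, obj in versions.items() if obj in live_objects}


def repair_block_refs(
    live_versions: Set[str], block_refs: List[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Keep only (block, version) references whose version still exists."""
    return [(blk, v) for blk, v in block_refs if v in live_versions]


live_objects = {"photo.jpg"}
versions = {"v1": "photo.jpg", "v2": "deleted.txt"}
block_refs = [("aa", "v1"), ("bb", "v2")]

kept_versions = repair_versions(live_objects, versions)
kept_refs = repair_block_refs(set(kept_versions), block_refs)
print(kept_versions, kept_refs)
```

In this example block `bb` ends up with no remaining reference, which is the situation that allows it to be garbage-collected.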

@ -1,6 +1,6 @@
+++
title = "Recovering from failures"
-weight = 50
+weight = 60
+++

Garage is meant to work on old, second-hand hardware.

@ -1,6 +1,6 @@
+++
title = "Upgrading Garage"
-weight = 60
+weight = 70
+++

Garage is a stateful clustered application, where all nodes are communicating together and share data structures.