[db-snapshot] documentation for metadata db snapshots

parent a68c37555d
commit 8cf3d24875

5 changed files with 114 additions and 7 deletions
@@ -72,13 +72,14 @@ to store 2 TB of data in total.

 to RAID, see [our dedicated documentation page](@/documentation/operations/multi-hdd.md).

 - For the metadata storage, Garage does not do checksumming and integrity
-  verification on its own. Users have reported that when using the LMDB
-  database engine (the default), database files have a tendency of becoming
-  corrupted after an unclean shutdown (e.g. a power outage), so you should use
-  a robust filesystem such as BTRFS or ZFS for the metadata partition, and take
-  regular snapshots so that you can restore to a recent known-good state in
-  case of an incident. If you cannot do so, you might want to switch to Sqlite
-  which is more robust.
+  verification on its own, so it is better to use a robust filesystem such as
+  BTRFS or ZFS. Users have reported that when using the LMDB database engine
+  (the default), database files have a tendency of becoming corrupted after an
+  unclean shutdown (e.g. a power outage), so you should take regular snapshots
+  to be able to recover from such a situation. This can be done using Garage's
+  built-in automatic snapshotting (since v0.9.4) or filesystem-level
+  snapshots. If you cannot do so, you might want to switch to Sqlite, which is
+  more robust.

 - LMDB is the fastest and most tested database engine, but it has the following
   weaknesses: 1/ data files are not architecture-independent, you cannot simply
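As an illustration of the filesystem-level approach mentioned in the hunk above, here is a minimal sketch of snapshotting the metadata partition. The subvolume/dataset names and paths are assumptions, not something prescribed by Garage; adapt them to your own layout.

```bash
# Minimal sketch, assuming the metadata directory sits on a dedicated
# BTRFS subvolume or ZFS dataset (names below are placeholders).

# BTRFS: read-only snapshot of the metadata subvolume
btrfs subvolume snapshot -r /var/lib/garage/meta \
    /var/lib/garage/meta-snapshots/$(date -u +%Y%m%dT%H%M%SZ)

# ZFS: snapshot of the dataset holding the metadata directory
zfs snapshot tank/garage-meta@$(date -u +%Y%m%dT%H%M%SZ)
```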
@@ -124,6 +125,7 @@ A valid `/etc/garage.toml` for our cluster would look as follows:

 metadata_dir = "/var/lib/garage/meta"
 data_dir = "/var/lib/garage/data"
 db_engine = "lmdb"
+metadata_auto_snapshot_interval = "6h"

 replication_mode = "3"
@@ -104,6 +104,24 @@ operation will also move out all data from locations marked as read-only.

+# Metadata operations
+
+## Metadata snapshotting
+
+It is good practice to set up automatic snapshotting of your metadata database
+file, so that you can recover from situations where it becomes corrupted on
+disk. This can be done at the filesystem level if you are using ZFS or BTRFS.
+
+Since Garage v0.9.4, Garage is able to take snapshots of the metadata database
+itself. This basically amounts to copying the database file, except that it can
+be run live while Garage is running, without any risk of corruption or
+inconsistencies. This can be set up to run automatically on a schedule using
+[`metadata_auto_snapshot_interval`](@/documentation/reference-manual/configuration.md#metadata_auto_snapshot_interval).
+A snapshot can also be triggered manually using the `garage meta snapshot`
+command. Note that taking a snapshot using this method is very intensive, as it
+requires making a full copy of the database file, so you might prefer using
+filesystem-level snapshots if possible. To recover a corrupted node from such a
+snapshot, read the instructions
+[here](@/documentation/operations/recovering.md#corrupted_meta).
+
 ## Metadata table resync

 Garage automatically resyncs all entries stored in the metadata tables every hour,
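To make the snapshotting commands described in the hunk above concrete, a short sketch follows; the metadata path shown is an assumption (it depends on your `metadata_dir`).

```bash
# Trigger a manual snapshot on the local node
garage meta snapshot

# Or snapshot the metadata DB of all nodes in the cluster at once (since v0.9.4)
garage meta snapshot --all

# Snapshots are stored under <metadata_dir>/snapshots, named by UTC timestamp
# (the path below assumes metadata_dir = "/var/lib/garage/meta")
ls /var/lib/garage/meta/snapshots/
```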
@@ -108,3 +108,57 @@ garage layout apply # once satisfied, apply the changes

 Garage will then start synchronizing all required data on the new node.
 This process can be monitored using the `garage stats -a` command.
+
+## Replacement scenario 3: corrupted metadata {#corrupted_meta}
+
+In some cases, your metadata DB file might become corrupted, for instance if
+your node suffered a power outage and did not shut down properly. In this case,
+you can recover without having to change the node ID or rebuild the cluster
+layout. This means that data blocks will not need to be shuffled around; you
+simply need to find a way to repair the metadata file. The best way is
+generally to discard the corrupted file and recover it from another source.
+
+First of all, start by locating the database file in your metadata directory,
+which [depends on your `db_engine`
+choice](@/documentation/reference-manual/configuration.md#db_engine). Then,
+your recovery options are as follows:
+
+- **Option 1: resyncing from other nodes.** In case your cluster is replicated
+  with two or three copies, you can simply delete the database file, and Garage
+  will resync from other nodes. To do so, stop Garage, delete the database file
+  or directory, and restart Garage. Then, do a full table repair by calling
+  `garage repair -a --yes tables`. This will take a bit of time to complete as
+  the new node will need to receive copies of the metadata tables from the
+  network (a concrete sketch of this procedure is given after this list).
+
+- **Option 2: restoring a snapshot taken by Garage.** Since v0.9.4, Garage can
+  [automatically take regular
+  snapshots](@/documentation/reference-manual/configuration.md#metadata_auto_snapshot_interval)
+  of your metadata DB file. Snapshots are located under
+  `<metadata_dir>/snapshots` and are named according to the UTC time at which
+  they were taken. Stop Garage, discard the database file/directory and replace
+  it by the snapshot you want to use. For instance, in the case of LMDB:
+
+  ```bash
+  cd $METADATA_DIR
+  mv db.lmdb db.lmdb.bak
+  cp -r snapshots/2024-03-15T12:13:52Z db.lmdb
+  ```
+
+  And for Sqlite:
+
+  ```bash
+  cd $METADATA_DIR
+  mv db.sqlite db.sqlite.bak
+  cp snapshots/2024-03-15T12:13:52Z db.sqlite
+  ```
+
+  Then, restart Garage and run a full table repair by calling `garage repair -a
+  --yes tables`. This should run relatively fast as only the changes that
+  occurred since the snapshot was taken will need to be resynchronized. Of
+  course, if your cluster is not replicated, you will lose all changes that
+  occurred since the snapshot was taken.
+
+- **Option 3: restoring a filesystem-level snapshot.** If you are using ZFS or
+  BTRFS to snapshot your metadata partition, refer to their specific
+  documentation on rolling back or copying files from an old snapshot.
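As a concrete illustration of Option 1 above, here is a minimal sketch of the resync-from-scratch procedure for an LMDB node. Only `garage repair -a --yes tables` comes from the documentation itself; the systemd unit name and the metadata path are assumptions to adapt to your deployment.

```bash
# Minimal sketch for Option 1 (LMDB, replicated cluster).
# Unit name and paths are assumptions; adapt to your setup.
sudo systemctl stop garage

# Discard the corrupted database, keeping a copy just in case
mv /var/lib/garage/meta/db.lmdb /var/lib/garage/meta/db.lmdb.corrupted

sudo systemctl start garage

# Rebuild the metadata tables from the other replicas
garage repair -a --yes tables
```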
@@ -73,6 +73,18 @@ The entire procedure would look something like this:

 You can do all of the nodes in a single zone at once as that won't impact global cluster availability.
 Do not try to make a backup of the metadata folder of a running node.
+
+**Since Garage v0.9.4,** you can use the `garage meta snapshot --all` command
+to take a simultaneous snapshot of the metadata database files of all your
+nodes. This avoids the tedious process of having to take them down one by
+one before upgrading. Be aware that if automatic snapshotting is enabled,
+Garage only keeps the last two snapshots and deletes older ones, so you might
+want to disable automatic snapshotting in your upgraded configuration file
+until you have confirmed that the upgrade ran successfully. In addition to
+snapshotting the metadata databases of your nodes, you should back up at
+least the `cluster_layout` file of one of your Garage instances (this file
+should be the same on all nodes and you can copy it safely while Garage is
+running).

 3. Prepare your binaries and configuration files for the new Garage version

 4. Restart all nodes simultaneously in the new version
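A possible pre-upgrade backup sequence based on the hunk above, as a sketch only; the metadata path, the backup destination and the exact location of the `cluster_layout` file are assumptions.

```bash
# Sketch of a pre-upgrade backup (paths are assumptions).

# Take a simultaneous snapshot of the metadata DB on every node
garage meta snapshot --all

# Keep a copy of the cluster layout from one node
# (assumed to live in the metadata directory)
mkdir -p /root/garage-backup
cp /var/lib/garage/meta/cluster_layout /root/garage-backup/cluster_layout.bak
```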
@@ -15,6 +15,7 @@ data_dir = "/var/lib/garage/data"

 metadata_fsync = true
 data_fsync = false
 disable_scrub = false
+metadata_auto_snapshot_interval = "6h"

 db_engine = "lmdb"
@@ -90,6 +91,7 @@ Top-level configuration options:
 [`db_engine`](#db_engine),
 [`disable_scrub`](#disable_scrub),
 [`lmdb_map_size`](#lmdb_map_size),
+[`metadata_auto_snapshot_interval`](#metadata_auto_snapshot_interval),
 [`metadata_dir`](#metadata_dir),
 [`metadata_fsync`](#metadata_fsync),
 [`replication_mode`](#replication_mode),
@@ -346,6 +348,25 @@ at the cost of a moderate drop in write performance.
 Similarly to `metadata_fsync`, this is likely not necessary
 if geographical replication is used.
+
+#### `metadata_auto_snapshot_interval` (since Garage v0.9.4) {#metadata_auto_snapshot_interval}
+
+If this value is set, Garage will automatically take a snapshot of the metadata
+DB file at a regular interval and save it in the metadata directory.
+This makes it possible to recover from situations where the metadata DB file is
+corrupted, for instance after an unclean shutdown.
+See [this page](@/documentation/operations/recovering.md#corrupted_meta) for details.
+
+Garage keeps only the two most recent snapshots of the metadata DB and deletes
+older ones automatically.
+
+Note that taking a metadata snapshot is a relatively intensive operation, as the
+entire database file is copied. Taking a snapshot may have a performance impact
+on the Garage node while it is running. If the cluster is under heavy write load
+when a snapshot operation is running, this might also cause the database file to
+grow significantly in size, as pages cannot be recycled easily.
+For this reason, it might be better to use filesystem-level snapshots instead,
+if possible.
+
 #### `disable_scrub` {#disable_scrub}

 By default, Garage runs a scrub of the data directory approximately once per
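To illustrate the reference entry above, here is a short sketch of enabling the option and checking that snapshots are being taken; the file locations, systemd unit name and timestamps are assumptions.

```bash
# Sketch only: paths, unit name and timestamps are illustrative.

# After adding to /etc/garage.toml:
#   metadata_auto_snapshot_interval = "6h"
# restart the node so the setting takes effect.
sudo systemctl restart garage

# Snapshots accumulate under <metadata_dir>/snapshots; only the two most
# recent are kept, named by the UTC time at which they were taken.
ls /var/lib/garage/meta/snapshots/
# e.g. 2024-03-15T06:13:52Z  2024-03-15T12:13:52Z
```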