[db-snapshot] documentation for metadata db snapshots
parent a68c37555d
commit 8cf3d24875
5 changed files with 114 additions and 7 deletions

@@ -72,13 +72,14 @@ to store 2 TB of data in total.

to RAID, see [our dedicated documentation page](@/documentation/operations/multi-hdd.md).

- For the metadata storage, Garage does not do checksumming and integrity
  verification on its own, so it is better to use a robust filesystem such as
  BTRFS or ZFS. Users have reported that when using the LMDB database engine
  (the default), database files have a tendency of becoming corrupted after an
  unclean shutdown (e.g. a power outage), so you should take regular snapshots
  to be able to recover from such a situation. This can be done using Garage's
  built-in automatic snapshotting (since v0.9.4), or by using filesystem level
  snapshots. If you cannot do so, you might want to switch to Sqlite which is
  more robust.

- LMDB is the fastest and most tested database engine, but it has the following
  weaknesses: 1/ data files are not architecture-independent, you cannot simply

@@ -124,6 +125,7 @@ A valid `/etc/garage.toml` for our cluster would look as follows:

metadata_dir = "/var/lib/garage/meta"
data_dir = "/var/lib/garage/data"
db_engine = "lmdb"
metadata_auto_snapshot_interval = "6h"

replication_mode = "3"

@@ -104,6 +104,24 @@ operation will also move out all data from locations marked as read-only.

# Metadata operations

## Metadata snapshotting

It is good practice to set up automatic snapshotting of your metadata database
file, to recover from situations where it becomes corrupted on disk. This can
be done at the filesystem level if you are using ZFS or BTRFS.
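
For the filesystem-level approach, a minimal sketch is shown below. It assumes
the metadata directory sits on a dedicated ZFS dataset (called `tank/garage-meta`
here, a hypothetical name) or on a BTRFS subvolume at the example `metadata_dir`;
run it from cron or a systemd timer at whatever frequency suits you.

```bash
# ZFS: snapshot the dataset holding the metadata directory
# (hypothetical dataset name; ZFS snapshots are near-instantaneous and atomic)
zfs snapshot tank/garage-meta@garage-$(date -u +%Y%m%dT%H%M%SZ)

# Or, if the metadata directory is a BTRFS subvolume (hypothetical paths):
btrfs subvolume snapshot -r /var/lib/garage/meta \
    /var/lib/garage/.snapshots/meta-$(date -u +%Y%m%dT%H%M%SZ)
```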

Since Garage v0.9.4, Garage is able to take snapshots of the metadata database
itself. This basically amounts to copying the database file, except that it can
be run live while Garage is running without the risk of corruption or
inconsistencies. This can be set up to run automatically on a schedule using
[`metadata_auto_snapshot_interval`](@/documentation/reference-manual/configuration.md#metadata_auto_snapshot_interval).
A snapshot can also be triggered manually using the `garage meta snapshot`
command. Note that taking a snapshot using this method is very intensive as it
requires making a full copy of the database file, so you might prefer using
filesystem-level snapshots if possible. To recover a corrupted node from such a
snapshot, read the instructions
[here](@/documentation/operations/recovering.md#corrupted_meta).
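
For instance, assuming the example `metadata_dir` of `/var/lib/garage/meta` used
elsewhere in this documentation, a manual snapshot could be taken and inspected
as follows:

```bash
# Take a snapshot of this node's metadata DB
garage meta snapshot

# Or snapshot the metadata DB of every node in the cluster at once
garage meta snapshot --all

# Snapshots are stored under <metadata_dir>/snapshots, named after the UTC
# time at which they were taken
ls /var/lib/garage/meta/snapshots/
```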

## Metadata table resync

Garage automatically resyncs all entries stored in the metadata tables every hour,

@@ -108,3 +108,57 @@ garage layout apply # once satisfied, apply the changes

Garage will then start synchronizing all required data on the new node.
This process can be monitored using the `garage stats -a` command.

## Replacement scenario 3: corrupted metadata {#corrupted_meta}

In some cases, your metadata DB file might become corrupted, for instance if
your node suffered a power outage and did not shut down properly. In this case,
you can recover without having to change the node ID or rebuild a cluster
layout. This means that data blocks will not need to be shuffled around; you
simply have to find a way to repair the metadata file. The best way is generally
to discard the corrupted file and recover it from another source.

First of all, start by locating the database file in your metadata directory,
which [depends on your `db_engine`
choice](@/documentation/reference-manual/configuration.md#db_engine). Then,
your recovery options are as follows:

- **Option 1: resyncing from other nodes.** In case your cluster is replicated
  with two or three copies, you can simply delete the database file, and Garage
  will resync from other nodes. To do so, stop Garage, delete the database file
  or directory, and restart Garage. Then, do a full table repair by calling
  `garage repair -a --yes tables` (see the sketch after this list). This will
  take a bit of time to complete as the new node will need to receive copies of
  the metadata tables from the network.

- **Option 2: restoring a snapshot taken by Garage.** Since v0.9.4, Garage can
  [automatically take regular
  snapshots](@/documentation/reference-manual/configuration.md#metadata_auto_snapshot_interval)
  of your metadata DB file. This file or directory should be located under
  `<metadata_dir>/snapshots`, and is named according to the UTC time at which it
  was taken. Stop Garage, discard the database file/directory and replace it
  with the snapshot you want to use. For instance, in the case of LMDB:

  ```bash
  cd $METADATA_DIR
  mv db.lmdb db.lmdb.bak
  cp -r snapshots/2024-03-15T12:13:52Z db.lmdb
  ```

  And for Sqlite:

  ```bash
  cd $METADATA_DIR
  mv db.sqlite db.sqlite.bak
  cp snapshots/2024-03-15T12:13:52Z db.sqlite
  ```

  Then, restart Garage and run a full table repair by calling `garage repair -a
  --yes tables`. This should run relatively fast as only the changes that
  occurred since the snapshot was taken will need to be resynchronized. Of
  course, if your cluster is not replicated, you will lose all changes that
  occurred since the snapshot was taken.

- **Option 3: restoring a filesystem-level snapshot.** If you are using ZFS or
  BTRFS to snapshot your metadata partition, refer to their specific
  documentation on rolling back or copying files from an old snapshot.
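
For Option 1, the end-to-end procedure might look like the sketch below. It
assumes Garage runs as a systemd service named `garage`, uses the default LMDB
engine, and uses the example `metadata_dir` of `/var/lib/garage/meta`; adapt the
paths and service management to your deployment.

```bash
# Option 1 sketch: discard the corrupted DB and resync from other nodes
systemctl stop garage
rm -r /var/lib/garage/meta/db.lmdb     # delete the corrupted database directory
systemctl start garage
garage repair -a --yes tables          # full table repair, as described above
```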
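
For Option 3, the restore itself is handled entirely by the filesystem tooling;
the sketch below only illustrates the general shape, with hypothetical
ZFS/BTRFS names, and the ZFS and BTRFS documentation remains the reference.

```bash
# Option 3 sketch (hypothetical dataset/subvolume names); run only the
# variant that matches your setup, with Garage stopped
systemctl stop garage

# ZFS: roll the dataset holding the metadata directory back to a snapshot
zfs rollback tank/garage-meta@garage-20240315T000000Z

# BTRFS: put an earlier snapshot in place of the corrupted subvolume
mv /var/lib/garage/meta /var/lib/garage/meta.corrupted
btrfs subvolume snapshot /var/lib/garage/.snapshots/meta-20240315T000000Z \
    /var/lib/garage/meta

systemctl start garage
garage repair -a --yes tables   # resync changes made since the snapshot, as in Option 2
```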

@@ -73,6 +73,18 @@ The entire procedure would look something like this:

You can do all of the nodes in a single zone at once as that won't impact global cluster availability.
Do not try to make a backup of the metadata folder of a running node.

**Since Garage v0.9.4,** you can use the `garage meta snapshot --all` command
to take a simultaneous snapshot of the metadata database files of all your
nodes. This avoids the tedious process of having to take them down one by
one before upgrading. Be aware that if automatic snapshotting is enabled,
Garage only keeps the last two snapshots and deletes older ones, so you might
want to disable automatic snapshotting in your upgraded configuration file
until you have confirmed that the upgrade ran successfully. In addition to
snapshotting the metadata databases of your nodes, you should back up at
least the `cluster_layout` file of one of your Garage instances (this file
should be the same on all nodes and you can copy it safely while Garage is
running).
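
For example, assuming the `cluster_layout` file sits in the example
`metadata_dir` of `/var/lib/garage/meta` (an assumption; check your own layout)
and a backup directory of your choice:

```bash
# Snapshot the metadata DB of every node before upgrading
garage meta snapshot --all

# Back up the cluster layout from one node; the file is the same on all nodes
# and can be copied while Garage is running (destination path is an example)
mkdir -p /root/garage-pre-upgrade
cp /var/lib/garage/meta/cluster_layout /root/garage-pre-upgrade/cluster_layout
```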

3. Prepare your binaries and configuration files for the new Garage version

4. Restart all nodes simultaneously in the new version

@@ -15,6 +15,7 @@ data_dir = "/var/lib/garage/data"

metadata_fsync = true
data_fsync = false
disable_scrub = false
metadata_auto_snapshot_interval = "6h"

db_engine = "lmdb"

@@ -90,6 +91,7 @@ Top-level configuration options:

[`db_engine`](#db_engine),
[`disable_scrub`](#disable_scrub),
[`lmdb_map_size`](#lmdb_map_size),
[`metadata_auto_snapshot_interval`](#metadata_auto_snapshot_interval),
[`metadata_dir`](#metadata_dir),
[`metadata_fsync`](#metadata_fsync),
[`replication_mode`](#replication_mode),

@@ -346,6 +348,25 @@ at the cost of a moderate drop in write performance.

Similarly to `metadata_fsync`, this is likely not necessary
if geographical replication is used.

#### `metadata_auto_snapshot_interval` (since Garage v0.9.4) {#metadata_auto_snapshot_interval}

If this value is set, Garage will automatically take a snapshot of the metadata
DB file at a regular interval and save it in the metadata directory.
This allows recovering from situations where the metadata DB file is corrupted,
for instance after an unclean shutdown.
See [this page](@/documentation/operations/recovering.md#corrupted_meta) for details.

Garage keeps only the two most recent snapshots of the metadata DB and deletes
older ones automatically.

Note that taking a metadata snapshot is a relatively intensive operation as the
entire database file is copied. A snapshot being taken might have performance
impacts on the Garage node while it is running. If the cluster is under heavy
write load when a snapshot operation is running, this might also cause the
database file to grow in size significantly as pages cannot be recycled easily.
For this reason, it might be better to use filesystem-level snapshots instead
if possible.
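
A quick way to check that automatic snapshots are being taken and rotated, with
the path below assuming the example `metadata_dir` used in this documentation:

```bash
# With automatic snapshotting enabled, only the two most recent snapshots
# should be present here
ls -lt /var/lib/garage/meta/snapshots/
```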

#### `disable_scrub` {#disable_scrub}

By default, Garage runs a scrub of the data directory approximately once per