Doc: be slightly more critical of LMDB #773
4 changed files with 55 additions and 33 deletions
@@ -91,5 +91,5 @@ The following feature flags are available in v0.8.0:
 | `metrics` | *by default* | Enable collection of metrics in Prometheus format on the admin API |
 | `telemetry-otlp` | optional | Enable collection of execution traces using OpenTelemetry |
 | `sled` | *by default* | Enable using Sled to store Garage's metadata |
-| `lmdb` | optional | Enable using LMDB to store Garage's metadata |
-| `sqlite` | optional | Enable using Sqlite3 to store Garage's metadata |
+| `lmdb` | *by default* | Enable using LMDB to store Garage's metadata |
+| `sqlite` | *by default* | Enable using Sqlite3 to store Garage's metadata |
@@ -53,9 +53,9 @@ to store 2 TB of data in total.
 
 ### Best practices
 
-- If you have fast dedicated networking between all your nodes, and are planing to store
-very large files, bump the `block_size` configuration parameter to 10 MB
-(`block_size = 10485760`).
+- If you have reasonably fast networking between all your nodes, and are planing to store
+mostly large files, bump the `block_size` configuration parameter to 10 MB
+(`block_size = "10M"`).
 
 - Garage stores its files in two locations: it uses a metadata directory to store frequently-accessed
 small metadata items, and a data directory to store data blocks of uploaded objects.
@@ -68,20 +68,29 @@ to store 2 TB of data in total.
 EXT4 is not recommended as it has more strict limitations on the number of inodes,
 which might cause issues with Garage when large numbers of objects are stored.
 
-- If you only have an HDD and no SSD, it's fine to put your metadata alongside the data
-on the same drive. Having lots of RAM for your kernel to cache the metadata will
-help a lot with performance. Make sure to use the LMDB database engine,
-instead of Sled, which suffers from quite bad performance degradation on HDDs.
-Sled is still the default for legacy reasons, but is not recommended anymore.
-
-- For the metadata storage, Garage does not do checksumming and integrity
-verification on its own. If you are afraid of bitrot/data corruption,
-put your metadata directory on a ZFS or BTRFS partition. Otherwise, just use regular
-EXT4 or XFS.
-
 - Servers with multiple HDDs are supported natively by Garage without resorting
 to RAID, see [our dedicated documentation page](@/documentation/operations/multi-hdd.md).
 
+- For the metadata storage, Garage does not do checksumming and integrity
+verification on its own. Users have reported that when using the LMDB
+database engine (the default), database files have a tendency of becoming
+corrupted after an unclean shutdown (e.g. a power outage), so you should use
+a robust filesystem such as BTRFS or ZFS for the metadata partition, and take
+regular snapshots so that you can restore to a recent known-good state in
+case of an incident. If you cannot do so, you might want to switch to Sqlite
+which is more robust.
+
+- LMDB is the fastest and most tested database engine, but it has the following
+weaknesses: 1/ data files are not architecture-independent, you cannot simply
+move a Garage metadata directory between nodes running different architectures,
+and 2/ LMDB is not suited for 32-bit platforms. Sqlite is a viable alternative
+if any of these are of concern.
+
+- If you only have an HDD and no SSD, it's fine to put your metadata alongside
+the data on the same drive, but then consider your filesystem choice wisely
+(see above). Having lots of RAM for your kernel to cache the metadata will
+help a lot with performance.
+
 ## Get a Docker image
 
 Our docker image is currently named `dxflrs/garage` and is stored on the [Docker Hub](https://hub.docker.com/r/dxflrs/garage/tags?page=1&ordering=last_updated).
@@ -187,7 +196,7 @@ upgrades. With the containerized setup proposed here, the upgrade process
 will require stopping and removing the existing container, and re-creating it
 with the upgraded version.
 
-## Controling the daemon
+## Controlling the daemon
 
 The `garage` binary has two purposes:
 - it acts as a daemon when launched with `garage server`
@@ -245,7 +254,7 @@ You can then instruct nodes to connect to one another as follows:
 Venus$ garage node connect 563e1ac825ee3323aa441e72c26d1030d6d4414aeb3dd25287c531e7fc2bc95d@[fc00:1::1]:3901
 ```
 
-You don't nead to instruct all node to connect to all other nodes:
+You don't need to instruct all node to connect to all other nodes:
 nodes will discover one another transitively.
 
 Now if your run `garage status` on any node, you should have an output that looks as follows:
@@ -57,7 +57,7 @@ to generate unique and private secrets for security reasons:
 cat > garage.toml <<EOF
 metadata_dir = "/tmp/meta"
 data_dir = "/tmp/data"
-db_engine = "lmdb"
+db_engine = "sqlite"
 
 replication_mode = "none"
 
@@ -264,18 +264,31 @@ Performance characteristics of the different DB engines are as follows:
 - Sled: tends to produce large data files and also has performance issues,
 especially when the metadata folder is on a traditional HDD and not on SSD.
 
-- LMDB: the recommended database engine on 64-bit systems, much more
-space-efficient and slightly faster. Note that the data format of LMDB is not
-portable between architectures, so for instance the Garage database of an
-x86-64 node cannot be moved to an ARM64 node. Also note that, while LMDB can
-technically be used on 32-bit systems, this will limit your node to very
-small database sizes due to how LMDB works; it is therefore not recommended.
+- LMDB: the recommended database engine for high-performance distributed
+clusters, much more space-efficient and significantly faster. LMDB works very
+well, but is known to have the following limitations:
+
+- The data format of LMDB is not portable between architectures, so for
+instance the Garage database of an x86-64 node cannot be moved to an ARM64
+node.
+
+- While LMDB can technically be used on 32-bit systems, this will limit your
+node to very small database sizes due to how LMDB works; it is therefore
+not recommended.
+
+- Several users have reported corrupted LMDB database files after an unclean
+shutdown (e.g. a power outage). This situation can generally be recovered
+from if your cluster is geo-replicated (by rebuilding your metadata db from
+other nodes), or if you have saved regular snapshots at the filesystem
+level.
 
 - Sqlite: Garage supports Sqlite as an alternative storage backend for
-metadata, and although it has not been tested as much, it is expected to work
-satisfactorily. Since Garage v0.9.0, performance issues have largely been
-fixed by allowing for a no-fsync mode (see `metadata_fsync`). Sqlite does not
-have the database size limitation of LMDB on 32-bit systems.
+metadata, which does not have the issues listed above for LMDB.
+On versions 0.8.x and earlier, Sqlite should be avoided due to abysmal
+performance, which was fixed with the addition of `metadata_fsync`.
+Sqlite is still probably slower than LMDB due to the way we use it,
+so it is not the best choice for high-performance storage clusters,
+but it should work fine in many cases.
 
 It is possible to convert Garage's metadata directory from one format to another
 using the `garage convert-db` command, which should be used as follows:
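
Separately from the `convert-db` usage, which is cut off in this view, the recommendations changed by this patch can be read together as a configuration for a node that prefers robustness over raw speed. The fragment below is only a sketch of that reading and is not part of the patch. The directory paths are hypothetical, and `metadata_fsync` is assumed to be a boolean whose `false` value corresponds to the no-fsync mode mentioned in the removed text; the other keys and values are taken from the hunks themselves.

```toml
# Hypothetical paths: per the best-practices text, the metadata directory
# would ideally sit on a BTRFS or ZFS partition with regular snapshots.
metadata_dir = "/var/lib/garage/meta"
data_dir = "/var/lib/garage/data"

# Sqlite is described in the diff as free of the LMDB issues listed
# (architecture-dependent files, 32-bit limits, corruption reports),
# but probably slower than LMDB.
db_engine = "sqlite"

# Assumption: boolean option; false is the no-fsync mode that the diff
# credits with fixing Sqlite performance. Setting it to true would trade
# speed for durability.
metadata_fsync = false

# Suggested in the diff for reasonably fast networking and mostly large files.
block_size = "10M"
```

A cluster that prioritises throughput would presumably keep `db_engine = "lmdb"` instead and rely on filesystem snapshots or geo-replication to recover from a corrupted database, as the new LMDB text describes.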