Merge pull request 'Doc: be slightly more critical of LMDB' (#773) from doc-updates into main
All checks were successful
ci/woodpecker/push/debug Pipeline was successful
ci/woodpecker/cron/debug Pipeline was successful
ci/woodpecker/cron/release/4 Pipeline was successful
ci/woodpecker/cron/release/3 Pipeline was successful
ci/woodpecker/cron/release/2 Pipeline was successful
ci/woodpecker/cron/release/1 Pipeline was successful
ci/woodpecker/cron/publish Pipeline was successful
Reviewed-on: #773
This commit is contained in:
commit
62b01d8705
4 changed files with 55 additions and 33 deletions
@@ -91,5 +91,5 @@ The following feature flags are available in v0.8.0:
 | `metrics` | *by default* | Enable collection of metrics in Prometheus format on the admin API |
 | `telemetry-otlp` | optional | Enable collection of execution traces using OpenTelemetry |
 | `sled` | *by default* | Enable using Sled to store Garage's metadata |
-| `lmdb` | optional | Enable using LMDB to store Garage's metadata |
-| `sqlite` | optional | Enable using Sqlite3 to store Garage's metadata |
+| `lmdb` | *by default* | Enable using LMDB to store Garage's metadata |
+| `sqlite` | *by default* | Enable using Sqlite3 to store Garage's metadata |

@@ -27,7 +27,7 @@ To run a real-world deployment, make sure the following conditions are met:
 [Yggdrasil](https://yggdrasil-network.github.io/) are approaches to consider
 in addition to building out your own VPN tunneling.

-- This guide will assume you are using Docker containers to deploy Garage on each node.
+- This guide will assume you are using Docker containers to deploy Garage on each node.
 Garage can also be run independently, for instance as a [Systemd service](@/documentation/cookbook/systemd.md).
 You can also use an orchestrator such as Nomad or Kubernetes to automatically manage
 Docker containers on a fleet of nodes.
@@ -53,9 +53,9 @@ to store 2 TB of data in total.

 ### Best practices

-- If you have fast dedicated networking between all your nodes, and are planing to store
-very large files, bump the `block_size` configuration parameter to 10 MB
-(`block_size = 10485760`).
+- If you have reasonably fast networking between all your nodes, and are planing to store
+mostly large files, bump the `block_size` configuration parameter to 10 MB
+(`block_size = "10M"`).

 - Garage stores its files in two locations: it uses a metadata directory to store frequently-accessed
 small metadata items, and a data directory to store data blocks of uploaded objects.
@@ -68,20 +68,29 @@ to store 2 TB of data in total.
 EXT4 is not recommended as it has more strict limitations on the number of inodes,
 which might cause issues with Garage when large numbers of objects are stored.

-- If you only have an HDD and no SSD, it's fine to put your metadata alongside the data
-on the same drive. Having lots of RAM for your kernel to cache the metadata will
-help a lot with performance. Make sure to use the LMDB database engine,
-instead of Sled, which suffers from quite bad performance degradation on HDDs.
-Sled is still the default for legacy reasons, but is not recommended anymore.
-
-- For the metadata storage, Garage does not do checksumming and integrity
-verification on its own. If you are afraid of bitrot/data corruption,
-put your metadata directory on a ZFS or BTRFS partition. Otherwise, just use regular
-EXT4 or XFS.
-
 - Servers with multiple HDDs are supported natively by Garage without resorting
 to RAID, see [our dedicated documentation page](@/documentation/operations/multi-hdd.md).

+- For the metadata storage, Garage does not do checksumming and integrity
+verification on its own. Users have reported that when using the LMDB
+database engine (the default), database files have a tendency of becoming
+corrupted after an unclean shutdown (e.g. a power outage), so you should use
+a robust filesystem such as BTRFS or ZFS for the metadata partition, and take
+regular snapshots so that you can restore to a recent known-good state in
+case of an incident. If you cannot do so, you might want to switch to Sqlite
+which is more robust.
+
+- LMDB is the fastest and most tested database engine, but it has the following
+weaknesses: 1/ data files are not architecture-independent, you cannot simply
+move a Garage metadata directory between nodes running different architectures,
+and 2/ LMDB is not suited for 32-bit platforms. Sqlite is a viable alternative
+if any of these are of concern.
+
+- If you only have an HDD and no SSD, it's fine to put your metadata alongside
+the data on the same drive, but then consider your filesystem choice wisely
+(see above). Having lots of RAM for your kernel to cache the metadata will
+help a lot with performance.
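
As a rough illustration of the updated best-practice bullets, the shell sketch below writes a configuration fragment that sets `block_size` and names the metadata engine explicitly. Only `block_size = "10M"` and the `db_engine` values come from the text; the output file name and both directory paths are hypothetical placeholders, so treat this as a starting point rather than a recommended layout.

```shell
# Sketch only: write an example config fragment to a scratch file.
# The file name and both directory paths are hypothetical placeholders.
cat > garage.example.toml <<'EOF'
# Metadata on a snapshot-capable filesystem (BTRFS or ZFS), data elsewhere.
metadata_dir = "/mnt/btrfs/garage/meta"
data_dir = "/mnt/hdd/garage/data"

# LMDB is described above as the default engine; "sqlite" is the suggested
# alternative when unclean shutdowns, 32-bit platforms, or moving metadata
# between architectures are a concern.
db_engine = "lmdb"

# Larger blocks for mostly large files on reasonably fast networks.
block_size = "10M"
EOF
```

The heredoc delimiter is quoted (`'EOF'`) so that the shell writes the fragment verbatim without expanding anything inside it.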
+
 ## Get a Docker image

 Our docker image is currently named `dxflrs/garage` and is stored on the [Docker Hub](https://hub.docker.com/r/dxflrs/garage/tags?page=1&ordering=last_updated).
@@ -187,7 +196,7 @@ upgrades. With the containerized setup proposed here, the upgrade process
 will require stopping and removing the existing container, and re-creating it
 with the upgraded version.

-## Controling the daemon
+## Controlling the daemon

 The `garage` binary has two purposes:
 - it acts as a daemon when launched with `garage server`
@@ -245,7 +254,7 @@ You can then instruct nodes to connect to one another as follows:
 Venus$ garage node connect 563e1ac825ee3323aa441e72c26d1030d6d4414aeb3dd25287c531e7fc2bc95d@[fc00:1::1]:3901
 ```

-You don't nead to instruct all node to connect to all other nodes:
+You don't need to instruct all node to connect to all other nodes:
 nodes will discover one another transitively.

 Now if your run `garage status` on any node, you should have an output that looks as follows:
@@ -328,8 +337,8 @@ Given the information above, we will configure our cluster as follow:
 ```bash
 garage layout assign 563e -z par1 -c 1T -t mercury
 garage layout assign 86f0 -z par1 -c 2T -t venus
-garage layout assign 6814 -z lon1 -c 2T -t earth
-garage layout assign 212f -z bru1 -c 1.5T -t mars
+garage layout assign 6814 -z lon1 -c 2T -t earth
+garage layout assign 212f -z bru1 -c 1.5T -t mars
 ```

 At this point, the changes in the cluster layout have not yet been applied.

@@ -57,7 +57,7 @@ to generate unique and private secrets for security reasons:
 cat > garage.toml <<EOF
 metadata_dir = "/tmp/meta"
 data_dir = "/tmp/data"
-db_engine = "lmdb"
+db_engine = "sqlite"

 replication_mode = "none"


@@ -264,18 +264,31 @@ Performance characteristics of the different DB engines are as follows:
 - Sled: tends to produce large data files and also has performance issues,
 especially when the metadata folder is on a traditional HDD and not on SSD.

-- LMDB: the recommended database engine on 64-bit systems, much more
-space-efficient and slightly faster. Note that the data format of LMDB is not
-portable between architectures, so for instance the Garage database of an
-x86-64 node cannot be moved to an ARM64 node. Also note that, while LMDB can
-technically be used on 32-bit systems, this will limit your node to very
-small database sizes due to how LMDB works; it is therefore not recommended.
+- LMDB: the recommended database engine for high-performance distributed
+clusters, much more space-efficient and significantly faster. LMDB works very
+well, but is known to have the following limitations:
+
+- The data format of LMDB is not portable between architectures, so for
+instance the Garage database of an x86-64 node cannot be moved to an ARM64
+node.
+
+- While LMDB can technically be used on 32-bit systems, this will limit your
+node to very small database sizes due to how LMDB works; it is therefore
+not recommended.
+
+- Several users have reported corrupted LMDB database files after an unclean
+shutdown (e.g. a power outage). This situation can generally be recovered
+from if your cluster is geo-replicated (by rebuilding your metadata db from
+other nodes), or if you have saved regular snapshots at the filesystem
+level.

 - Sqlite: Garage supports Sqlite as an alternative storage backend for
-metadata, and although it has not been tested as much, it is expected to work
-satisfactorily. Since Garage v0.9.0, performance issues have largely been
-fixed by allowing for a no-fsync mode (see `metadata_fsync`). Sqlite does not
-have the database size limitation of LMDB on 32-bit systems.
+metadata, which does not have the issues listed above for LMDB.
+On versions 0.8.x and earlier, Sqlite should be avoided due to abysmal
+performance, which was fixed with the addition of `metadata_fsync`.
+Sqlite is still probably slower than LMDB due to the way we use it,
+so it is not the best choice for high-performance storage clusters,
+but it should work fine in many cases.
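
To make the Sqlite option more concrete, here is a minimal shell sketch of the two settings named in the bullets above. The output file name is a hypothetical placeholder. Writing `metadata_fsync` as a boolean is an assumption, since the text only names the option without showing its syntax.

```shell
# Sketch only: a config fragment selecting Sqlite for metadata storage.
# The file name is a hypothetical placeholder.
cat > garage.sqlite.example.toml <<'EOF'
db_engine = "sqlite"

# Assumed boolean form. true is assumed to request an fsync on metadata
# writes (safer, slower); false would correspond to the no-fsync mode
# that the text credits with fixing Sqlite performance.
metadata_fsync = true
EOF
```

Which value fits depends on the trade-off discussed for this option: a single-site cluster has more to gain from fsync than a geo-replicated one.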

 It is possible to convert Garage's metadata directory from one format to another
 using the `garage convert-db` command, which should be used as follows:
@@ -302,7 +315,7 @@ Using this option reduces the risk of simultaneous metadata corruption on severa
 cluster nodes, which could lead to data loss.

 If multi-site replication is used, this option is most likely not necessary, as
-it is extremely unlikely that two nodes in different locations will have a
+it is extremely unlikely that two nodes in different locations will have a
 power failure at the exact same time.

 (Metadata corruption on a single node is not an issue, the corrupted data file