Doc: be slightly more critical of LMDB #773

Merged
lx merged 2 commits from doc-updates into main 2024-03-14 15:43:30 +00:00
4 changed files with 55 additions and 33 deletions

View file

@ -91,5 +91,5 @@ The following feature flags are available in v0.8.0:
| `metrics` | *by default* | Enable collection of metrics in Prometheus format on the admin API | | `metrics` | *by default* | Enable collection of metrics in Prometheus format on the admin API |
| `telemetry-otlp` | optional | Enable collection of execution traces using OpenTelemetry | | `telemetry-otlp` | optional | Enable collection of execution traces using OpenTelemetry |
| `sled` | *by default* | Enable using Sled to store Garage's metadata | | `sled` | *by default* | Enable using Sled to store Garage's metadata |
| `lmdb` | optional | Enable using LMDB to store Garage's metadata | | `lmdb` | *by default* | Enable using LMDB to store Garage's metadata |
| `sqlite` | optional | Enable using Sqlite3 to store Garage's metadata | | `sqlite` | *by default* | Enable using Sqlite3 to store Garage's metadata |

View file

@ -53,9 +53,9 @@ to store 2 TB of data in total.
### Best practices ### Best practices
- If you have fast dedicated networking between all your nodes, and are planing to store - If you have reasonably fast networking between all your nodes, and are planing to store
very large files, bump the `block_size` configuration parameter to 10 MB mostly large files, bump the `block_size` configuration parameter to 10 MB
(`block_size = 10485760`). (`block_size = "10M"`).
- Garage stores its files in two locations: it uses a metadata directory to store frequently-accessed - Garage stores its files in two locations: it uses a metadata directory to store frequently-accessed
small metadata items, and a data directory to store data blocks of uploaded objects. small metadata items, and a data directory to store data blocks of uploaded objects.
@ -68,20 +68,29 @@ to store 2 TB of data in total.
EXT4 is not recommended as it has more strict limitations on the number of inodes, EXT4 is not recommended as it has more strict limitations on the number of inodes,
which might cause issues with Garage when large numbers of objects are stored. which might cause issues with Garage when large numbers of objects are stored.
- If you only have an HDD and no SSD, it's fine to put your metadata alongside the data
on the same drive. Having lots of RAM for your kernel to cache the metadata will
help a lot with performance. Make sure to use the LMDB database engine,
instead of Sled, which suffers from quite bad performance degradation on HDDs.
Sled is still the default for legacy reasons, but is not recommended anymore.
- For the metadata storage, Garage does not do checksumming and integrity
verification on its own. If you are afraid of bitrot/data corruption,
put your metadata directory on a ZFS or BTRFS partition. Otherwise, just use regular
EXT4 or XFS.
- Servers with multiple HDDs are supported natively by Garage without resorting - Servers with multiple HDDs are supported natively by Garage without resorting
to RAID, see [our dedicated documentation page](@/documentation/operations/multi-hdd.md). to RAID, see [our dedicated documentation page](@/documentation/operations/multi-hdd.md).
- For the metadata storage, Garage does not do checksumming and integrity
verification on its own. Users have reported that when using the LMDB
database engine (the default), database files have a tendency of becoming
corrupted after an unclean shutdown (e.g. a power outage), so you should use
a robust filesystem such as BTRFS or ZFS for the metadata partition, and take
regular snapshots so that you can restore to a recent known-good state in
case of an incident. If you cannot do so, you might want to switch to Sqlite
which is more robust.
- LMDB is the fastest and most tested database engine, but it has the following
weaknesses: 1/ data files are not architecture-independent, you cannot simply
move a Garage metadata directory between nodes running different architectures,
and 2/ LMDB is not suited for 32-bit platforms. Sqlite is a viable alternative
if any of these are of concern.
- If you only have an HDD and no SSD, it's fine to put your metadata alongside
the data on the same drive, but then consider your filesystem choice wisely
(see above). Having lots of RAM for your kernel to cache the metadata will
help a lot with performance.
## Get a Docker image ## Get a Docker image
Our docker image is currently named `dxflrs/garage` and is stored on the [Docker Hub](https://hub.docker.com/r/dxflrs/garage/tags?page=1&ordering=last_updated). Our docker image is currently named `dxflrs/garage` and is stored on the [Docker Hub](https://hub.docker.com/r/dxflrs/garage/tags?page=1&ordering=last_updated).
@ -187,7 +196,7 @@ upgrades. With the containerized setup proposed here, the upgrade process
will require stopping and removing the existing container, and re-creating it will require stopping and removing the existing container, and re-creating it
with the upgraded version. with the upgraded version.
## Controling the daemon ## Controlling the daemon
The `garage` binary has two purposes: The `garage` binary has two purposes:
- it acts as a daemon when launched with `garage server` - it acts as a daemon when launched with `garage server`
@ -245,7 +254,7 @@ You can then instruct nodes to connect to one another as follows:
Venus$ garage node connect 563e1ac825ee3323aa441e72c26d1030d6d4414aeb3dd25287c531e7fc2bc95d@[fc00:1::1]:3901 Venus$ garage node connect 563e1ac825ee3323aa441e72c26d1030d6d4414aeb3dd25287c531e7fc2bc95d@[fc00:1::1]:3901
``` ```
You don't nead to instruct all node to connect to all other nodes: You don't need to instruct all node to connect to all other nodes:
nodes will discover one another transitively. nodes will discover one another transitively.
Now if your run `garage status` on any node, you should have an output that looks as follows: Now if your run `garage status` on any node, you should have an output that looks as follows:

View file

@ -57,7 +57,7 @@ to generate unique and private secrets for security reasons:
cat > garage.toml <<EOF cat > garage.toml <<EOF
metadata_dir = "/tmp/meta" metadata_dir = "/tmp/meta"
data_dir = "/tmp/data" data_dir = "/tmp/data"
db_engine = "lmdb" db_engine = "sqlite"
replication_mode = "none" replication_mode = "none"

View file

@ -264,18 +264,31 @@ Performance characteristics of the different DB engines are as follows:
- Sled: tends to produce large data files and also has performance issues, - Sled: tends to produce large data files and also has performance issues,
especially when the metadata folder is on a traditional HDD and not on SSD. especially when the metadata folder is on a traditional HDD and not on SSD.
- LMDB: the recommended database engine on 64-bit systems, much more - LMDB: the recommended database engine for high-performance distributed
space-efficient and slightly faster. Note that the data format of LMDB is not clusters, much more space-efficient and significantly faster. LMDB works very
portable between architectures, so for instance the Garage database of an well, but is known to have the following limitations:
x86-64 node cannot be moved to an ARM64 node. Also note that, while LMDB can
technically be used on 32-bit systems, this will limit your node to very - The data format of LMDB is not portable between architectures, so for
small database sizes due to how LMDB works; it is therefore not recommended. instance the Garage database of an x86-64 node cannot be moved to an ARM64
node.
- While LMDB can technically be used on 32-bit systems, this will limit your
node to very small database sizes due to how LMDB works; it is therefore
not recommended.
- Several users have reported corrupted LMDB database files after an unclean
shutdown (e.g. a power outage). This situation can generally be recovered
from if your cluster is geo-replicated (by rebuilding your metadata db from
other nodes), or if you have saved regular snapshots at the filesystem
level.
- Sqlite: Garage supports Sqlite as an alternative storage backend for - Sqlite: Garage supports Sqlite as an alternative storage backend for
metadata, and although it has not been tested as much, it is expected to work metadata, which does not have the issues listed above for LMDB.
satisfactorily. Since Garage v0.9.0, performance issues have largely been On versions 0.8.x and earlier, Sqlite should be avoided due to abysmal
fixed by allowing for a no-fsync mode (see `metadata_fsync`). Sqlite does not performance, which was fixed with the addition of `metadata_fsync`.
have the database size limitation of LMDB on 32-bit systems. Sqlite is still probably slower than LMDB due to the way we use it,
so it is not the best choice for high-performance storage clusters,
but it should work fine in many cases.
It is possible to convert Garage's metadata directory from one format to another It is possible to convert Garage's metadata directory from one format to another
using the `garage convert-db` command, which should be used as follows: using the `garage convert-db` command, which should be used as follows: