multi-hdd support (fix #218) #625

Merged
lx merged 26 commits from multihdd into next 2023-09-11 10:52:02 +00:00
3 changed files with 116 additions and 8 deletions
Showing only changes of commit 6595efd82f - Show all commits

View file

@ -75,16 +75,11 @@ to store 2 TB of data in total.
- For the metadata storage, Garage does not do checksumming and integrity - For the metadata storage, Garage does not do checksumming and integrity
verification on its own. If you are afraid of bitrot/data corruption, verification on its own. If you are afraid of bitrot/data corruption,
put your metadata directory on a BTRFS partition. Otherwise, just use regular put your metadata directory on a ZFS or BTRFS partition. Otherwise, just use regular
EXT4 or XFS. EXT4 or XFS.
- Having a single server with several storage drives is currently not very well - Servers with multiple HDDs are supported natively by Garage without resorting
supported in Garage ([#218](https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/218)). to RAID, see [our dedicated documentation page](@/documentation/operations/multi-hdd.md).
For an easy setup, just put all your drives in a RAID0 or a ZFS RAIDZ array.
If you're adventurous, you can try to format each of your disk as
a separate XFS partition, and then run one `garage` daemon per disk drive,
or use something like [`mergerfs`](https://github.com/trapexit/mergerfs) to merge
all your disks in a single union filesystem that spreads load over them.
## Get a Docker image ## Get a Docker image

View file

@ -0,0 +1,100 @@
+++
title = "Multi-HDD support"
weight = 15
+++
Since v0.9, Garage natively supports nodes that have several storage drives
for storing data blocks (not for metadata storage).
## Initial setup
To set up a new Garage storage node with multiple HDDs,
format and mount all your drives in different directories,
and use a Garage configuration as follows:
```toml
data_dir = [
{ path = "/path/to/hdd1", capacity = "2T" },
{ path = "/path/to/hdd2", capacity = "4T" },
]
```
Garage will automatically balance all blocks stored by the node
among the different specified directories, proportionnally to the
specified capacities.
## Updating the list of storage locations
If you add new storage locations to your `data_dir`,
Garage will not rebalance existing data between storage locations.
Newly written blocks will be balanced proportionnally to the specified capacities,
and existing data may be moved between drives to improve balancing,
but only opportunistically when a data block is re-written (e.g. an object
is re-uploaded, or an object with a duplicate block is uploaded).
To understand precisely what is happening, we need to dive in to how Garage
splits data among the different storage locations.
First of all, Garage divides the set of all possible block hashes
in a fixed number of slices (currently 1024), and assigns
to each slice a primary storage location among the specified data directories.
The number of slices having their primary location in each data directory
is proportionnal to the capacity specified in the config file.
When Garage receives a block to write, it will always write it in the primary
directory of the slice that contains its hash.
Now, to be able to not lose existing data blocks when storage locations
are added, Garage also keeps a list of secondary data directories
for all of the hash slices. Secondary data directories for a slice indicates
storage locations that once were primary directories for that slice, i.e. where
Garage knows that data blocks of that slice might be stored.
When Garage is requested to read a certain data block,
it will first look in the primary storage directory of its slice,
and if it doesn't find it there it goes through all of the secondary storage
locations until it finds it. This allows Garage to continue operating
normally when storage locations are added, without having to shuffle
files between drives to place them in the correct location.
This relatively simple strategy works well but does not ensure that data
is correctly balanced among drives according to their capacity.
To rebalance data, two strategies can be used:
- Lazy rebalancing: when a block is re-written (e.g. the object is re-uploaded),
Garage checks whether the existing copy is in the primary directory of the slice
or in a secondary directory. If the current copy is in a secondary directory,
Garage re-writes a copy in the primary directory and deletes the one from the
secondary directory.
- Active rebalancing: an operator of a Garage node can explicitly launch a repair
procedure that rebalances the data directories, moving all blocks to their
primary location. Once done, all secondary locations for all hash slices are
removed so that they won't be checked anymore when looking for a data block.
## Read-only storage locations
If you would like to move all data blocks from an existing data directory to one
or several new data directories, mark the old directory as read-only:
```toml
data_dir = [
{ path = "/path/to/old_data", read_only = true },
{ path = "/path/to/new_hdd1", capacity = "2T" },
{ path = "/path/to/new_hdd2", capacity = "4T" },
]
```
Garage will be able to read requested blocks from the read-only directory.
Garage will also move data out of the read-only directory either progressively
(lazy rebalancing) or if requested explicitly (active rebalancing).
Once an active rebalancing has finished, your read-only directory should be empty:
it might still contain subdirectories, but no data files. You can check that
it contains no files using:
```bash
find -type f /path/to/old_data
```
at which point it can be removed from the `data_dir` list in your config file.

View file

@ -91,6 +91,19 @@ This folder can be placed on an HDD. The space available for `data_dir`
should be counted to determine a node's capacity should be counted to determine a node's capacity
when [adding it to the cluster layout](@/documentation/cookbook/real-world.md). when [adding it to the cluster layout](@/documentation/cookbook/real-world.md).
Since `v0.9.0`, Garage supports multiple data directories with the following syntax:
```toml
data_dir = [
{ path = "/path/to/old_data", read_only = true },
{ path = "/path/to/new_hdd1", capacity = "2T" },
{ path = "/path/to/new_hdd2", capacity = "4T" },
]
```
See [the dedicated documentation page](@/documentation/operations/multi-hdd.md)
on how to operate Garage in such a setup.
### `db_engine` (since `v0.8.0`) ### `db_engine` (since `v0.8.0`)
By default, Garage uses the Sled embedded database library By default, Garage uses the Sled embedded database library