adjust docs for replication factor
This commit is contained in:
parent
603604cdfc
commit
8f86af52ed
5 changed files with 83 additions and 62 deletions
|
@ -116,7 +116,7 @@ metadata_dir = "/var/lib/garage/meta"
|
||||||
data_dir = "/var/lib/garage/data"
|
data_dir = "/var/lib/garage/data"
|
||||||
db_engine = "lmdb"
|
db_engine = "lmdb"
|
||||||
|
|
||||||
replication_mode = "3"
|
replication_factor = 3
|
||||||
|
|
||||||
compression_level = 2
|
compression_level = 2
|
||||||
|
|
||||||
|
|
|
@ -12,7 +12,7 @@ An introduction to building cluster layouts can be found in the [production depl
|
||||||
In Garage, all of the data that can be stored in a given cluster is divided
|
In Garage, all of the data that can be stored in a given cluster is divided
|
||||||
into slices which we call *partitions*. Each partition is stored by
|
into slices which we call *partitions*. Each partition is stored by
|
||||||
one or several nodes in the cluster
|
one or several nodes in the cluster
|
||||||
(see [`replication_mode`](@/documentation/reference-manual/configuration.md#replication_mode)).
|
(see [`replication_factor`](@/documentation/reference-manual/configuration.md#replication_factor)).
|
||||||
The layout determines the correspondence between these partitions,
|
The layout determines the correspondence between these partitions,
|
||||||
which exist on a logical level, and actual storage nodes.
|
which exist on a logical level, and actual storage nodes.
|
||||||
|
|
||||||
|
|
|
@ -59,7 +59,7 @@ metadata_dir = "/tmp/meta"
|
||||||
data_dir = "/tmp/data"
|
data_dir = "/tmp/data"
|
||||||
db_engine = "lmdb"
|
db_engine = "lmdb"
|
||||||
|
|
||||||
replication_mode = "none"
|
replication_factor = 1
|
||||||
|
|
||||||
rpc_bind_addr = "[::]:3901"
|
rpc_bind_addr = "[::]:3901"
|
||||||
rpc_public_addr = "127.0.0.1:3901"
|
rpc_public_addr = "127.0.0.1:3901"
|
||||||
|
|
|
@ -8,7 +8,8 @@ weight = 20
|
||||||
Here is an example `garage.toml` configuration file that illustrates all of the possible options:
|
Here is an example `garage.toml` configuration file that illustrates all of the possible options:
|
||||||
|
|
||||||
```toml
|
```toml
|
||||||
replication_mode = "3"
|
replication_factor = 3
|
||||||
|
consistency_mode = "consistent"
|
||||||
|
|
||||||
metadata_dir = "/var/lib/garage/meta"
|
metadata_dir = "/var/lib/garage/meta"
|
||||||
data_dir = "/var/lib/garage/data"
|
data_dir = "/var/lib/garage/data"
|
||||||
|
@ -90,7 +91,8 @@ Top-level configuration options:
|
||||||
[`lmdb_map_size`](#lmdb_map_size),
|
[`lmdb_map_size`](#lmdb_map_size),
|
||||||
[`metadata_dir`](#metadata_dir),
|
[`metadata_dir`](#metadata_dir),
|
||||||
[`metadata_fsync`](#metadata_fsync),
|
[`metadata_fsync`](#metadata_fsync),
|
||||||
[`replication_mode`](#replication_mode),
|
[`replication_factor`](#replication_factor),
|
||||||
|
[`consistency_mode`](#consistency_mode),
|
||||||
[`rpc_bind_addr`](#rpc_bind_addr),
|
[`rpc_bind_addr`](#rpc_bind_addr),
|
||||||
[`rpc_bind_outgoing`](#rpc_bind_outgoing),
|
[`rpc_bind_outgoing`](#rpc_bind_outgoing),
|
||||||
[`rpc_public_addr`](#rpc_public_addr),
|
[`rpc_public_addr`](#rpc_public_addr),
|
||||||
|
@ -133,11 +135,12 @@ The `[admin]` section:
|
||||||
|
|
||||||
### Top-level configuration options
|
### Top-level configuration options
|
||||||
|
|
||||||
#### `replication_mode` {#replication_mode}
|
#### `replication_factor` {#replication_factor}
|
||||||
|
|
||||||
Garage supports the following replication modes:
|
The replication factor can be any positive integer smaller or equal the node count in your cluster.
|
||||||
|
The chosen replication factor has a big impact on the cluster's failure tolerancy and performance characteristics.
|
||||||
|
|
||||||
- `none` or `1`: data stored on Garage is stored on a single node. There is no
|
- `1`: data stored on Garage is stored on a single node. There is no
|
||||||
redundancy, and data will be unavailable as soon as one node fails or its
|
redundancy, and data will be unavailable as soon as one node fails or its
|
||||||
network is disconnected. Do not use this for anything else than test
|
network is disconnected. Do not use this for anything else than test
|
||||||
deployments.
|
deployments.
|
||||||
|
@ -148,17 +151,6 @@ Garage supports the following replication modes:
|
||||||
before losing data. Data remains available in read-only mode when one node is
|
before losing data. Data remains available in read-only mode when one node is
|
||||||
down, but write operations will fail.
|
down, but write operations will fail.
|
||||||
|
|
||||||
- `2-dangerous`: a variant of mode `2`, where written objects are written to
|
|
||||||
the second replica asynchronously. This means that Garage will return `200
|
|
||||||
OK` to a PutObject request before the second copy is fully written (or even
|
|
||||||
before it even starts being written). This means that data can more easily
|
|
||||||
be lost if the node crashes before a second copy can be completed. This
|
|
||||||
also means that written objects might not be visible immediately in read
|
|
||||||
operations. In other words, this mode severely breaks the consistency and
|
|
||||||
durability guarantees of standard Garage cluster operation. Benefits of
|
|
||||||
this mode: you can still write to your cluster when one node is
|
|
||||||
unavailable.
|
|
||||||
|
|
||||||
- `3`: data stored on Garage will be stored on three different nodes, if
|
- `3`: data stored on Garage will be stored on three different nodes, if
|
||||||
possible each in a different zones. Garage tolerates two node failure, or
|
possible each in a different zones. Garage tolerates two node failure, or
|
||||||
several node failures but in no more than two zones (in a deployment with at
|
several node failures but in no more than two zones (in a deployment with at
|
||||||
|
@ -166,55 +158,84 @@ Garage supports the following replication modes:
|
||||||
or node failures are only in a single zone, reading and writing data to
|
or node failures are only in a single zone, reading and writing data to
|
||||||
Garage can continue normally.
|
Garage can continue normally.
|
||||||
|
|
||||||
- `3-degraded`: a variant of replication mode `3`, that lowers the read
|
- `5`, `7`, ...: When setting the replication factor above 3, it is most useful to
|
||||||
quorum to `1`, to allow you to read data from your cluster when several
|
choose an uneven value, since for every two copies added, one more node can fail
|
||||||
nodes (or nodes in several zones) are unavailable. In this mode, Garage
|
before losing the ability to write and read to the cluster.
|
||||||
does not provide read-after-write consistency anymore. The write quorum is
|
|
||||||
still 2, ensuring that data successfully written to Garage is stored on at
|
|
||||||
least two nodes.
|
|
||||||
|
|
||||||
- `3-dangerous`: a variant of replication mode `3` that lowers both the read
|
|
||||||
and write quorums to `1`, to allow you to both read and write to your
|
|
||||||
cluster when several nodes (or nodes in several zones) are unavailable. It
|
|
||||||
is the least consistent mode of operation proposed by Garage, and also one
|
|
||||||
that should probably never be used.
|
|
||||||
|
|
||||||
Note that in modes `2` and `3`,
|
Note that in modes `2` and `3`,
|
||||||
if at least the same number of zones are available, an arbitrary number of failures in
|
if at least the same number of zones are available, an arbitrary number of failures in
|
||||||
any given zone is tolerated as copies of data will be spread over several zones.
|
any given zone is tolerated as copies of data will be spread over several zones.
|
||||||
|
|
||||||
**Make sure `replication_mode` is the same in the configuration files of all nodes.
|
**Make sure `replication_factor` is the same in the configuration files of all nodes.
|
||||||
Never run a Garage cluster where that is not the case.**
|
Never run a Garage cluster where that is not the case.**
|
||||||
|
|
||||||
|
It is technically possible to change the replication factor although it's a
|
||||||
|
dangerous operation that is not officially supported. This requires you to
|
||||||
|
delete the existing cluster layout and create a new layout from scratch,
|
||||||
|
meaning that a full rebalancing of your cluster's data will be needed. To do
|
||||||
|
it, shut down your cluster entirely, delete the `custer_layout` files in the
|
||||||
|
meta directories of all your nodes, update all your configuration files with
|
||||||
|
the new `replication_factor` parameter, restart your cluster, and then create a
|
||||||
|
new layout with all the nodes you want to keep. Rebalancing data will take
|
||||||
|
some time, and data might temporarily appear unavailable to your users.
|
||||||
|
It is recommended to shut down public access to the cluster while rebalancing
|
||||||
|
is in progress. In theory, no data should be lost as rebalancing is a
|
||||||
|
routine operation for Garage, although we cannot guarantee you that everything
|
||||||
|
will go right in such an extreme scenario.
|
||||||
|
|
||||||
|
#### `consistency_mode` {#consistency_mode}
|
||||||
|
|
||||||
|
The consistency mode setting determines the read and write behaviour of your cluster.
|
||||||
|
|
||||||
|
- `consistent`: The default setting. This is what the paragraph above describes.
|
||||||
|
The read and write quorum will be determined so that read-after-write consistency
|
||||||
|
is guaranteed.
|
||||||
|
- `degraded`: Lowers the read
|
||||||
|
quorum to `1`, to allow you to read data from your cluster when several
|
||||||
|
nodes (or nodes in several zones) are unavailable. In this mode, Garage
|
||||||
|
does not provide read-after-write consistency anymore.
|
||||||
|
The write quorum stays the same as in the `consistent` mode, ensuring that
|
||||||
|
data successfully written to Garage is stored on multiple nodes (depending
|
||||||
|
the replication factor).
|
||||||
|
- `dangerous`: This mode lowers both the read
|
||||||
|
and write quorums to `1`, to allow you to both read and write to your
|
||||||
|
cluster when several nodes (or nodes in several zones) are unavailable. It
|
||||||
|
is the least consistent mode of operation proposed by Garage, and also one
|
||||||
|
that should probably never be used.
|
||||||
|
|
||||||
|
Changing the `consistency_mode` between modes while leaving the `replication_factor` untouched
|
||||||
|
(e.g. setting your node's `consistency_mode` to `degraded` when it was previously unset, or from
|
||||||
|
`dangerous` to `consistent`), can be done easily by just changing the `consistency_mode`
|
||||||
|
parameter in your config files and restarting all your Garage nodes.
|
||||||
|
|
||||||
|
The consistency mode can be used together with various replication factors, to achieve
|
||||||
|
a wide range of read and write characteristics. Some examples:
|
||||||
|
|
||||||
|
- Replication factor `2`, consistency mode `degraded`: While this mode
|
||||||
|
technically exists, its properties are the same as with consistency mode `consistent`,
|
||||||
|
since the read quorum with replication factor `2`, consistency mode `consistent` is already 1.
|
||||||
|
|
||||||
|
- Replication factor `2`, consistency mode `dangerous`: written objects are written to
|
||||||
|
the second replica asynchronously. This means that Garage will return `200
|
||||||
|
OK` to a PutObject request before the second copy is fully written (or even
|
||||||
|
before it even starts being written). This means that data can more easily
|
||||||
|
be lost if the node crashes before a second copy can be completed. This
|
||||||
|
also means that written objects might not be visible immediately in read
|
||||||
|
operations. In other words, this configuration severely breaks the consistency and
|
||||||
|
durability guarantees of standard Garage cluster operation. Benefits of
|
||||||
|
this configuration: you can still write to your cluster when one node is
|
||||||
|
unavailable.
|
||||||
|
|
||||||
The quorums associated with each replication mode are described below:
|
The quorums associated with each replication mode are described below:
|
||||||
|
|
||||||
| `replication_mode` | Number of replicas | Write quorum | Read quorum | Read-after-write consistency? |
|
| `consistency_mode` | `replication_factor` | Write quorum | Read quorum | Read-after-write consistency? |
|
||||||
| ------------------ | ------------------ | ------------ | ----------- | ----------------------------- |
|
| ------------------ | -------------------- | ------------ | ----------- | ----------------------------- |
|
||||||
| `none` or `1` | 1 | 1 | 1 | yes |
|
| `consistent` | 1 | 1 | 1 | yes |
|
||||||
| `2` | 2 | 2 | 1 | yes |
|
| `consistent` | 2 | 2 | 1 | yes |
|
||||||
| `2-dangerous` | 2 | 1 | 1 | NO |
|
| `dangerous` | 2 | 1 | 1 | NO |
|
||||||
| `3` | 3 | 2 | 2 | yes |
|
| `consistent` | 3 | 2 | 2 | yes |
|
||||||
| `3-degraded` | 3 | 2 | 1 | NO |
|
| `degraded` | 3 | 2 | 1 | NO |
|
||||||
| `3-dangerous` | 3 | 1 | 1 | NO |
|
| `dangerous` | 3 | 1 | 1 | NO |
|
||||||
|
|
||||||
Changing the `replication_mode` between modes with the same number of replicas
|
|
||||||
(e.g. from `3` to `3-degraded`, or from `2-dangerous` to `2`), can be done easily by
|
|
||||||
just changing the `replication_mode` parameter in your config files and restarting all your
|
|
||||||
Garage nodes.
|
|
||||||
|
|
||||||
It is also technically possible to change the replication mode to a mode with a
|
|
||||||
different numbers of replicas, although it's a dangerous operation that is not
|
|
||||||
officially supported. This requires you to delete the existing cluster layout
|
|
||||||
and create a new layout from scratch, meaning that a full rebalancing of your
|
|
||||||
cluster's data will be needed. To do it, shut down your cluster entirely,
|
|
||||||
delete the `custer_layout` files in the meta directories of all your nodes,
|
|
||||||
update all your configuration files with the new `replication_mode` parameter,
|
|
||||||
restart your cluster, and then create a new layout with all the nodes you want
|
|
||||||
to keep. Rebalancing data will take some time, and data might temporarily
|
|
||||||
appear unavailable to your users. It is recommended to shut down public access
|
|
||||||
to the cluster while rebalancing is in progress. In theory, no data should be
|
|
||||||
lost as rebalancing is a routine operation for Garage, although we cannot
|
|
||||||
guarantee you that everything will go right in such an extreme scenario.
|
|
||||||
|
|
||||||
#### `metadata_dir` {#metadata_dir}
|
#### `metadata_dir` {#metadata_dir}
|
||||||
|
|
||||||
|
|
|
@ -39,10 +39,10 @@ Read about cluster layout management [here](@/documentation/operations/layout.md
|
||||||
|
|
||||||
### Several replication modes
|
### Several replication modes
|
||||||
|
|
||||||
Garage supports a variety of replication modes, with 1 copy, 2 copies or 3 copies of your data,
|
Garage supports a variety of replication modes, with configurable replica count,
|
||||||
and with various levels of consistency, in order to adapt to a variety of usage scenarios.
|
and with various levels of consistency, in order to adapt to a variety of usage scenarios.
|
||||||
Read our reference page on [supported replication modes](@/documentation/reference-manual/configuration.md#replication_mode)
|
Read our reference page on [supported replication modes](@/documentation/reference-manual/configuration.md#replication_factor)
|
||||||
to select the replication mode best suited to your use case (hint: in most cases, `replication_mode = "3"` is what you want).
|
to select the replication mode best suited to your use case (hint: in most cases, `replication_factor = 3` is what you want).
|
||||||
|
|
||||||
### Compression and deduplication
|
### Compression and deduplication
|
||||||
|
|
||||||
|
|
Loading…
Reference in a new issue