ReplicationMode -> ConsistencyMode+ReplicationFactor #750
|
@ -116,7 +116,7 @@ metadata_dir = "/var/lib/garage/meta"
|
||||||
data_dir = "/var/lib/garage/data"
|
data_dir = "/var/lib/garage/data"
|
||||||
db_engine = "lmdb"
|
db_engine = "lmdb"
|
||||||
|
|
||||||
replication_mode = "3"
|
replication_factor = 3
|
||||||
|
|
||||||
compression_level = 2
|
compression_level = 2
|
||||||
|
|
||||||
|
|
|
@ -12,7 +12,7 @@ An introduction to building cluster layouts can be found in the [production depl
|
||||||
In Garage, all of the data that can be stored in a given cluster is divided
|
In Garage, all of the data that can be stored in a given cluster is divided
|
||||||
into slices which we call *partitions*. Each partition is stored by
|
into slices which we call *partitions*. Each partition is stored by
|
||||||
one or several nodes in the cluster
|
one or several nodes in the cluster
|
||||||
(see [`replication_mode`](@/documentation/reference-manual/configuration.md#replication_mode)).
|
(see [`replication_factor`](@/documentation/reference-manual/configuration.md#replication_factor)).
|
||||||
The layout determines the correspondence between these partitions,
|
The layout determines the correspondence between these partitions,
|
||||||
which exist on a logical level, and actual storage nodes.
|
which exist on a logical level, and actual storage nodes.
|
||||||
|
|
||||||
|
|
|
@ -59,7 +59,7 @@ metadata_dir = "/tmp/meta"
|
||||||
data_dir = "/tmp/data"
|
data_dir = "/tmp/data"
|
||||||
db_engine = "lmdb"
|
db_engine = "lmdb"
|
||||||
|
|
||||||
replication_mode = "none"
|
replication_factor = 1
|
||||||
|
|
||||||
rpc_bind_addr = "[::]:3901"
|
rpc_bind_addr = "[::]:3901"
|
||||||
rpc_public_addr = "127.0.0.1:3901"
|
rpc_public_addr = "127.0.0.1:3901"
|
||||||
|
|
|
@ -8,7 +8,8 @@ weight = 20
|
||||||
Here is an example `garage.toml` configuration file that illustrates all of the possible options:
|
Here is an example `garage.toml` configuration file that illustrates all of the possible options:
|
||||||
|
|
||||||
```toml
|
```toml
|
||||||
replication_mode = "3"
|
replication_factor = 3
|
||||||
|
|||||||
|
consistency_mode = "consistent"
|
||||||
|
|
||||||
metadata_dir = "/var/lib/garage/meta"
|
metadata_dir = "/var/lib/garage/meta"
|
||||||
data_dir = "/var/lib/garage/data"
|
data_dir = "/var/lib/garage/data"
|
||||||
|
@ -90,7 +91,8 @@ Top-level configuration options:
|
||||||
[`lmdb_map_size`](#lmdb_map_size),
|
[`lmdb_map_size`](#lmdb_map_size),
|
||||||
[`metadata_dir`](#metadata_dir),
|
[`metadata_dir`](#metadata_dir),
|
||||||
[`metadata_fsync`](#metadata_fsync),
|
[`metadata_fsync`](#metadata_fsync),
|
||||||
[`replication_mode`](#replication_mode),
|
[`replication_factor`](#replication_factor),
|
||||||
|
[`consistency_mode`](#consistency_mode),
|
||||||
[`rpc_bind_addr`](#rpc_bind_addr),
|
[`rpc_bind_addr`](#rpc_bind_addr),
|
||||||
[`rpc_bind_outgoing`](#rpc_bind_outgoing),
|
[`rpc_bind_outgoing`](#rpc_bind_outgoing),
|
||||||
[`rpc_public_addr`](#rpc_public_addr),
|
[`rpc_public_addr`](#rpc_public_addr),
|
||||||
|
@ -133,11 +135,12 @@ The `[admin]` section:
|
||||||
|
|
||||||
### Top-level configuration options
|
### Top-level configuration options
|
||||||
|
|
||||||
#### `replication_mode` {#replication_mode}
|
#### `replication_factor` {#replication_factor}
|
||||||
|
|
||||||
Garage supports the following replication modes:
|
The replication factor can be any positive integer smaller or equal the node count in your cluster.
|
||||||
|
The chosen replication factor has a big impact on the cluster's failure tolerancy and performance characteristics.
|
||||||
|
|
||||||
- `none` or `1`: data stored on Garage is stored on a single node. There is no
|
- `1`: data stored on Garage is stored on a single node. There is no
|
||||||
redundancy, and data will be unavailable as soon as one node fails or its
|
redundancy, and data will be unavailable as soon as one node fails or its
|
||||||
network is disconnected. Do not use this for anything else than test
|
network is disconnected. Do not use this for anything else than test
|
||||||
deployments.
|
deployments.
|
||||||
|
@ -148,17 +151,6 @@ Garage supports the following replication modes:
|
||||||
before losing data. Data remains available in read-only mode when one node is
|
before losing data. Data remains available in read-only mode when one node is
|
||||||
down, but write operations will fail.
|
down, but write operations will fail.
|
||||||
|
|
||||||
- `2-dangerous`: a variant of mode `2`, where written objects are written to
|
|
||||||
the second replica asynchronously. This means that Garage will return `200
|
|
||||||
OK` to a PutObject request before the second copy is fully written (or even
|
|
||||||
before it even starts being written). This means that data can more easily
|
|
||||||
be lost if the node crashes before a second copy can be completed. This
|
|
||||||
also means that written objects might not be visible immediately in read
|
|
||||||
operations. In other words, this mode severely breaks the consistency and
|
|
||||||
durability guarantees of standard Garage cluster operation. Benefits of
|
|
||||||
this mode: you can still write to your cluster when one node is
|
|
||||||
unavailable.
|
|
||||||
|
|
||||||
- `3`: data stored on Garage will be stored on three different nodes, if
|
- `3`: data stored on Garage will be stored on three different nodes, if
|
||||||
possible each in a different zones. Garage tolerates two node failure, or
|
possible each in a different zones. Garage tolerates two node failure, or
|
||||||
lx marked this conversation as resolved
lx
commented
Add a comment saying that arbitrarily large replication factors are now supported? Add a comment saying that arbitrarily large replication factors are now supported?
yuka
commented
Added a note about replication factors higher than 3 in the listing Added a note about replication factors higher than 3 in the listing
|
|||||||
several node failures but in no more than two zones (in a deployment with at
|
several node failures but in no more than two zones (in a deployment with at
|
||||||
|
@ -166,55 +158,84 @@ Garage supports the following replication modes:
|
||||||
or node failures are only in a single zone, reading and writing data to
|
or node failures are only in a single zone, reading and writing data to
|
||||||
Garage can continue normally.
|
Garage can continue normally.
|
||||||
|
|
||||||
- `3-degraded`: a variant of replication mode `3`, that lowers the read
|
- `5`, `7`, ...: When setting the replication factor above 3, it is most useful to
|
||||||
quorum to `1`, to allow you to read data from your cluster when several
|
choose an uneven value, since for every two copies added, one more node can fail
|
||||||
nodes (or nodes in several zones) are unavailable. In this mode, Garage
|
before losing the ability to write and read to the cluster.
|
||||||
does not provide read-after-write consistency anymore. The write quorum is
|
|
||||||
still 2, ensuring that data successfully written to Garage is stored on at
|
|
||||||
least two nodes.
|
|
||||||
|
|
||||||
- `3-dangerous`: a variant of replication mode `3` that lowers both the read
|
|
||||||
and write quorums to `1`, to allow you to both read and write to your
|
|
||||||
cluster when several nodes (or nodes in several zones) are unavailable. It
|
|
||||||
is the least consistent mode of operation proposed by Garage, and also one
|
|
||||||
that should probably never be used.
|
|
||||||
|
|
||||||
Note that in modes `2` and `3`,
|
Note that in modes `2` and `3`,
|
||||||
if at least the same number of zones are available, an arbitrary number of failures in
|
if at least the same number of zones are available, an arbitrary number of failures in
|
||||||
any given zone is tolerated as copies of data will be spread over several zones.
|
any given zone is tolerated as copies of data will be spread over several zones.
|
||||||
|
|
||||||
**Make sure `replication_mode` is the same in the configuration files of all nodes.
|
**Make sure `replication_factor` is the same in the configuration files of all nodes.
|
||||||
Never run a Garage cluster where that is not the case.**
|
Never run a Garage cluster where that is not the case.**
|
||||||
|
|
||||||
|
It is technically possible to change the replication factor although it's a
|
||||||
|
dangerous operation that is not officially supported. This requires you to
|
||||||
|
delete the existing cluster layout and create a new layout from scratch,
|
||||||
|
meaning that a full rebalancing of your cluster's data will be needed. To do
|
||||||
|
it, shut down your cluster entirely, delete the `custer_layout` files in the
|
||||||
|
meta directories of all your nodes, update all your configuration files with
|
||||||
|
the new `replication_factor` parameter, restart your cluster, and then create a
|
||||||
|
new layout with all the nodes you want to keep. Rebalancing data will take
|
||||||
|
some time, and data might temporarily appear unavailable to your users.
|
||||||
|
It is recommended to shut down public access to the cluster while rebalancing
|
||||||
|
is in progress. In theory, no data should be lost as rebalancing is a
|
||||||
|
routine operation for Garage, although we cannot guarantee you that everything
|
||||||
|
will go right in such an extreme scenario.
|
||||||
|
|
||||||
|
#### `consistency_mode` {#consistency_mode}
|
||||||
|
|
||||||
|
The consistency mode setting determines the read and write behaviour of your cluster.
|
||||||
|
|
||||||
|
- `consistent`: The default setting. This is what the paragraph above describes.
|
||||||
|
The read and write quorum will be determined so that read-after-write consistency
|
||||||
|
is guaranteed.
|
||||||
|
- `degraded`: Lowers the read
|
||||||
|
quorum to `1`, to allow you to read data from your cluster when several
|
||||||
|
nodes (or nodes in several zones) are unavailable. In this mode, Garage
|
||||||
|
does not provide read-after-write consistency anymore.
|
||||||
|
The write quorum stays the same as in the `consistent` mode, ensuring that
|
||||||
|
data successfully written to Garage is stored on multiple nodes (depending
|
||||||
|
the replication factor).
|
||||||
|
- `dangerous`: This mode lowers both the read
|
||||||
|
and write quorums to `1`, to allow you to both read and write to your
|
||||||
|
cluster when several nodes (or nodes in several zones) are unavailable. It
|
||||||
|
is the least consistent mode of operation proposed by Garage, and also one
|
||||||
|
that should probably never be used.
|
||||||
|
|
||||||
|
Changing the `consistency_mode` between modes while leaving the `replication_factor` untouched
|
||||||
|
(e.g. setting your node's `consistency_mode` to `degraded` when it was previously unset, or from
|
||||||
|
`dangerous` to `consistent`), can be done easily by just changing the `consistency_mode`
|
||||||
|
parameter in your config files and restarting all your Garage nodes.
|
||||||
|
|
||||||
|
The consistency mode can be used together with various replication factors, to achieve
|
||||||
|
a wide range of read and write characteristics. Some examples:
|
||||||
|
|
||||||
|
- Replication factor `2`, consistency mode `degraded`: While this mode
|
||||||
|
technically exists, its properties are the same as with consistency mode `consistent`,
|
||||||
|
since the read quorum with replication factor `2`, consistency mode `consistent` is already 1.
|
||||||
|
|
||||||
|
- Replication factor `2`, consistency mode `dangerous`: written objects are written to
|
||||||
|
the second replica asynchronously. This means that Garage will return `200
|
||||||
|
OK` to a PutObject request before the second copy is fully written (or even
|
||||||
|
before it even starts being written). This means that data can more easily
|
||||||
|
be lost if the node crashes before a second copy can be completed. This
|
||||||
|
also means that written objects might not be visible immediately in read
|
||||||
|
operations. In other words, this configuration severely breaks the consistency and
|
||||||
|
durability guarantees of standard Garage cluster operation. Benefits of
|
||||||
|
this configuration: you can still write to your cluster when one node is
|
||||||
|
unavailable.
|
||||||
|
|
||||||
The quorums associated with each replication mode are described below:
|
The quorums associated with each replication mode are described below:
|
||||||
|
|
||||||
| `replication_mode` | Number of replicas | Write quorum | Read quorum | Read-after-write consistency? |
|
| `consistency_mode` | `replication_factor` | Write quorum | Read quorum | Read-after-write consistency? |
|
||||||
| ------------------ | ------------------ | ------------ | ----------- | ----------------------------- |
|
| ------------------ | -------------------- | ------------ | ----------- | ----------------------------- |
|
||||||
| `none` or `1` | 1 | 1 | 1 | yes |
|
| `consistent` | 1 | 1 | 1 | yes |
|
||||||
| `2` | 2 | 2 | 1 | yes |
|
| `consistent` | 2 | 2 | 1 | yes |
|
||||||
| `2-dangerous` | 2 | 1 | 1 | NO |
|
| `dangerous` | 2 | 1 | 1 | NO |
|
||||||
| `3` | 3 | 2 | 2 | yes |
|
| `consistent` | 3 | 2 | 2 | yes |
|
||||||
| `3-degraded` | 3 | 2 | 1 | NO |
|
| `degraded` | 3 | 2 | 1 | NO |
|
||||||
| `3-dangerous` | 3 | 1 | 1 | NO |
|
| `dangerous` | 3 | 1 | 1 | NO |
|
||||||
|
|
||||||
Changing the `replication_mode` between modes with the same number of replicas
|
|
||||||
(e.g. from `3` to `3-degraded`, or from `2-dangerous` to `2`), can be done easily by
|
|
||||||
just changing the `replication_mode` parameter in your config files and restarting all your
|
|
||||||
Garage nodes.
|
|
||||||
|
|
||||||
It is also technically possible to change the replication mode to a mode with a
|
|
||||||
different numbers of replicas, although it's a dangerous operation that is not
|
|
||||||
officially supported. This requires you to delete the existing cluster layout
|
|
||||||
and create a new layout from scratch, meaning that a full rebalancing of your
|
|
||||||
cluster's data will be needed. To do it, shut down your cluster entirely,
|
|
||||||
delete the `custer_layout` files in the meta directories of all your nodes,
|
|
||||||
update all your configuration files with the new `replication_mode` parameter,
|
|
||||||
restart your cluster, and then create a new layout with all the nodes you want
|
|
||||||
to keep. Rebalancing data will take some time, and data might temporarily
|
|
||||||
appear unavailable to your users. It is recommended to shut down public access
|
|
||||||
to the cluster while rebalancing is in progress. In theory, no data should be
|
|
||||||
lost as rebalancing is a routine operation for Garage, although we cannot
|
|
||||||
guarantee you that everything will go right in such an extreme scenario.
|
|
||||||
|
|
||||||
#### `metadata_dir` {#metadata_dir}
|
#### `metadata_dir` {#metadata_dir}
|
||||||
|
|
||||||
|
|
|
@ -39,10 +39,10 @@ Read about cluster layout management [here](@/documentation/operations/layout.md
|
||||||
|
|
||||||
### Several replication modes
|
### Several replication modes
|
||||||
|
|
||||||
Garage supports a variety of replication modes, with 1 copy, 2 copies or 3 copies of your data,
|
Garage supports a variety of replication modes, with configurable replica count,
|
||||||
and with various levels of consistency, in order to adapt to a variety of usage scenarios.
|
and with various levels of consistency, in order to adapt to a variety of usage scenarios.
|
||||||
Read our reference page on [supported replication modes](@/documentation/reference-manual/configuration.md#replication_mode)
|
Read our reference page on [supported replication modes](@/documentation/reference-manual/configuration.md#replication_factor)
|
||||||
to select the replication mode best suited to your use case (hint: in most cases, `replication_mode = "3"` is what you want).
|
to select the replication mode best suited to your use case (hint: in most cases, `replication_factor = 3` is what you want).
|
||||||
|
|
||||||
### Compression and deduplication
|
### Compression and deduplication
|
||||||
|
|
||||||
|
|
|
@ -163,7 +163,7 @@ mod tests {
|
||||||
r#"
|
r#"
|
||||||
metadata_dir = "/tmp/garage/meta"
|
metadata_dir = "/tmp/garage/meta"
|
||||||
data_dir = "/tmp/garage/data"
|
data_dir = "/tmp/garage/data"
|
||||||
replication_mode = "3"
|
replication_factor = 3
|
||||||
rpc_bind_addr = "[::]:3901"
|
rpc_bind_addr = "[::]:3901"
|
||||||
rpc_secret_file = "{}"
|
rpc_secret_file = "{}"
|
||||||
|
|
||||||
|
@ -185,7 +185,7 @@ mod tests {
|
||||||
r#"
|
r#"
|
||||||
metadata_dir = "/tmp/garage/meta"
|
metadata_dir = "/tmp/garage/meta"
|
||||||
data_dir = "/tmp/garage/data"
|
data_dir = "/tmp/garage/data"
|
||||||
replication_mode = "3"
|
replication_factor = 3
|
||||||
rpc_bind_addr = "[::]:3901"
|
rpc_bind_addr = "[::]:3901"
|
||||||
rpc_secret_file = "{}"
|
rpc_secret_file = "{}"
|
||||||
allow_world_readable_secrets = true
|
allow_world_readable_secrets = true
|
||||||
|
@ -296,7 +296,7 @@ mod tests {
|
||||||
r#"
|
r#"
|
||||||
metadata_dir = "/tmp/garage/meta"
|
metadata_dir = "/tmp/garage/meta"
|
||||||
data_dir = "/tmp/garage/data"
|
data_dir = "/tmp/garage/data"
|
||||||
replication_mode = "3"
|
replication_factor = 3
|
||||||
rpc_bind_addr = "[::]:3901"
|
rpc_bind_addr = "[::]:3901"
|
||||||
rpc_secret= "dummy"
|
rpc_secret= "dummy"
|
||||||
rpc_secret_file = "dummy"
|
rpc_secret_file = "dummy"
|
||||||
|
|
|
@ -54,7 +54,7 @@ metadata_dir = "{path}/meta"
|
||||||
data_dir = "{path}/data"
|
data_dir = "{path}/data"
|
||||||
db_engine = "lmdb"
|
db_engine = "lmdb"
|
||||||
|
|
||||||
replication_mode = "1"
|
replication_factor = 1
|
||||||
|
|
||||||
rpc_bind_addr = "127.0.0.1:{rpc_port}"
|
rpc_bind_addr = "127.0.0.1:{rpc_port}"
|
||||||
rpc_public_addr = "127.0.0.1:{rpc_port}"
|
rpc_public_addr = "127.0.0.1:{rpc_port}"
|
||||||
|
|
|
@ -9,7 +9,7 @@ use garage_util::config::*;
|
||||||
use garage_util::error::*;
|
use garage_util::error::*;
|
||||||
use garage_util::persister::PersisterShared;
|
use garage_util::persister::PersisterShared;
|
||||||
|
|
||||||
use garage_rpc::replication_mode::ReplicationMode;
|
use garage_rpc::replication_mode::*;
|
||||||
use garage_rpc::system::System;
|
use garage_rpc::system::System;
|
||||||
|
|
||||||
use garage_block::manager::*;
|
use garage_block::manager::*;
|
||||||
|
@ -39,8 +39,8 @@ pub struct Garage {
|
||||||
/// The set of background variables that can be viewed/modified at runtime
|
/// The set of background variables that can be viewed/modified at runtime
|
||||||
pub bg_vars: vars::BgVars,
|
pub bg_vars: vars::BgVars,
|
||||||
|
|
||||||
/// The replication mode of this cluster
|
/// The replication factor of this cluster
|
||||||
pub replication_mode: ReplicationMode,
|
pub replication_factor: ReplicationFactor,
|
||||||
|
|
||||||
/// The local database
|
/// The local database
|
||||||
pub db: db::Db,
|
pub db: db::Db,
|
||||||
|
@ -222,27 +222,26 @@ impl Garage {
|
||||||
.and_then(|x| NetworkKey::from_slice(&x))
|
.and_then(|x| NetworkKey::from_slice(&x))
|
||||||
.ok_or_message("Invalid RPC secret key")?;
|
.ok_or_message("Invalid RPC secret key")?;
|
||||||
|
|
||||||
let replication_mode = ReplicationMode::parse(&config.replication_mode)
|
let (replication_factor, consistency_mode) = parse_replication_mode(&config)?;
|
||||||
.ok_or_message("Invalid replication_mode in config file.")?;
|
|
||||||
|
|
||||||
info!("Initialize background variable system...");
|
info!("Initialize background variable system...");
|
||||||
let mut bg_vars = vars::BgVars::new();
|
let mut bg_vars = vars::BgVars::new();
|
||||||
|
|
||||||
info!("Initialize membership management system...");
|
info!("Initialize membership management system...");
|
||||||
let system = System::new(network_key, replication_mode, &config)?;
|
let system = System::new(network_key, replication_factor, consistency_mode, &config)?;
|
||||||
|
|
||||||
let data_rep_param = TableShardedReplication {
|
let data_rep_param = TableShardedReplication {
|
||||||
system: system.clone(),
|
system: system.clone(),
|
||||||
replication_factor: replication_mode.replication_factor(),
|
replication_factor: replication_factor.into(),
|
||||||
write_quorum: replication_mode.write_quorum(),
|
write_quorum: replication_factor.write_quorum(consistency_mode),
|
||||||
read_quorum: 1,
|
read_quorum: 1,
|
||||||
};
|
};
|
||||||
|
|
||||||
let meta_rep_param = TableShardedReplication {
|
let meta_rep_param = TableShardedReplication {
|
||||||
system: system.clone(),
|
system: system.clone(),
|
||||||
replication_factor: replication_mode.replication_factor(),
|
replication_factor: replication_factor.into(),
|
||||||
write_quorum: replication_mode.write_quorum(),
|
write_quorum: replication_factor.write_quorum(consistency_mode),
|
||||||
read_quorum: replication_mode.read_quorum(),
|
read_quorum: replication_factor.read_quorum(consistency_mode),
|
||||||
};
|
};
|
||||||
|
|
||||||
let control_rep_param = TableFullReplication {
|
let control_rep_param = TableFullReplication {
|
||||||
|
@ -338,7 +337,7 @@ impl Garage {
|
||||||
Ok(Arc::new(Self {
|
Ok(Arc::new(Self {
|
||||||
config,
|
config,
|
||||||
bg_vars,
|
bg_vars,
|
||||||
replication_mode,
|
replication_factor,
|
||||||
db,
|
db,
|
||||||
system,
|
system,
|
||||||
block_manager,
|
block_manager,
|
||||||
|
|
|
@ -7,7 +7,7 @@ use serde::{Deserialize, Serialize};
|
||||||
use garage_util::data::*;
|
use garage_util::data::*;
|
||||||
|
|
||||||
use super::*;
|
use super::*;
|
||||||
use crate::replication_mode::ReplicationMode;
|
use crate::replication_mode::*;
|
||||||
|
|
||||||
#[derive(Debug, Clone, Serialize, Deserialize, Default, PartialEq, Eq)]
|
#[derive(Debug, Clone, Serialize, Deserialize, Default, PartialEq, Eq)]
|
||||||
pub struct RpcLayoutDigest {
|
pub struct RpcLayoutDigest {
|
||||||
|
@ -29,7 +29,8 @@ pub struct SyncLayoutDigest {
|
||||||
}
|
}
|
||||||
|
|
||||||
pub struct LayoutHelper {
|
pub struct LayoutHelper {
|
||||||
replication_mode: ReplicationMode,
|
replication_factor: ReplicationFactor,
|
||||||
|
consistency_mode: ConsistencyMode,
|
||||||
layout: Option<LayoutHistory>,
|
layout: Option<LayoutHistory>,
|
||||||
|
|
||||||
// cached values
|
// cached values
|
||||||
|
@ -57,7 +58,8 @@ impl Deref for LayoutHelper {
|
||||||
|
|
||||||
impl LayoutHelper {
|
impl LayoutHelper {
|
||||||
pub fn new(
|
pub fn new(
|
||||||
replication_mode: ReplicationMode,
|
replication_factor: ReplicationFactor,
|
||||||
|
consistency_mode: ConsistencyMode,
|
||||||
mut layout: LayoutHistory,
|
mut layout: LayoutHistory,
|
||||||
mut ack_lock: HashMap<u64, AtomicUsize>,
|
mut ack_lock: HashMap<u64, AtomicUsize>,
|
||||||
) -> Self {
|
) -> Self {
|
||||||
|
@ -66,7 +68,7 @@ impl LayoutHelper {
|
||||||
// correct and we have rapid access to important values such as
|
// correct and we have rapid access to important values such as
|
||||||
// the layout versions to use when reading to ensure consistency.
|
// the layout versions to use when reading to ensure consistency.
|
||||||
|
|
||||||
if !replication_mode.is_read_after_write_consistent() {
|
if consistency_mode != ConsistencyMode::Consistent {
|
||||||
// Fast path for when no consistency is required.
|
// Fast path for when no consistency is required.
|
||||||
// In this case we only need to keep the last version of the layout,
|
// In this case we only need to keep the last version of the layout,
|
||||||
// we don't care about coordinating stuff in the cluster.
|
// we don't care about coordinating stuff in the cluster.
|
||||||
|
@ -103,7 +105,7 @@ impl LayoutHelper {
|
||||||
// This value is calculated using quorums to allow progress even
|
// This value is calculated using quorums to allow progress even
|
||||||
// if not all nodes have successfully completed a sync.
|
// if not all nodes have successfully completed a sync.
|
||||||
let sync_map_min =
|
let sync_map_min =
|
||||||
layout.calculate_sync_map_min_with_quorum(replication_mode, &all_nongateway_nodes);
|
layout.calculate_sync_map_min_with_quorum(replication_factor, &all_nongateway_nodes);
|
||||||
|
|
||||||
let trackers_hash = layout.calculate_trackers_hash();
|
let trackers_hash = layout.calculate_trackers_hash();
|
||||||
let staging_hash = layout.calculate_staging_hash();
|
let staging_hash = layout.calculate_staging_hash();
|
||||||
|
@ -114,7 +116,8 @@ impl LayoutHelper {
|
||||||
.or_insert(AtomicUsize::new(0));
|
.or_insert(AtomicUsize::new(0));
|
||||||
|
|
||||||
LayoutHelper {
|
LayoutHelper {
|
||||||
replication_mode,
|
replication_factor,
|
||||||
|
consistency_mode,
|
||||||
layout: Some(layout),
|
layout: Some(layout),
|
||||||
ack_map_min,
|
ack_map_min,
|
||||||
sync_map_min,
|
sync_map_min,
|
||||||
|
@ -139,7 +142,8 @@ impl LayoutHelper {
|
||||||
let changed = f(self.layout.as_mut().unwrap());
|
let changed = f(self.layout.as_mut().unwrap());
|
||||||
if changed {
|
if changed {
|
||||||
*self = Self::new(
|
*self = Self::new(
|
||||||
self.replication_mode,
|
self.replication_factor,
|
||||||
|
self.consistency_mode,
|
||||||
self.layout.take().unwrap(),
|
self.layout.take().unwrap(),
|
||||||
std::mem::take(&mut self.ack_lock),
|
std::mem::take(&mut self.ack_lock),
|
||||||
);
|
);
|
||||||
|
|
|
@ -6,11 +6,11 @@ use garage_util::encode::nonversioned_encode;
|
||||||
use garage_util::error::*;
|
use garage_util::error::*;
|
||||||
|
|
||||||
use super::*;
|
use super::*;
|
||||||
use crate::replication_mode::ReplicationMode;
|
use crate::replication_mode::*;
|
||||||
|
|
||||||
impl LayoutHistory {
|
impl LayoutHistory {
|
||||||
pub fn new(replication_factor: usize) -> Self {
|
pub fn new(replication_factor: ReplicationFactor) -> Self {
|
||||||
let version = LayoutVersion::new(replication_factor);
|
let version = LayoutVersion::new(replication_factor.into());
|
||||||
|
|
||||||
let staging = LayoutStaging {
|
let staging = LayoutStaging {
|
||||||
parameters: Lww::<LayoutParameters>::new(version.parameters),
|
parameters: Lww::<LayoutParameters>::new(version.parameters),
|
||||||
|
@ -119,7 +119,7 @@ impl LayoutHistory {
|
||||||
|
|
||||||
pub(crate) fn calculate_sync_map_min_with_quorum(
|
pub(crate) fn calculate_sync_map_min_with_quorum(
|
||||||
&self,
|
&self,
|
||||||
replication_mode: ReplicationMode,
|
replication_factor: ReplicationFactor,
|
||||||
all_nongateway_nodes: &[Uuid],
|
all_nongateway_nodes: &[Uuid],
|
||||||
) -> u64 {
|
) -> u64 {
|
||||||
// This function calculates the minimum layout version from which
|
// This function calculates the minimum layout version from which
|
||||||
|
@ -133,7 +133,7 @@ impl LayoutHistory {
|
||||||
return self.current().version;
|
return self.current().version;
|
||||||
}
|
}
|
||||||
|
|
||||||
let quorum = replication_mode.write_quorum();
|
let quorum = replication_factor.write_quorum(ConsistencyMode::Consistent);
|
||||||
|
|
||||||
let min_version = self.min_stored();
|
let min_version = self.min_stored();
|
||||||
let global_min = self
|
let global_min = self
|
||||||
|
|
|
@ -14,13 +14,13 @@ use garage_util::error::*;
|
||||||
use garage_util::persister::Persister;
|
use garage_util::persister::Persister;
|
||||||
|
|
||||||
use super::*;
|
use super::*;
|
||||||
use crate::replication_mode::ReplicationMode;
|
use crate::replication_mode::*;
|
||||||
use crate::rpc_helper::*;
|
use crate::rpc_helper::*;
|
||||||
use crate::system::*;
|
use crate::system::*;
|
||||||
|
|
||||||
pub struct LayoutManager {
|
pub struct LayoutManager {
|
||||||
node_id: Uuid,
|
node_id: Uuid,
|
||||||
replication_mode: ReplicationMode,
|
replication_factor: ReplicationFactor,
|
||||||
persist_cluster_layout: Persister<LayoutHistory>,
|
persist_cluster_layout: Persister<LayoutHistory>,
|
||||||
|
|
||||||
layout: Arc<RwLock<LayoutHelper>>,
|
layout: Arc<RwLock<LayoutHelper>>,
|
||||||
|
@ -38,20 +38,19 @@ impl LayoutManager {
|
||||||
node_id: NodeID,
|
node_id: NodeID,
|
||||||
system_endpoint: Arc<Endpoint<SystemRpc, System>>,
|
system_endpoint: Arc<Endpoint<SystemRpc, System>>,
|
||||||
peering: Arc<PeeringManager>,
|
peering: Arc<PeeringManager>,
|
||||||
replication_mode: ReplicationMode,
|
replication_factor: ReplicationFactor,
|
||||||
|
consistency_mode: ConsistencyMode,
|
||||||
) -> Result<Arc<Self>, Error> {
|
) -> Result<Arc<Self>, Error> {
|
||||||
let replication_factor = replication_mode.replication_factor();
|
|
||||||
|
|
||||||
let persist_cluster_layout: Persister<LayoutHistory> =
|
let persist_cluster_layout: Persister<LayoutHistory> =
|
||||||
Persister::new(&config.metadata_dir, "cluster_layout");
|
Persister::new(&config.metadata_dir, "cluster_layout");
|
||||||
|
|
||||||
let cluster_layout = match persist_cluster_layout.load() {
|
let cluster_layout = match persist_cluster_layout.load() {
|
||||||
Ok(x) => {
|
Ok(x) => {
|
||||||
if x.current().replication_factor != replication_mode.replication_factor() {
|
if x.current().replication_factor != replication_factor.replication_factor() {
|
||||||
return Err(Error::Message(format!(
|
return Err(Error::Message(format!(
|
||||||
"Prevous cluster layout has replication factor {}, which is different than the one specified in the config file ({}). The previous cluster layout can be purged, if you know what you are doing, simply by deleting the `cluster_layout` file in your metadata directory.",
|
"Prevous cluster layout has replication factor {}, which is different than the one specified in the config file ({}). The previous cluster layout can be purged, if you know what you are doing, simply by deleting the `cluster_layout` file in your metadata directory.",
|
||||||
x.current().replication_factor,
|
x.current().replication_factor,
|
||||||
replication_factor
|
replication_factor.replication_factor()
|
||||||
)));
|
)));
|
||||||
}
|
}
|
||||||
x
|
x
|
||||||
|
@ -65,8 +64,12 @@ impl LayoutManager {
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
let mut cluster_layout =
|
let mut cluster_layout = LayoutHelper::new(
|
||||||
LayoutHelper::new(replication_mode, cluster_layout, Default::default());
|
replication_factor,
|
||||||
|
consistency_mode,
|
||||||
|
cluster_layout,
|
||||||
|
Default::default(),
|
||||||
|
);
|
||||||
cluster_layout.update_trackers(node_id.into());
|
cluster_layout.update_trackers(node_id.into());
|
||||||
|
|
||||||
let layout = Arc::new(RwLock::new(cluster_layout));
|
let layout = Arc::new(RwLock::new(cluster_layout));
|
||||||
|
@ -81,7 +84,7 @@ impl LayoutManager {
|
||||||
|
|
||||||
Ok(Arc::new(Self {
|
Ok(Arc::new(Self {
|
||||||
node_id: node_id.into(),
|
node_id: node_id.into(),
|
||||||
replication_mode,
|
replication_factor,
|
||||||
persist_cluster_layout,
|
persist_cluster_layout,
|
||||||
layout,
|
layout,
|
||||||
change_notify,
|
change_notify,
|
||||||
|
@ -295,11 +298,11 @@ impl LayoutManager {
|
||||||
adv.update_trackers
|
adv.update_trackers
|
||||||
);
|
);
|
||||||
|
|
||||||
if adv.current().replication_factor != self.replication_mode.replication_factor() {
|
if adv.current().replication_factor != self.replication_factor.replication_factor() {
|
||||||
let msg = format!(
|
let msg = format!(
|
||||||
"Received a cluster layout from another node with replication factor {}, which is different from what we have in our configuration ({}). Discarding the cluster layout we received.",
|
"Received a cluster layout from another node with replication factor {}, which is different from what we have in our configuration ({}). Discarding the cluster layout we received.",
|
||||||
adv.current().replication_factor,
|
adv.current().replication_factor,
|
||||||
self.replication_mode.replication_factor()
|
self.replication_factor.replication_factor()
|
||||||
);
|
);
|
||||||
error!("{}", msg);
|
error!("{}", msg);
|
||||||
return Err(Error::Message(msg));
|
return Err(Error::Message(msg));
|
||||||
|
|
|
@ -5,6 +5,7 @@ use garage_util::crdt::Crdt;
|
||||||
use garage_util::error::*;
|
use garage_util::error::*;
|
||||||
|
|
||||||
use crate::layout::*;
|
use crate::layout::*;
|
||||||
|
use crate::replication_mode::ReplicationFactor;
|
||||||
|
|
||||||
// This function checks that the partition size S computed is at least better than the
|
// This function checks that the partition size S computed is at least better than the
|
||||||
// one given by a very naive algorithm. To do so, we try to run the naive algorithm
|
// one given by a very naive algorithm. To do so, we try to run the naive algorithm
|
||||||
|
@ -120,7 +121,7 @@ fn test_assignment() {
|
||||||
let mut node_capacity_vec = vec![4000, 1000, 2000];
|
let mut node_capacity_vec = vec![4000, 1000, 2000];
|
||||||
let mut node_zone_vec = vec!["A", "B", "C"];
|
let mut node_zone_vec = vec!["A", "B", "C"];
|
||||||
|
|
||||||
let mut cl = LayoutHistory::new(3);
|
let mut cl = LayoutHistory::new(ReplicationFactor::new(3).unwrap());
|
||||||
update_layout(&mut cl, &node_capacity_vec, &node_zone_vec, 3);
|
update_layout(&mut cl, &node_capacity_vec, &node_zone_vec, 3);
|
||||||
let v = cl.current().version;
|
let v = cl.current().version;
|
||||||
let (mut cl, msg) = cl.apply_staged_changes(Some(v + 1)).unwrap();
|
let (mut cl, msg) = cl.apply_staged_changes(Some(v + 1)).unwrap();
|
||||||
|
|
|
@ -1,57 +1,94 @@
|
||||||
#[derive(Clone, Copy)]
|
use garage_util::config::Config;
|
||||||
pub enum ReplicationMode {
|
use garage_util::crdt::AutoCrdt;
|
||||||
None,
|
use garage_util::error::*;
|
||||||
TwoWay,
|
use serde::{Deserialize, Serialize};
|
||||||
TwoWayDangerous,
|
|
||||||
ThreeWay,
|
#[derive(Debug, Clone, Copy, PartialEq, Serialize, Deserialize)]
|
||||||
ThreeWayDegraded,
|
#[serde(transparent)]
|
||||||
ThreeWayDangerous,
|
pub struct ReplicationFactor(usize);
|
||||||
|
|
||||||
|
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Default, Serialize, Deserialize)]
|
||||||
|
#[serde(rename_all = "lowercase")]
|
||||||
|
pub enum ConsistencyMode {
|
||||||
|
/// Read- and Write-quorum are 1
|
||||||
|
Dangerous,
|
||||||
|
/// Read-quorum is 1
|
||||||
|
Degraded,
|
||||||
|
/// Read- and Write-quorum are determined for read-after-write-consistency
|
||||||
|
#[default]
|
||||||
|
Consistent,
|
||||||
}
|
}
|
||||||
lx
commented
Does this work for values written in lowercase? Because the enum variants start with a capital letter. Does this work for values written in lowercase? Because the enum variants start with a capital letter.
|
|||||||
|
|
||||||
impl ReplicationMode {
|
impl ConsistencyMode {
|
||||||
pub fn parse(v: &str) -> Option<Self> {
|
pub fn parse(s: &str) -> Option<Self> {
|
||||||
match v {
|
serde_json::from_value(serde_json::Value::String(s.to_string())).ok()
|
||||||
"none" | "1" => Some(Self::None),
|
}
|
||||||
"2" => Some(Self::TwoWay),
|
}
|
||||||
"2-dangerous" => Some(Self::TwoWayDangerous),
|
|
||||||
"3" => Some(Self::ThreeWay),
|
impl AutoCrdt for ConsistencyMode {
|
||||||
"3-degraded" => Some(Self::ThreeWayDegraded),
|
const WARN_IF_DIFFERENT: bool = true;
|
||||||
"3-dangerous" => Some(Self::ThreeWayDangerous),
|
}
|
||||||
_ => None,
|
|
||||||
|
impl ReplicationFactor {
|
||||||
|
pub fn new(replication_factor: usize) -> Option<Self> {
|
||||||
|
if replication_factor < 1 {
|
||||||
|
None
|
||||||
|
} else {
|
||||||
|
Some(Self(replication_factor))
|
||||||
yuka marked this conversation as resolved
Outdated
lx
commented
We don't need this function anymore, right? We don't need this function anymore, right?
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
pub fn replication_factor(&self) -> usize {
|
pub fn replication_factor(&self) -> usize {
|
||||||
match self {
|
self.0
|
||||||
Self::None => 1,
|
}
|
||||||
Self::TwoWay | Self::TwoWayDangerous => 2,
|
|
||||||
Self::ThreeWay | Self::ThreeWayDegraded | Self::ThreeWayDangerous => 3,
|
pub fn read_quorum(&self, consistency_mode: ConsistencyMode) -> usize {
|
||||||
|
match consistency_mode {
|
||||||
|
ConsistencyMode::Dangerous | ConsistencyMode::Degraded => 1,
|
||||||
|
ConsistencyMode::Consistent => self.replication_factor().div_ceil(2),
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
pub fn read_quorum(&self) -> usize {
|
pub fn write_quorum(&self, consistency_mode: ConsistencyMode) -> usize {
|
||||||
match self {
|
match consistency_mode {
|
||||||
Self::None => 1,
|
ConsistencyMode::Dangerous => 1,
|
||||||
Self::TwoWay | Self::TwoWayDangerous => 1,
|
ConsistencyMode::Degraded | ConsistencyMode::Consistent => {
|
||||||
Self::ThreeWay => 2,
|
(self.replication_factor() + 1) - self.read_quorum(ConsistencyMode::Consistent)
|
||||||
Self::ThreeWayDegraded | Self::ThreeWayDangerous => 1,
|
}
|
||||||
|
}
|
||||||
}
|
}
|
||||||
yuka marked this conversation as resolved
Outdated
lx
commented
In degraded mode, we will have write_quorum = replication_mode, but that doesn't seem right? Can be fixed by:
In degraded mode, we will have write_quorum = replication_mode, but that doesn't seem right?
Can be fixed by:
```rust
ConsistencyMode::Degraded | ConsistencyMode::Consistent => {
(self.replication_factor() + 1) - self.read_quorum(ConsistencyMode::Consistent)
}
```
|
|||||||
}
|
}
|
||||||
|
|
||||||
pub fn write_quorum(&self) -> usize {
|
impl std::convert::From<ReplicationFactor> for usize {
|
||||||
match self {
|
fn from(replication_factor: ReplicationFactor) -> usize {
|
||||||
Self::None => 1,
|
replication_factor.0
|
||||||
Self::TwoWay => 2,
|
|
||||||
Self::TwoWayDangerous => 1,
|
|
||||||
Self::ThreeWay | Self::ThreeWayDegraded => 2,
|
|
||||||
Self::ThreeWayDangerous => 1,
|
|
||||||
}
|
}
|
||||||
yuka marked this conversation as resolved
Outdated
lx
commented
The same can be achieved with The same can be achieved with `#[derive(Serialize, Deserialize)]` + `#[serde(transparent)]`
|
|||||||
}
|
}
|
||||||
|
|
||||||
pub fn is_read_after_write_consistent(&self) -> bool {
|
pub fn parse_replication_mode(
|
||||||
match self {
|
config: &Config,
|
||||||
Self::None | Self::TwoWay | Self::ThreeWay => true,
|
) -> Result<(ReplicationFactor, ConsistencyMode), Error> {
|
||||||
_ => false,
|
match (&config.replication_mode, config.replication_factor, config.consistency_mode.as_str()) {
|
||||||
}
|
(Some(replication_mode), None, "consistent") => {
|
||||||
|
tracing::warn!("Legacy config option replication_mode in use. Please migrate to replication_factor and consistency_mode");
|
||||||
|
let parsed_replication_mode = match replication_mode.as_str() {
|
||||||
|
"1" | "none" => Some((ReplicationFactor(1), ConsistencyMode::Consistent)),
|
||||||
|
"2" => Some((ReplicationFactor(2), ConsistencyMode::Consistent)),
|
||||||
|
"2-dangerous" => Some((ReplicationFactor(2), ConsistencyMode::Dangerous)),
|
||||||
|
"3" => Some((ReplicationFactor(3), ConsistencyMode::Consistent)),
|
||||||
|
"3-degraded" => Some((ReplicationFactor(3), ConsistencyMode::Degraded)),
|
||||||
|
"3-dangerous" => Some((ReplicationFactor(3), ConsistencyMode::Dangerous)),
|
||||||
|
_ => None,
|
||||||
|
};
|
||||||
|
Some(parsed_replication_mode.ok_or_message("Invalid replication_mode in config file.")?)
|
||||||
|
},
|
||||||
|
(None, Some(replication_factor), consistency_mode) => {
|
||||||
|
let replication_factor = ReplicationFactor::new(replication_factor)
|
||||||
|
.ok_or_message("Invalid replication_factor in config file.")?;
|
||||||
|
let consistency_mode = ConsistencyMode::parse(consistency_mode)
|
||||||
|
.ok_or_message("Invalid consistency_mode in config file.")?;
|
||||||
|
Some((replication_factor, consistency_mode))
|
||||||
}
|
}
|
||||||
|
_ => None,
|
||||||
|
}.ok_or_message("Either the legacy replication_mode or replication_level and consistency_mode can be set, not both.")
|
||||||
}
|
}
|
||||||
|
|
|
@ -112,8 +112,7 @@ pub struct System {
|
||||||
|
|
||||||
metrics: ArcSwapOption<SystemMetrics>,
|
metrics: ArcSwapOption<SystemMetrics>,
|
||||||
|
|
||||||
replication_mode: ReplicationMode,
|
pub(crate) replication_factor: ReplicationFactor,
|
||||||
pub(crate) replication_factor: usize,
|
|
||||||
|
|
||||||
/// Path to metadata directory
|
/// Path to metadata directory
|
||||||
pub metadata_dir: PathBuf,
|
pub metadata_dir: PathBuf,
|
||||||
|
@ -243,7 +242,8 @@ impl System {
|
||||||
/// Create this node's membership manager
|
/// Create this node's membership manager
|
||||||
pub fn new(
|
pub fn new(
|
||||||
network_key: NetworkKey,
|
network_key: NetworkKey,
|
||||||
replication_mode: ReplicationMode,
|
replication_factor: ReplicationFactor,
|
||||||
|
consistency_mode: ConsistencyMode,
|
||||||
config: &Config,
|
config: &Config,
|
||||||
) -> Result<Arc<Self>, Error> {
|
) -> Result<Arc<Self>, Error> {
|
||||||
// ---- setup netapp RPC protocol ----
|
// ---- setup netapp RPC protocol ----
|
||||||
|
@ -274,14 +274,13 @@ impl System {
|
||||||
let persist_peer_list = Persister::new(&config.metadata_dir, "peer_list");
|
let persist_peer_list = Persister::new(&config.metadata_dir, "peer_list");
|
||||||
|
|
||||||
// ---- setup cluster layout and layout manager ----
|
// ---- setup cluster layout and layout manager ----
|
||||||
let replication_factor = replication_mode.replication_factor();
|
|
||||||
|
|
||||||
let layout_manager = LayoutManager::new(
|
let layout_manager = LayoutManager::new(
|
||||||
config,
|
config,
|
||||||
netapp.id,
|
netapp.id,
|
||||||
system_endpoint.clone(),
|
system_endpoint.clone(),
|
||||||
peering.clone(),
|
peering.clone(),
|
||||||
replication_mode,
|
replication_factor,
|
||||||
|
consistency_mode,
|
||||||
)?;
|
)?;
|
||||||
|
|
||||||
let mut local_status = NodeStatus::initial(replication_factor, &layout_manager);
|
let mut local_status = NodeStatus::initial(replication_factor, &layout_manager);
|
||||||
|
@ -315,7 +314,6 @@ impl System {
|
||||||
netapp: netapp.clone(),
|
netapp: netapp.clone(),
|
||||||
peering: peering.clone(),
|
peering: peering.clone(),
|
||||||
system_endpoint,
|
system_endpoint,
|
||||||
replication_mode,
|
|
||||||
replication_factor,
|
replication_factor,
|
||||||
rpc_listen_addr: config.rpc_bind_addr,
|
rpc_listen_addr: config.rpc_bind_addr,
|
||||||
rpc_public_addr,
|
rpc_public_addr,
|
||||||
|
@ -427,7 +425,9 @@ impl System {
|
||||||
}
|
}
|
||||||
|
|
||||||
pub fn health(&self) -> ClusterHealth {
|
pub fn health(&self) -> ClusterHealth {
|
||||||
let quorum = self.replication_mode.write_quorum();
|
let quorum = self
|
||||||
|
.replication_factor
|
||||||
|
.write_quorum(ConsistencyMode::Consistent);
|
||||||
|
|
||||||
// Gather information about running nodes.
|
// Gather information about running nodes.
|
||||||
// Technically, `nodes` contains currently running nodes, as well
|
// Technically, `nodes` contains currently running nodes, as well
|
||||||
|
@ -631,7 +631,7 @@ impl System {
|
||||||
.count();
|
.count();
|
||||||
|
|
||||||
let not_configured = self.cluster_layout().check().is_err();
|
let not_configured = self.cluster_layout().check().is_err();
|
||||||
let no_peers = n_connected < self.replication_factor;
|
let no_peers = n_connected < self.replication_factor.into();
|
||||||
let expected_n_nodes = self.cluster_layout().all_nodes().len();
|
let expected_n_nodes = self.cluster_layout().all_nodes().len();
|
||||||
let bad_peers = n_connected != expected_n_nodes;
|
let bad_peers = n_connected != expected_n_nodes;
|
||||||
|
|
||||||
|
@ -774,14 +774,14 @@ impl EndpointHandler<SystemRpc> for System {
|
||||||
}
|
}
|
||||||
|
|
||||||
impl NodeStatus {
|
impl NodeStatus {
|
||||||
fn initial(replication_factor: usize, layout_manager: &LayoutManager) -> Self {
|
fn initial(replication_factor: ReplicationFactor, layout_manager: &LayoutManager) -> Self {
|
||||||
NodeStatus {
|
NodeStatus {
|
||||||
hostname: Some(
|
hostname: Some(
|
||||||
gethostname::gethostname()
|
gethostname::gethostname()
|
||||||
.into_string()
|
.into_string()
|
||||||
.unwrap_or_else(|_| "<invalid utf-8>".to_string()),
|
.unwrap_or_else(|_| "<invalid utf-8>".to_string()),
|
||||||
),
|
),
|
||||||
replication_factor,
|
replication_factor: replication_factor.into(),
|
||||||
layout_digest: layout_manager.layout().digest(),
|
layout_digest: layout_manager.layout().digest(),
|
||||||
meta_disk_avail: None,
|
meta_disk_avail: None,
|
||||||
data_disk_avail: None,
|
data_disk_avail: None,
|
||||||
|
|
|
@ -68,7 +68,7 @@ impl SystemMetrics {
|
||||||
let replication_factor = system.replication_factor;
|
let replication_factor = system.replication_factor;
|
||||||
meter
|
meter
|
||||||
.u64_value_observer("garage_replication_factor", move |observer| {
|
.u64_value_observer("garage_replication_factor", move |observer| {
|
||||||
observer.observe(replication_factor as u64, &[])
|
observer.observe(replication_factor.replication_factor() as u64, &[])
|
||||||
})
|
})
|
||||||
.with_description("Garage replication factor setting")
|
.with_description("Garage replication factor setting")
|
||||||
.init()
|
.init()
|
||||||
|
|
|
@ -30,12 +30,20 @@ pub struct Config {
|
||||||
)]
|
)]
|
||||||
pub block_size: usize,
|
pub block_size: usize,
|
||||||
|
|
||||||
/// Replication mode. Supported values:
|
/// Number of replicas. Can be any positive integer, but uneven numbers are more favorable.
|
||||||
/// - none, 1 -> no replication
|
/// - 1 for single-node clusters, or to disable replication
|
||||||
/// - 2 -> 2-way replication
|
/// - 3 is the recommended and supported setting.
|
||||||
/// - 3 -> 3-way replication
|
#[serde(default)]
|
||||||
// (we can add more aliases for this later)
|
pub replication_factor: Option<usize>,
|
||||||
lx
commented
We probably want a migration path, or at least a safeguard, for people that don't update their config files and continue using the old Option 1, migration path : if replication_mode is set, check that replication_factor and consistency_mode are not set, and populate them from the legacy replication_mode value. Option 2, safeguard : if replication_mode is set, throw an error when initializing Garage We probably want a migration path, or at least a safeguard, for people that don't update their config files and continue using the old `replication_mode` variable.
Option 1, migration path : if replication_mode is set, check that replication_factor and consistency_mode are not set, and populate them from the legacy replication_mode value.
Option 2, safeguard : if replication_mode is set, throw an error when initializing Garage
|
|||||||
pub replication_mode: String,
|
|
||||||
|
/// Consistency mode for all for requests through this node
|
||||||
|
/// - Degraded -> Disable read quorum
|
||||||
|
/// - Dangerous -> Disable read and write quorum
|
||||||
|
#[serde(default = "default_consistency_mode")]
|
||||||
|
pub consistency_mode: String,
|
||||||
lx
commented
Is the default value (empty string) parsed the same as "consistent" ? Maybe we can move the ConsistencyMode enum here and use it directly in the config struct Is the default value (empty string) parsed the same as "consistent" ?
Maybe we can move the ConsistencyMode enum here and use it directly in the config struct
|
|||||||
|
|
||||||
|
/// Legacy option
|
||||||
|
pub replication_mode: Option<String>,
|
||||||
|
|
||||||
/// Zstd compression level used on data blocks
|
/// Zstd compression level used on data blocks
|
||||||
#[serde(
|
#[serde(
|
||||||
|
@ -244,10 +252,15 @@ fn default_sled_cache_capacity() -> usize {
|
||||||
fn default_sled_flush_every_ms() -> u64 {
|
fn default_sled_flush_every_ms() -> u64 {
|
||||||
2000
|
2000
|
||||||
}
|
}
|
||||||
|
|
||||||
fn default_block_size() -> usize {
|
fn default_block_size() -> usize {
|
||||||
1048576
|
1048576
|
||||||
}
|
}
|
||||||
|
|
||||||
|
fn default_consistency_mode() -> String {
|
||||||
|
"consistent".into()
|
||||||
|
}
|
||||||
|
|
||||||
fn default_compression() -> Option<i32> {
|
fn default_compression() -> Option<i32> {
|
||||||
Some(1)
|
Some(1)
|
||||||
}
|
}
|
||||||
|
@ -359,7 +372,7 @@ mod tests {
|
||||||
r#"
|
r#"
|
||||||
metadata_dir = "/tmp/garage/meta"
|
metadata_dir = "/tmp/garage/meta"
|
||||||
data_dir = "/tmp/garage/data"
|
data_dir = "/tmp/garage/data"
|
||||||
replication_mode = "3"
|
replication_factor = 3
|
||||||
rpc_bind_addr = "[::]:3901"
|
rpc_bind_addr = "[::]:3901"
|
||||||
rpc_secret = "foo"
|
rpc_secret = "foo"
|
||||||
|
|
||||||
|
|
For completeness we should include
consistency_mode = "consistent"
here (we want an example for all configuration lines)Done