Improve how node roles are assigned in Garage

- change the terminology: the network configuration becomes the role
  table, the configuration of a nodes becomes a node's role
- the modification of the role table takes place in two steps: first,
  changes are staged in a CRDT data structure. Then, once the user is
  happy with the changes, they can commit them all at once (or revert
  them).
- update documentation
- fix tests
- implement smarter partition assignation algorithm

This patch breaks the format of the network configuration: when
migrating, the cluster will be in a state where no roles are assigned.
All roles must be re-assigned and commited at once. This migration
should not pose an issue.

2021-11-16 16:05:53 +01:00

5 KiB

Raw Blame History

Recovering from failures

Garage is meant to work on old, second-hand hardware. In particular, this makes it likely that some of your drives will fail, and some manual intervention will be needed. Fear not! For Garage is fully equipped to handle drive failures, in most common cases.

A note on availability of Garage

With nodes dispersed in 3 zones or more, here are the guarantees Garage provides with the 3-way replication strategy (3 copies of all data, which is the recommended replication mode):

The cluster remains fully functional as long as the machines that fail are in only one zone. This includes a whole zone going down due to power/Internet outage.
No data is lost as long as the machines that fail are in at most two zones.

Of course this only works if your Garage nodes are correctly configured to be aware of the zone in which they are located. Make sure this is the case using garage status to check on the state of your cluster's configuration.

In case of temporarily disconnected nodes, Garage should automatically re-synchronize when the nodes come back up. This guide will deal with recovering from disk failures that caused the loss of the data of a node.

First option: removing a node

If you don't have spare parts (HDD, SDD) to replace the failed component, and if there are enough remaining nodes in your cluster (at least 3), you can simply remove the failed node from Garage's configuration. Note that if you do intend to replace the failed parts by new ones, using this method followed by adding back the node is not recommended (although it should work), and you should instead use one of the methods detailed in the next sections.

Removing a node is done with the following command:

garage layout remove <node_id>
garage layout show    # review the changes you are making
garage layout apply   # once satisfied, apply the changes

(you can get the node_id of the failed node by running garage status)

This will repartition the data and ensure that 3 copies of everything are present on the nodes that remain available.

Replacement scenario 1: only data is lost, metadata is fine

The recommended deployment for Garage uses an SSD to store metadata, and an HDD to store blocks of data. In the case where only a single HDD crashes, the blocks of data are lost but the metadata is still fine.

This is very easy to recover by setting up a new HDD to replace the failed one. The node does not need to be fully replaced and the configuration doesn't need to change. We just need to tell Garage to get back all the data blocks and store them on the new HDD.

First, set up a new HDD to store Garage's data directory on the failed node, and restart Garage using the existing configuration. Then, run:

garage repair -a --yes blocks

This will re-synchronize blocks of data that are missing to the new HDD, reading them from copies located on other nodes.

You can check on the advancement of this process by doing the following command:

garage stats -a

Look out for the following output:

Block manager stats:
  resync queue length: 26541

This indicates that one of the Garage node is in the process of retrieving missing data from other nodes. This number decreases to zero when the node is fully synchronized.

Replacement scenario 2: metadata (and possibly data) is lost

This scenario covers the case where a full node fails, i.e. both the metadata directory and the data directory are lost, as well as the case where only the metadata directory is lost.

To replace the lost node, we will start from an empty metadata directory, which means Garage will generate a new node ID for the replacement node. We will thus need to remove the previous node ID from Garage's configuration and replace it by the ID of the new node.

If your data directory is stored on a separate drive and is still fine, you can keep it, but it is not necessary to do so. In all cases, the data will be rebalanced and the replacement node will not store the same pieces of data as were originally stored on the one that failed. So if you keep the data files, the rebalancing might be faster but most of the pieces will be deleted anyway from the disk and replaced by other ones.

First, set up a new drive to store the metadata directory for the replacement node (a SSD is recommended), and for the data directory if necessary. You can then start Garage on the new node. The restarted node should generate a new node ID, and it should be shown as NOT CONFIGURED in garage status. The ID of the lost node should be shown in garage status in the section for disconnected/unavailable nodes.

Then, replace the broken node by the new one, using:

garage layout assign <new_node_id> --replace <old_node_id> \
		-c <capacity> -z <zone> -t <node_tag>
garage layout show    # review the changes you are making
garage layout apply   # once satisfied, apply the changes

Garage will then start synchronizing all required data on the new node. This process can be monitored using the garage stats -a command.

5 KiB Raw Blame History