ANSIBLE
How to proceed
For each machine, one at a time, do the following:
- Check that the cluster is healthy
  - Check Garage
    - Check that all nodes are online
      docker exec -ti <container_id> /garage status
    - Check that tables are in sync
      docker exec -ti <container_id> /garage repair --yes tables
    - Check the Garage logs
      - There should be no unknown errors and no resync in progress
      - The following line must appear
        INFO garage_util::background > Worker exited: Repair worker
  - Check that Nomad is healthy
    nomad server members
    nomad node status
  - Check that Consul is healthy
    consul members
  - Check that Postgres is healthy
- Run
  ansible-playbook -i production.yml --limit <machine> -u <username> site.yml
- Run
  nomad node drain -enable -force -self
- Reboot
- Run
  nomad node drain -self -disable
- Check that the cluster is healthy again (repeat every check from the first step)
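
Part of the health check can be scripted. The following is a minimal sketch, not part of this repository: the function names `repair_finished` and `status_commands_ok` and the variable `GARAGE_CONTAINER` are hypothetical, and reading the Garage logs through `docker logs` is an assumption about how the container is run. A zero exit status from the Nomad and Consul commands only shows that the agent answered, so their output still has to be read by a person.

```shell
#!/bin/sh
# Sketch only: the function names below are hypothetical helpers.

# Succeeds if the log text on stdin contains the repair-worker exit line
# that the checklist says must appear.
repair_finished() {
    grep -q 'Worker exited: Repair worker'
}

# Runs the read-only status commands from the checklist in order and
# stops at the first one that exits non-zero.
status_commands_ok() {
    nomad server members &&
    nomad node status &&
    consul members
}

# Possible usage (assumption: GARAGE_CONTAINER holds the Garage container ID):
#   docker logs "$GARAGE_CONTAINER" 2>&1 | repair_finished && echo "repair done"
#   status_commands_ok || echo "a status command failed, stop here"
```

Stopping at the first failing command mirrors the procedure itself: a machine should not be drained or rebooted while any of the earlier checks is still failing.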