Archived

This repository has been archived on 2023-03-15. You can view files and clone it, but cannot push or open issues or pull requests.

ANSIBLE

How to proceed

For each machine, one at a time:

1. Check that the cluster is healthy
   - Check Garage
     - check that all nodes are online: `docker exec -ti xxx /garage status`
     - check that tables are in sync: `docker exec -ti 63a4d7ecd795 /garage repair --yes tables`
     - check the Garage logs
       - there should be no unknown errors and no resync in progress
       - the following line must appear: `INFO garage_util::background > Worker exited: Repair worker`
   - Check that Nomad is healthy
     - `nomad server members`
     - `nomad node status`
   - Check that Consul is healthy
     - `consul members`
   - Check that Postgres is healthy
2. Run `ansible-playbook -i production.yml --limit <machine> -u <username> site.yml`
3. Run `nomad node drain -enable -force -self`
4. Reboot
5. Run `nomad node drain -self -disable`
6. Check that the cluster is healthy (repeat all of step 1)
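
The procedure above could be wrapped in a few shell helpers so that the same commands are typed identically for every machine. The following is only a minimal sketch, not part of the original repository: the function names, the `DRY_RUN` switch and the `GARAGE_CONTAINER` variable are assumptions made for illustration. By default it only prints the commands. The Garage log inspection and the Postgres check have no command in this README and remain manual.

```shell
#!/bin/sh
# Hypothetical helpers for the per-machine maintenance procedure.
# Assumptions (not from the README): DRY_RUN, GARAGE_CONTAINER and all
# function names below are illustrative only.
set -eu

DRY_RUN="${DRY_RUN:-1}"                      # 1 = print only, 0 = execute
GARAGE_CONTAINER="${GARAGE_CONTAINER:-xxx}"  # placeholder container ID, as in the README

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Health checks that have an explicit command in the README.
# Garage logs and Postgres still need to be checked by hand.
check_cluster() {
    run docker exec -ti "$GARAGE_CONTAINER" /garage status
    run docker exec -ti "$GARAGE_CONTAINER" /garage repair --yes tables
    run nomad server members
    run nomad node status
    run consul members
}

# Apply the playbook to a single machine: deploy <machine> <username>
deploy() {
    run ansible-playbook -i production.yml --limit "$1" -u "$2" site.yml
}

# Drain the local Nomad node before the reboot (uses -self, so it is
# meant to be run on the machine being maintained).
drain_self() {
    run nomad node drain -enable -force -self
}

# Make the local Nomad node eligible again after the reboot.
undrain_self() {
    run nomad node drain -self -disable
}
```

One possible usage, assuming the file is saved as `maintenance.sh` (a hypothetical name): source it with `. ./maintenance.sh`, call `check_cluster`, then `deploy <machine> <username>`, and set `DRY_RUN=0` once the printed commands look right. Note that `drain_self` and `undrain_self` act on the node where they are executed, whereas `deploy` runs from wherever the Ansible inventory lives.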