# ANSIBLE
## How to proceed
For each machine, **one by one** do:
- Check that the cluster is healthy:
  - Check Garage
    - check that all nodes are online: `docker exec -ti <container> /garage status`
    - check that tables are in sync: `docker exec -ti <container> /garage repair --yes tables`
    - check the Garage logs
      - there should be no unknown errors, and no resync should be in progress
      - the following line must appear: `INFO garage_util::background > Worker exited: Repair worker`
  - Check that Nomad is healthy
    - `nomad server members`
    - `nomad node status`
  - Check that Consul is healthy
    - `consul members`
  - Check that Postgres is healthy
- Run `ansible-playbook -i production.yml --limit <machine> -u <username> site.yml`
- Run `nomad node drain -enable -force -self`
- Reboot
- Run `nomad node drain -self -disable`
- Check that the cluster is healthy again (repeat every check from the first step)
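
Since the health check runs twice per machine (before the playbook and after the reboot), it may help to wrap it in a small script. The following is only a sketch, not part of this repository: `GARAGE_CONTAINER`, `DRY_RUN` and the function names are hypothetical, and the Postgres check stays manual because the checklist names no command for it. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: runs the health-check commands from the checklist in order.
# GARAGE_CONTAINER and DRY_RUN are hypothetical variables, not part of the repo.
set -eu

GARAGE_CONTAINER="${GARAGE_CONTAINER:-xxx}"  # Garage container ID or name
DRY_RUN="${DRY_RUN:-1}"                      # 1 = only print the commands

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

check_garage() {
    # -ti is dropped here because a script may not have a TTY attached.
    run docker exec "$GARAGE_CONTAINER" /garage status
    run docker exec "$GARAGE_CONTAINER" /garage repair --yes tables
    # The repair is done once this line shows up in the logs; grep fails
    # (and the script stops) if it has not appeared yet, so rerun later.
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "docker logs $GARAGE_CONTAINER 2>&1 | grep -F 'Worker exited: Repair worker'"
    else
        docker logs "$GARAGE_CONTAINER" 2>&1 | grep -F 'Worker exited: Repair worker'
    fi
}

check_cluster() {
    check_garage
    run nomad server members
    run nomad node status
    run consul members
    # Postgres: the checklist names no command, so this stays a manual step.
}

check_cluster
```

To actually execute the commands instead of printing them, set `DRY_RUN=0` and pass the real container, for example `DRY_RUN=0 GARAGE_CONTAINER=<container> sh check_cluster.sh` (the file name is an assumption). The output still has to be read by a human: the script does not decide whether the cluster is healthy.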