
ANSIBLE

How to proceed

For each machine, one at a time:

  • Check that the cluster is healthy
    • Check garage
      • check that all nodes are online: docker exec -ti xxx /garage status
      • check that tables are in sync: docker exec -ti xxx /garage repair --yes tables (xxx is the ID of the garage container)
      • check garage logs
        • there should be no unknown errors and no resync in progress
        • the following line must appear: INFO garage_util::background > Worker exited: Repair worker
    • Check that Nomad is healthy
      • nomad server members
      • nomad node status
    • Check that Consul is healthy
      • consul members
    • Check that Postgres is healthy
  • Run ansible-playbook -i production.yml --limit <machine> -u <username> site.yml
  • Run nomad node drain -enable -force -self
  • Reboot
  • Run nomad node drain -self -disable
  • Check that the cluster is healthy again (repeat the whole first step)
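
The steps above can be collected into a few shell functions. This is only a minimal sketch, not a tested maintenance script. The function names, the DRY_RUN switch, and the use of docker logs to read the garage logs are assumptions made for illustration; the other commands are copied from the list. Rebooting and the Postgres check are left as manual steps because the procedure does not give a command for them.

```shell
#!/usr/bin/env bash
# Sketch of the per-machine procedure as shell functions.
# Set DRY_RUN=1 to print each command instead of running it.
set -euo pipefail

# run: execute a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# check_cluster CONTAINER: the "check that the cluster is healthy" step.
# CONTAINER is the ID of the garage container (the "xxx" in the list).
check_cluster() {
  local container="$1"
  run docker exec -ti "$container" /garage status
  run docker exec -ti "$container" /garage repair --yes tables
  # Assumption: the garage logs are read with docker logs.
  # Look for "Worker exited: Repair worker" and for unknown errors.
  run docker logs --tail 100 "$container"
  run nomad server members
  run nomad node status
  run consul members
  # Postgres: no command is given in the procedure, check it manually.
}

# apply_playbook MACHINE USERNAME: run from the directory that
# contains production.yml and site.yml.
apply_playbook() {
  run ansible-playbook -i production.yml --limit "$1" -u "$2" site.yml
}

# drain_node / undrain_node: run on the machine itself (-self),
# before and after the reboot respectively.
drain_node()   { run nomad node drain -enable -force -self; }
undrain_node() { run nomad node drain -self -disable; }
```

After sourcing the file, setting DRY_RUN=1 and calling apply_playbook <machine> <username> prints the ansible-playbook command without running it, which is a cheap way to check the arguments before touching a production node.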