ANSIBLE

How to proceed

For each machine, one at a time:

  • Check that the cluster is healthy
    • Check garage
      • check that all nodes are online: docker exec -ti xxx /garage status
      • check that tables are in sync: docker exec -ti xxx /garage repair --yes tables
      • check garage logs
        • there should be no unknown errors, and no resync should be in progress
        • the following line must appear: INFO garage_util::background > Worker exited: Repair worker
    • Check that Nomad is healthy
      • nomad server members
      • nomad node status
    • Check that Consul is healthy
      • consul members
    • Check that Postgres is healthy
  • Run ansible-playbook -i production.yml --limit <machine> -u <username> site.yml
  • Run nomad node drain -enable -force -self
  • Reboot
  • Run nomad node drain -self -disable
  • Check that the cluster is healthy again (repeat the whole first step)
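
Since the health check runs twice per machine, it may be convenient to wrap it in a small script. The following is only a sketch under stated assumptions, not part of this repository: the function names check_cluster and repair_finished are hypothetical, the Garage container name is passed in as an argument, and the -ti flags are dropped because a script usually has no TTY. Postgres is left out because the checklist gives no command for it, and the output of each command still has to be read by a human.

```shell
#!/bin/sh
# Sketch of the "cluster is healthy" step. Function names are hypothetical.

# Succeeds if the repair-worker completion line from the checklist
# appears in the log text given on stdin.
repair_finished() {
    grep -q 'Worker exited: Repair worker'
}

# Runs the Garage, Nomad and Consul commands from the checklist in order
# and stops at the first one that exits with a non-zero status.
# $1: name or ID of the Garage container (the "xxx" in the checklist).
check_cluster() {
    container="$1"
    docker exec "$container" /garage status || return 1
    docker exec "$container" /garage repair --yes tables || return 1
    nomad server members || return 1
    nomad node status || return 1
    consul members || return 1
}
```

A possible use, before running the playbook and again after disabling the drain: call check_cluster xxx, read its output, then run docker logs xxx 2>&1 | repair_finished to confirm that the repair worker has exited. A zero exit status only means the commands ran and the log line was found; the Garage logs still need a manual look for unknown errors or a resync in progress.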