# ANSIBLE

## How to proceed

For each machine, **one at a time**:
  - Check that the cluster is healthy
    - `sudo gluster peer status`
    - `sudo gluster volume status all` (check the Online column: only `Y` must appear)
    - Check that Nomad is healthy
    - Check that Consul is healthy
    - Check that Postgres is healthy
  - Run `ansible-playbook -i production --limit <machine> site.yml`
  - Reboot
  - Check that the cluster is healthy again
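
The Online column check above can be partly automated. The following is a minimal sketch, not something the playbooks provide: `count_offline` is a hypothetical helper name, and it assumes that Online is the second-to-last column of the `gluster volume status` table, which may not hold for every GlusterFS version.

```shell
# Hypothetical helper: count the rows of `gluster volume status all`
# (read from stdin) whose Online column is "N".
# Assumption: Online is the second-to-last column of each table row.
count_offline() {
    awk 'NF >= 2 && $(NF-1) == "N" { n++ } END { print n + 0 }'
}

# Example with a made-up captured sample instead of a live cluster:
sample='Brick node1:/data/brick  49152  0    Y  1234
Brick node2:/data/brick  N/A    N/A  N  N/A'
printf '%s\n' "$sample" | count_offline   # prints 1
```

On a live node, `sudo gluster volume status all | count_offline` should print `0` before you move on to the next machine. This only covers Gluster; Nomad, Consul and Postgres still need their own checks.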

## New configuration with Wireguard

This configuration makes all cluster nodes appear in a single virtual private
network, enabling them to communicate on all ports even if they are behind
NATs at different locations. The VPN also provides a layer of security by
encrypting all communications that cross the internet.

### Prerequisites

Every node must have two publicly accessible ports (possibly forwarded through a NAT):

- A port that maps to the SSH port (port 22) of the machine, allowing TCP connections
- A port that maps to the Wireguard port (port 51820) of the machine, allowing UDP connections


### Configuration

The network role sets up a Wireguard interface, called `wgdeuxfleurs`, and
establishes a full mesh between all cluster machines. The following
configuration variables are necessary in the node list:

- `ansible_host`: hostname Ansible connects to, usually the same as `public_ip`
- `ansible_user`: username Ansible uses to run commands over SSH
- `ansible_port`: the SSH port, if SSH is not publicly bound on port 22
- `public_ip`: the public IP of the machine, or of the NAT router the machine sits behind
- `public_vpn_port`: the public port on `public_ip` that maps to port 51820 of the machine
- `vpn_ip`: the IP address assigned to the node on the VPN (must be unique per node)
- `dns_server`: any DNS resolver, typically your ISP's or a public one such as OpenDNS
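
Since every node needs a distinct `vpn_ip`, a quick duplicate check can catch copy-paste mistakes before running the playbook. This is a rough sketch under one big assumption: that the inventory is an INI-style file with `key=value` host variables on each host line. The helper name `duplicate_vpn_ips` and the sample host line are hypothetical.

```shell
# Hypothetical helper: print every vpn_ip value that appears more than once
# in an INI-style inventory file given as the first argument.
# Assumed host line layout (values are made up):
#   node1 ansible_host=203.0.113.10 public_ip=203.0.113.10 vpn_ip=10.68.70.11
duplicate_vpn_ips() {
    grep -o 'vpn_ip=[^[:space:]]*' "$1" | sort | uniq -d
}
```

Running `duplicate_vpn_ips production` should print nothing when all VPN addresses are distinct. If your inventory is written in YAML, the pattern would need to be adapted.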

The iptables configuration prevents direct communication between cluster
machines, except on port 51820, which carries VPN packets. All intra-cluster
communication must go through the VPN interface, so machines refer to one
another by their VPN IP addresses, never by their public or LAN addresses.

### Restarting Nomad

When switching to the Wireguard configuration, machines stop using their LAN
addresses and use their VPN addresses instead. Consul seems to handle this
correctly, but Nomad does not. For Nomad to restart correctly, its Raft
protocol module must be told the new addresses of the cluster members. To do
so, create on every node the file `/var/lib/nomad/server/raft/peers.json`,
containing the list of the cluster members' addresses (IP and port). Here is
an example of such a file:

```
["10.68.70.11:4647","10.68.70.12:4647","10.68.70.13:4647"]
```

Once this file is created and is the same on all nodes, restart Nomad on all
nodes. The cluster should resume operation normally.
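
To keep the file identical on all nodes, it can help to generate it from the list of VPN addresses rather than typing it by hand. The sketch below is only an illustration: `make_peers_json` is a hypothetical helper name, and the port is simply the one used in the example above.

```shell
# Hypothetical helper: print a peers.json array built from the VPN IPs
# passed as arguments. The port matches the example above; change it if
# your cluster uses a different one.
make_peers_json() {
    port=4647
    sep=""
    printf '['
    for ip in "$@"; do
        printf '%s"%s:%s"' "$sep" "$ip" "$port"
        sep=","
    done
    printf ']\n'
}

make_peers_json 10.68.70.11 10.68.70.12 10.68.70.13
# prints ["10.68.70.11:4647","10.68.70.12:4647","10.68.70.13:4647"]
```

On each node, the output could then be written with something like `make_peers_json ... | sudo tee /var/lib/nomad/server/raft/peers.json`, using the same argument list everywhere.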

The same procedure can also be applied to fix Consul. In my tests, however,
Consul did not break when IP addresses changed; it just took a while to come
back up.