diff --git a/ansible/README.md b/ansible/README.md
index db8d960..671c8f1 100644
--- a/ansible/README.md
+++ b/ansible/README.md
@@ -13,3 +13,59 @@ For each machine, **one by one** do:
 - Reboot
 - Check that cluster is healthy

+## New configuration with Wireguard
+
+This configuration is used to make all of the cluster nodes appear in a single
+virtual private network, enabling them to communicate on all ports even if they
+are behind NATs at different locations. The VPN also provides a layer of
+security, encrypting all communications that occur over the internet.
+
+### Prerequisites
+
+Nodes must all have two publicly accessible ports (potentially routed through a NAT):
+
+- A port that maps to the SSH port (port 22) of the machine, allowing TCP connections
+- A port that maps to the Wireguard port (port 51820) of the machine, allowing UDP connections
+
+
+### Configuration
+
+The network role sets up a Wireguard interface, called `wgdeuxfleurs`, and
+establishes a full mesh between all cluster machines. The following
+configuration variables are necessary in the node list:
+
+- `ansible_host`: hostname to which Ansible connects, usually the same as `public_ip`
+- `ansible_user`: username to connect as for Ansible to run commands through SSH
+- `ansible_port`: if SSH is not bound publicly on port 22, set the port here
+- `public_ip`: the public IP of the machine, or of the NAT router behind which the machine sits
+- `public_vpn_port`: the public port number on `public_ip` that maps to port 51820 of the machine
+- `vpn_ip`: the IP address to assign to the node on the VPN (each node must have a different one)
+- `dns_server`: any DNS resolver, typically your ISP's DNS or a public one such as OpenDNS
+
+The new iptables configuration now prevents direct communication between
+cluster machines, except on port 51820 which is used to transmit VPN packets.
+All intra-cluster communications must now go through the VPN interface (thus
+machines refer to one another using their VPN IP addresses and never their
+public or LAN addresses).
+
+### Restarting Nomad
+
+When switching to the Wireguard configuration, machines will stop using their
+LAN addresses and switch to using their VPN addresses. Consul seems to handle
+this correctly, but Nomad does not. For Nomad to restart correctly, its Raft
+protocol module must be informed of the new IP addresses of the cluster
+members. This is done by creating, on all nodes, the file
+`/var/lib/nomad/server/raft/peers.json`, which contains the list of IP addresses
+of the cluster. Here is an example of such a file:
+
+```
+["10.68.70.11:4647","10.68.70.12:4647","10.68.70.13:4647"]
+```
+
+Once this file is created and is the same on all nodes, restart Nomad on all
+nodes. The cluster should resume operation normally.
+
+The same procedure can also be applied to fix Consul; however, my tests showed
+that it didn't break when IP addresses changed (it just took a while to come
+back up).
+
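
Since the `peers.json` file must be identical on all nodes, it may be easier to generate it than to write it by hand. The following is a minimal sketch, not part of this change: the `build_peers` helper is hypothetical, the port 4647 is taken from the example in the README text, and the addresses are the same illustrative VPN IPs.

```python
import json

# Port taken from the README's example peers.json; adjust it if your
# Nomad servers are configured differently (assumption).
NOMAD_RPC_PORT = 4647


def build_peers(vpn_ips, port=NOMAD_RPC_PORT):
    """Return peers.json content (a JSON array of "ip:port" strings)."""
    peers = [f"{ip}:{port}" for ip in vpn_ips]
    # Compact separators reproduce the single-line form shown in the README.
    return json.dumps(peers, separators=(",", ":"))


if __name__ == "__main__":
    # Illustrative VPN addresses, matching the README example.
    vpn_ips = ["10.68.70.11", "10.68.70.12", "10.68.70.13"]
    print(build_peers(vpn_ips))
```

The printed line can then be copied to `/var/lib/nomad/server/raft/peers.json` on each node before restarting Nomad. The sketch only prints the content rather than writing the file, so that it can be reviewed first.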