ANSIBLE
How to proceed
For each machine, one by one, do the following (example health-check commands are given after the list):
- Check that the cluster is healthy:
    sudo gluster peer status
    sudo gluster volume status all
  (check the Online column: only Y must appear)
- Check that Nomad is healthy
- Check that Consul is healthy
- Check that Postgres is healthy
- Run:
    ansible-playbook -i production --limit <machine> site.yml
- Reboot
- Check that the cluster is healthy
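For the health checks above, the following commands can help (a hedged sketch: nomad server members, nomad node status and consul members are standard CLI commands, while the Postgres check assumes it runs as a regular systemd service, which may not match how it is actually deployed here):

    nomad server members    # all servers alive, one leader elected
    nomad node status       # all client nodes in "ready" state
    consul members          # all nodes listed as "alive"
    sudo systemctl status postgresql   # only if Postgres is a plain systemd service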
New configuration with Wireguard
This configuration is used to make all of the cluster nodes appear in a single virtual private network, enabling them to communicate on all ports even if they are behind NATs at different locations. The VPN also provides a layer of security, encrypting all communications that occur over the internet.
Prerequisites
Nodes must all have two publicly accessible ports (potentially routed through a NAT; an example of such port forwarding is given after the list):
- A port that maps to the SSH port (port 22) of the machine, allowing TCP connections
- A port that maps to the Wireguard port (port 51820) of the machine, allowing UDP connections
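For a machine behind a NAT, this amounts to two port-forwarding rules on the router. As an illustration only, on a Linux router doing NAT with iptables they could look like this (the external ports 33722/33820 and the LAN address 192.168.1.10 are made-up values):

    # Forward a public TCP port to the machine's SSH port
    iptables -t nat -A PREROUTING -p tcp --dport 33722 -j DNAT --to-destination 192.168.1.10:22
    # Forward a public UDP port to the machine's Wireguard port
    iptables -t nat -A PREROUTING -p udp --dport 33820 -j DNAT --to-destination 192.168.1.10:51820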
Configuration
The network role sets up a Wireguard interface called wgdeuxfleurs and establishes a full mesh between all cluster machines. The following configuration variables are necessary in the node list (an example entry is given after the list):
- ansible_host: hostname to which Ansible connects, usually the same as public_ip
- ansible_user: username Ansible connects as to run commands over SSH
- ansible_port: if SSH is not bound publicly on port 22, set the port here
- public_ip: the public IP of the machine, or of the NAT router behind which the machine sits
- public_vpn_port: the public port number on public_ip that maps to port 51820 of the machine
- vpn_ip: the IP address to assign to the node on the VPN (each node must have a different one)
- dns_server: any DNS resolver, typically your ISP's DNS or a public one such as OpenDNS
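As an illustration, a node entry in the inventory could look as follows (hypothetical hostname and addresses, chosen to be consistent with the peers.json example further down; the actual production file remains the reference):

    mynode  ansible_host=mynode.example.com  ansible_user=root  ansible_port=22  public_ip=203.0.113.10  public_vpn_port=33820  vpn_ip=10.68.70.11  dns_server=208.67.222.222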
The new iptables configuration prevents direct communication between cluster machines, except on port 51820, which is used to transmit VPN packets. All intra-cluster communications must now go through the VPN interface: machines refer to one another by their VPN IP addresses and never by their public or LAN addresses.
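To give an idea of the intent, the generated rules are of the following kind (a simplified sketch, not the actual template used by the role):

    # Let Wireguard traffic in from anywhere (this is the only direct entry point)
    iptables -A INPUT -p udp --dport 51820 -j ACCEPT
    # Accept everything that arrives over the VPN interface
    iptables -A INPUT -i wgdeuxfleurs -j ACCEPT
    # Reject other direct connections from the public address of another cluster node
    iptables -A INPUT -s <other_node_public_ip> -j REJECT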
Restarting Nomad
When switching to the Wireguard configuration, machines stop using their LAN addresses and switch to their VPN addresses. Consul seems to handle this correctly; Nomad, however, does not. To allow Nomad to restart correctly, its Raft protocol module must be informed of the new IP addresses of the cluster members. This is done by creating, on all nodes, the file /var/lib/nomad/server/raft/peers.json containing the list of IP addresses of the cluster. Here is an example of such a file:
["10.68.70.11:4647","10.68.70.12:4647","10.68.70.13:4647"]
Once this file is created and is the same on all nodes, restart Nomad on all nodes. The cluster should resume operation normally.
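On each node, this can be done as follows (a sketch assuming Nomad runs under systemd and uses /var/lib/nomad as its data directory, consistent with the path above):

    sudo systemctl stop nomad
    echo '["10.68.70.11:4647","10.68.70.12:4647","10.68.70.13:4647"]' \
      | sudo tee /var/lib/nomad/server/raft/peers.json
    sudo systemctl start nomad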
The same procedure can also be applied to fix Consul; however, my tests showed that it didn't break when IP addresses changed (it just took a bit longer to come back up).
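For reference, if Consul ever needed the same fix, the analogous file would live under its own Raft directory. The following is an assumption-laden sketch (it assumes /var/lib/consul as Consul's data directory and 8300 as the server RPC port; recent Consul versions expect a richer JSON object format in peers.json instead of a plain array):

    echo '["10.68.70.11:8300","10.68.70.12:8300","10.68.70.13:8300"]' \
      | sudo tee /var/lib/consul/raft/peers.json
    sudo systemctl restart consul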