Nix system configuration for Deuxfleurs clusters


Quentin Dufour e8cdd6864a Split garage deployments in 2 categories - The ones that will receive some traffic from tricot - The ones "only for storage" that will not receive traffic from tricot		2022-10-08 22:23:19 +02:00
cluster	Split garage deployments in 2 categories	2022-10-08 22:23:19 +02:00
doc	Update telemetry to ES 8.2.0 and simplify config a bit	2022-05-04 16:27:46 +02:00
experimental	SSB experiment	2022-09-21 19:29:08 +02:00
nix	Force Garage to use ipv6 connectivity	2022-09-15 11:57:24 +02:00
secretmgr	Clone core module in staging and prod, move bad stuff to experimental	2022-08-24 15:48:18 +02:00
.gitignore	Modularize and prepare to support multiple clusters	2022-02-09 12:09:49 +01:00
deploy_nixos	Remove wesher, reconfigure staging without it	2022-08-23 23:55:15 +02:00
deploy_passwords	Add scripts to manage passwords	2022-04-20 15:41:54 +02:00
deploy_pki	Reconfigure services to use correct tricot url, TLS fails	2022-08-24 17:31:08 +02:00
gen_pki	Fix access to consul for non-server nodes	2022-08-24 16:58:50 +02:00
passwd	Fix passwd script	2022-05-04 16:41:07 +02:00
README.md	Update README; DNS on prod	2022-06-01 15:27:11 +02:00
restic-summary	Move cryptpad backup job to backup-daily.hcl	2022-09-26 13:02:38 +02:00
ssh_known_hosts	Change ipv6 tunnel server	2022-09-09 17:23:23 +02:00
sshtool	Don't make diplotaxis and doradille raft servers, fix sshtool	2022-08-24 14:29:56 +02:00
tlsproxy	Add postgres + WIP plume + fix diplonat	2022-08-24 19:54:15 +02:00
upgrade_nixos	Update to nixos 22.05	2022-07-27 11:18:23 +02:00

README.md

Deuxfleurs on NixOS!

This repository contains code to run Deuxfleurs' infrastructure on NixOS.

It sets up the following:

A WireGuard mesh between all nodes
Consul, with TLS
Nomad, with TLS

Configuring the OS

This repo contains a bunch of scripts to configure NixOS on all cluster nodes. Most scripts are invoked with the following syntax:

for scripts that generate secrets: ./gen_<something> <cluster_name> to generate the secrets to be used on cluster <cluster_name>
for deployment scripts:
- ./deploy_<something> <cluster_name> to run the deployment script on all nodes of the cluster <cluster_name>
- ./deploy_<something> <cluster_name> <node1> <node2> ... to run the deployment script only on nodes node1, node2, ... of cluster <cluster_name>.

All deployment scripts can use the following parameters passed as environment variables:

SUDO_PASS: optionally, the password for sudo on cluster nodes. If not set, it will be asked for at the beginning.
SSH_USER: optionally, the user to log in as over SSH. If not set, the username from your local machine will be used.
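
As an illustration of these variables, the sketch below mirrors the fallback described for SSH_USER and shows two example invocations. The function name resolve_ssh_user and the user alice are hypothetical; the actual scripts may implement the fallback differently.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the documented SSH_USER fallback, not the scripts' real code.
resolve_ssh_user() {
  # Use SSH_USER when it is set and non-empty, otherwise the local username.
  printf '%s\n' "${SSH_USER:-$(id -un)}"
}

# Example invocations (the user "alice" is hypothetical; "staging" and "caribou"
# are names that appear as examples later in this README):
#   SSH_USER=alice ./deploy_nixos staging
#   SSH_USER=alice SUDO_PASS='...' ./deploy_nixos staging caribou
```

Leaving SUDO_PASS unset makes the script prompt for it, which keeps the password out of your shell history.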

Assumptions (how to set up your environment)

you have SSH access to all of your cluster nodes (listed in cluster/<cluster_name>/ssh_config)
your account is in group wheel and you know its password (you need it to become root using sudo); the password is the same on all cluster nodes (see below for password management tools)
you have a clone of the secrets repository in your pass password store, for instance at ~/.password-store/deuxfleurs (scripts in this repo will read and write all secrets in pass under deuxfleurs/cluster/<cluster_name>/)
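
The local part of these assumptions can be checked before running any script. The following is a minimal sketch, not part of this repository: the function name preflight is hypothetical, and it assumes the secrets repository is cloned at deuxfleurs inside your pass store, as in the example path above. PASSWORD_STORE_DIR is the standard pass variable for a non-default store location.

```shell
#!/usr/bin/env bash
# Sketch only: check the local prerequisites listed above for one cluster.
# Run it from the root of this repository.
preflight() {
  cluster="$1"
  store="${PASSWORD_STORE_DIR:-$HOME/.password-store}"
  status=0
  # The SSH configuration listing the nodes of this cluster.
  if [ ! -f "cluster/$cluster/ssh_config" ]; then
    echo "missing cluster/$cluster/ssh_config" >&2
    status=1
  fi
  # Assumption: the secrets repository is cloned at <store>/deuxfleurs.
  if [ ! -d "$store/deuxfleurs" ]; then
    echo "no deuxfleurs directory in the pass store at $store" >&2
    status=1
  fi
  return "$status"
}
```

For example, preflight staging returns 0 when both paths exist. It does not verify SSH access or group membership; something like ssh -F cluster/staging/ssh_config caribou true can be used to check that a node is reachable.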

Deploying the NixOS configuration

The NixOS configuration makes use of a certain number of files:

files in nix/ that are the same for all deployments on all clusters
the file cluster/<cluster_name>/cluster.nix, a Nix configuration file that is specific to the cluster but is copied identically to all cluster nodes
files in cluster/<cluster_name>/site/, which are specific to the various sites on which Nix nodes are deployed
files in cluster/<cluster_name>/node/ which are specific to each node

To deploy the NixOS configuration on the cluster, simply do:

./deploy_nixos <cluster_name>

or to deploy only on a single node:

./deploy_nixos <cluster_name> <node_name>

To upgrade NixOS, use the ./upgrade_nixos script instead (it has the same syntax).

When adding a node to the cluster: just do ./deploy_nixos <cluster_name> <name_of_new_node>

Deploying Wesher

We use Wesher to provide an encrypted overlay network between nodes in the cluster. This is useful in particular for securing services that are not able to do mTLS, but as a defense-in-depth measure, we make all traffic go through Wesher even when TLS is done correctly. It is thus mandatory to have a working Wesher installation in the cluster for it to run correctly.

First, if no Wesher shared secret key has been generated for this cluster yet, generate it with:

./gen_wesher_key <cluster_name>

This key will be stored in pass, so you must have a working pass installation for this script to run correctly.

Then, deploy the key on all nodes with:

./deploy_wesher_key <cluster_name>

This should be done after ./deploy_nixos has run successfully on all nodes. You should now have a working Wesher network between all your nodes!

When adding a node to the cluster: just do ./deploy_wesher_key <cluster_name> <name_of_new_node>

Generating and deploying a PKI for Consul and Nomad

This works much like the Wesher key setup described above.

First, if the PKI has not yet been created, create it with:

./gen_pki <cluster_name>

Then, deploy the PKI on all nodes with:

./deploy_pki <cluster_name>

When adding a node to the cluster: just do ./deploy_pki <cluster_name> <name_of_new_node>
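
Taken together, the three "when adding a node" notes above amount to a short sequence. The sketch below only prints that sequence; the function name new_node_steps is hypothetical, and the order (NixOS, then the Wesher key, then the PKI) is the order in which this README presents the steps, which is assumed here to be the intended one.

```shell
#!/usr/bin/env bash
# Sketch only: print, in order, the commands this README lists for a new node.
# Nothing is executed; copy the lines or pipe them to a shell once reviewed.
new_node_steps() {
  cluster="$1"
  node="$2"
  for step in deploy_nixos deploy_wesher_key deploy_pki; do
    printf './%s %s %s\n' "$step" "$cluster" "$node"
  done
}

# new_node_steps staging caribou   prints:
#   ./deploy_nixos staging caribou
#   ./deploy_wesher_key staging caribou
#   ./deploy_pki staging caribou
```

The gen_ scripts are not part of this sequence: the secrets already exist for the cluster, so a new node only needs the deploy steps.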

Adding administrators and password management

Administrators are defined in the cluster.nix file for each cluster (they could also be defined in the site-specific Nix files if necessary). This is where their public SSH keys for remote access are put.

Administrators will also need passwords to administer the cluster, as we are not using passwordless sudo. To set the password for a new administrator, they must have a working pass installation as specified above. They must then run:

./passwd <cluster_name> <user_name>

to set their password in the pass database (the password is hashed, so other administrators cannot learn their password even if they have access to the pass db).

Then, an administrator who already has root access must run the following (after syncing the pass db) to set the password correctly on all cluster nodes:

./deploy_passwords <cluster_name>

Deploying stuff on Nomad

Connecting to Nomad

Connect using SSH to one of the cluster nodes, forwarding local port 14646 to port 4646 on the node's localhost, and local port 8501 to port 8501 on the node's localhost.

You can for instance use an entry in your ~/.ssh/config that looks like this:

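Host caribou
	HostName 2a01:e0a:c:a720::23
	# Nomad API (TLS) on the node, reachable locally on port 14646
	LocalForward 14646 127.0.0.1:4646
	# Consul API (TLS) on the node, reachable locally on port 8501
	LocalForward 8501 127.0.0.1:8501
	# Port 389 (LDAP) of the bottin service, reachable locally on port 1389
	LocalForward 1389 bottin.service.staging.consul:389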

Then, in a separate window, launch ./tlsproxy <cluster_name>: this launches socat proxies that strip the TLS layer, so that you can access Nomad and Consul on the regular, unencrypted URLs: http://localhost:4646 for Nomad and http://localhost:8500 for Consul. Keep this terminal window open for as long as you need to access Nomad and Consul on the cluster.
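
To confirm that the SSH tunnel and the proxies are working, you can query both services over the plaintext URLs. This is a minimal sketch with hypothetical function names (local_url, check_proxies); it uses the /v1/status/leader endpoint that both the Nomad and Consul HTTP APIs provide, and assumes curl is installed locally.

```shell
#!/usr/bin/env bash
# Sketch only: map a service name to the local plaintext URL exposed by
# ./tlsproxy (ports taken from the paragraph above).
local_url() {
  case "$1" in
    nomad)  echo "http://localhost:4646" ;;
    consul) echo "http://localhost:8500" ;;
    *)      echo "unknown service: $1" >&2; return 1 ;;
  esac
}

# Ask each service for its current leader. An address in the reply suggests
# that the tunnel and the proxy for that service are both up.
check_proxies() {
  for svc in nomad consul; do
    printf '%s: ' "$svc"
    curl -fsS "$(local_url "$svc")/v1/status/leader" || echo "unreachable" >&2
    echo
  done
}
```

Run check_proxies while ./tlsproxy is active. If you use the nomad and consul command-line clients, setting NOMAD_ADDR=http://localhost:4646 and CONSUL_HTTP_ADDR=http://localhost:8500 points them at the same proxies.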

Launching services

Stuff should be started in this order:

app/core
app/frontend
app/telemetry
app/garage-staging
app/directory

Then, other stuff can be started in any order:

app/im (cluster staging only)
app/cryptpad (cluster prod only)
app/drone-ci
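
The start order above can be written down as a small helper that prints the list for a given cluster. This is only a sketch: the function name launch_plan is hypothetical, and it starts nothing. Submitting a job is done with nomad job run <jobfile>, but the location of the job files inside each app directory is not given in this section, so it is left out here.

```shell
#!/usr/bin/env bash
# Sketch only: print the services to start for a cluster, one per line.
launch_plan() {
  cluster="$1"
  # Services that should be started first, in this exact order.
  printf '%s\n' app/core app/frontend app/telemetry app/garage-staging app/directory
  # Remaining services; their relative order does not matter.
  case "$cluster" in
    staging) echo app/im ;;        # cluster staging only
    prod)    echo app/cryptpad ;;  # cluster prod only
  esac
  echo app/drone-ci
}
```

For example, launch_plan staging lists app/im but not app/cryptpad, and launch_plan prod does the opposite.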