# Deuxfleurs on NixOS!

This repository contains code to run Deuxfleurs' infrastructure on NixOS.

It sets up the following:

- A WireGuard mesh between all nodes
- Consul, with TLS
- Nomad, with TLS

## How to welcome a new administrator

See: https://guide.deuxfleurs.fr/operations/acces/pass/

Basically:

- The new administrator generates a GPG key and publishes it on Gitea
- All existing administrators pull the new key and sign it
- An existing administrator re-encrypts the keystore with this new key and pushes it
- The new administrator clones the repo and checks that they can decrypt the secrets
- Finally, the new administrator must choose a password to operate over SSH with `./passwd prod rick`, where `rick` is the target username

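The steps above could look roughly like the sketch below. It assumes the keystore is a standard [pass](https://www.passwordstore.org/) store and that keys are exchanged as ASCII-armored files; the function names, key IDs and file names are hypothetical, and the linked guide remains the reference for the actual procedure.

```shell
#!/bin/sh
# Sketch only: key IDs and file names are placeholders, not repository values.

# Print a command, then run it unless DRY_RUN is set.
run() {
    echo "+ $*"
    [ -n "${DRY_RUN:-}" ] || "$@"
}

# New administrator: create a key pair and export the public half.
new_admin_key() {
    key_id="$1"
    run gpg --full-generate-key
    run gpg --armor --output "$key_id.asc" --export "$key_id"
}

# Existing administrator: import the published key and sign it.
sign_admin_key() {
    key_file="$1"; key_id="$2"
    run gpg --import "$key_file"
    run gpg --sign-key "$key_id"
}

# Existing administrator: re-encrypt the store for every listed key.
# Assumes a standard pass store, where `pass init` re-encrypts existing entries.
reencrypt_store() {
    run pass init "$@"
}
```

With `DRY_RUN=1` set, the functions only print the commands they would run, which is a cheap way to review them before touching the real keystore.
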
## How to create files for a new zone

*This documentation is written for the production cluster; the same applies to other clusters.*

Basically:

- Create your `site` file in the `cluster/prod/site/` folder
- Create your `node` files in the `cluster/prod/node/` folder
- Add your WireGuard configuration to `cluster/prod/cluster.nix`
  - You will have to edit your NAT config manually
  - To get your node's WireGuard public key, you must run `./deploy_prod prod <node>`; see the next section for more information
- Add your nodes to `cluster/prod/ssh_config`, which is used by the various SSH scripts
  - If you use `ssh` directly, use `ssh -F ./cluster/prod/ssh_config`
  - Add `User root` for the first connection, as your user will not be declared on the system yet

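As an illustration of the last step, here is a minimal sketch that appends a first-contact entry to an SSH config file. The `add_node_entry` helper is hypothetical, and the host name you pass to it is your own choice; nothing here is generated by the repository scripts.

```shell
#!/bin/sh
# Sketch: append a first-contact entry for a new node to an ssh_config file.
# Arguments: config file, node alias, host name (all supplied by the caller).
add_node_entry() {
    config="$1"; node="$2"; host="$3"
    cat >> "$config" <<EOF
Host $node
    HostName $host
    User root
EOF
}
```

For example, `add_node_entry cluster/prod/ssh_config datura datura.machine.deuxfleurs.fr` (host name assumed by analogy with the other nodes), followed by `ssh -F ./cluster/prod/ssh_config datura`.
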
## How to deploy a Nix configuration on a fresh node

We suppose that the node name is `datura`.
Start by doing the deployment one node at a time; you will have plenty of time
in your operator's life to break everything through automation.

Run:

- `./deploy_wg prod datura` - to generate WireGuard's keys
- `./deploy_nixos prod datura` - to deploy the Nix configuration files
  - needs to be redeployed on all nodes, as the new WireGuard configuration is needed everywhere
- `./deploy_password prod datura` - to deploy users' passwords
  - needs to be redeployed on all nodes to set up the password everywhere
- `./deploy_pki prod datura` - to deploy Nomad's and Consul's PKI

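For the steps that must be repeated on every node, a small wrapper can call the same script once per node, in order, and stop at the first failure. This is only a sketch: `deploy_each` is a hypothetical helper, the node list is whatever you pass in, and it assumes each script takes a cluster name and a single node name as shown above.

```shell
#!/bin/sh
# Sketch: run one deploy script against several nodes, one node at a time.
# Arguments: script path, cluster name, then any number of node names.
deploy_each() {
    script="$1"; cluster="$2"; shift 2
    for node in "$@"; do
        echo "+ $script $cluster $node"
        # Skip execution when DRY_RUN is set; otherwise stop on the first failure.
        [ -n "${DRY_RUN:-}" ] || "$script" "$cluster" "$node" || return 1
    done
}
```

For example, `deploy_each ./deploy_nixos prod datura dahlia` redeploys the Nix configuration on both nodes; prefix it with `DRY_RUN=1` to see the commands without running them.
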
## How to operate a node

Edit your `~/.ssh/config` file:

```
Host dahlia
    HostName dahlia.machine.deuxfleurs.fr
    LocalForward 14646 127.0.0.1:4646
    LocalForward 8501 127.0.0.1:8501
    LocalForward 1389 bottin.service.prod.consul:389
    LocalForward 5432 psql-proxy.service.prod.consul:5432
```

Then run the TLS proxy and leave it running:

```
./tlsproxy prod
```

SSH to a production machine (e.g. dahlia) and leave it running:

```
ssh dahlia
```

Finally, you should be able to access the production Nomad and Consul by browsing:

- Consul: http://localhost:8500
- Nomad: http://localhost:4646

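To check the tunnel from a terminal rather than a browser, one option is to query the `/v1/status/leader` endpoint that both the Consul and Nomad HTTP APIs expose. The sketch below assumes `curl` is installed and that the two local URLs listed above are served as plain HTTP; `check_cluster` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: ask the local Consul and Nomad endpoints for their current leader.
check_cluster() {
    for url in http://localhost:8500 http://localhost:4646; do
        # -f makes curl fail on HTTP errors, -sS hides progress but keeps errors.
        if leader="$(curl -fsS "$url/v1/status/leader")"; then
            echo "$url leader: $leader"
        else
            echo "$url unreachable" >&2
            return 1
        fi
    done
}
```

If either call fails, first check that both `./tlsproxy prod` and the `ssh dahlia` session are still running.
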
## Why not Ansible?

I often get asked why not use Ansible to deploy to remote machines, as this
would look like a typical use case. There are many reasons, which basically
boil down to "I really don't like Ansible":

- Ansible tries to do declarative system configuration, but doesn't do it
  correctly at all, like Nix does. Example: in NixOS, to undo something you've
  done, just comment out the corresponding lines and redeploy.

- Ansible is massive overkill for what we're trying to do here: we're just
  copying a few small files and running some basic commands, leaving the rest
  to NixOS.

- YAML is a pain to manipulate as soon as you have more than two or three
  indentation levels. Also, why in hell would you want to write loops and
  conditions in YAML when you could use a proper expression language?

- Ansible's vocabulary is not ours, and it imposes a rigid hierarchy of
  directories and files which I don't want.

- Ansible is probably not flexible enough to do what we want, at least not
  without getting a migraine when trying. For example, its inventory
  management is too simple to account for the heterogeneity of our cluster
  nodes while still retaining a level of organization (some configuration
  options are defined cluster-wide, some are defined for each site - physical
  location - we deploy on, and some are specific to each node).

- I never remember Ansible's command line flags.

- My distribution's package for Ansible takes almost 400MB once installed,
  WTF??? By not depending on it, we're reducing the set of tools we need to
  deploy to a bare minimum: Git, OpenSSH, OpenSSL, socat,
  [pass](https://www.passwordstore.org/) (and the Consul and Nomad binaries
  which are, I'll admit, not small).

## More

Please read README.more.md for more detailed information.