Merge branch 'main' into add-bespin

Quentin 2022-10-16 12:08:49 +02:00
commit b42bf16f78
Signed by: quentin
GPG key ID: E9602264D639FF68
5 changed files with 165 additions and 147 deletions

README.md (171 lines changed)

@@ -8,159 +8,46 @@ It sets up the following:
 - Consul, with TLS
 - Nomad, with TLS
-## Configuring the OS
-This repo contains a bunch of scripts to configure NixOS on all cluster nodes.
-Most scripts are invoked with the following syntax:
-- for scripts that generate secrets: `./gen_<something> <cluster_name>` to generate the secrets to be used on cluster `<cluster_name>`
-- for deployment scripts:
-  - `./deploy_<something> <cluster_name>` to run the deployment script on all nodes of the cluster `<cluster_name>`
-  - `./deploy_<something> <cluster_name> <node1> <node2> ...` to run the deployment script only on nodes `node1, node2, ...` of cluster `<cluster_name>`.
-All deployment scripts can use the following parameters passed as environment variables:
-- `SUDO_PASS`: optionally, the password for `sudo` on cluster nodes. If not set, it will be asked at the beginning.
-- `SSH_USER`: optionally, the user to try to login using SSH. If not set, the username from your local machine will be used.
-### Assumptions (how to setup your environment)
-- you have an SSH access to all of your cluster nodes (listed in `cluster/<cluster_name>/ssh_config`)
-- your account is in group `wheel` and you know its password (you need it to become root using `sudo`);
-  the password is the same on all cluster nodes (see below for password management tools)
-- you have a clone of the secrets repository in your `pass` password store, for instance at `~/.password-store/deuxfleurs`
-  (scripts in this repo will read and write all secrets in `pass` under `deuxfleurs/cluster/<cluster_name>/`)
-### Deploying the NixOS configuration
-The NixOS configuration makes use of a certain number of files:
-- files in `nix/` that are the same for all deployments on all clusters
-- the file `cluster/<cluster_name>/cluster.nix`, a Nix configuration file that is specific to the cluster but is copied the same on all cluster nodes
-- files in `cluster/<cluster_name>/site/`, which are specific to the various sites on which Nix nodes are deployed
-- files in `cluster/<cluster_name>/node/` which are specific to each node
-To deploy the NixOS configuration on the cluster, simply do:
-```
-./deploy_nixos <cluster_name>
-```
-or to deploy only on a single node:
-```
-./deploy_nixos <cluster_name> <node_name>
-```
-To upgrade NixOS, use the `./upgrade_nixos` script instead (it has the same syntax).
-**When adding a node to the cluster:** just do `./deploy_nixos <cluster_name> <name_of_new_node>`
-### Deploying Wesher
-We use Wesher to provide an encrypted overlay network between nodes in the cluster.
-This is useful in particular for securing services that are not able to do mTLS,
-but as a security-in-depth measure, we make all traffic go through Wesher even when
-TLS is done correctly. It is thus mandatory to have a working Wesher installation
-in the cluster for it to run correctly.
-First, if no Wesher shared secret key has been generated for this cluster yet,
-generate it with:
-```
-./gen_wesher_key <cluster_name>
-```
-This key will be stored in `pass`, so you must have a working `pass` installation
-for this script to run correctly.
-Then, deploy the key on all nodes with:
-```
-./deploy_wesher_key <cluster_name>
-```
-This should be done after `./deploy_nixos` has run successfully on all nodes.
-You should now have a working Wesher network between all your nodes!
-**When adding a node to the cluster:** just do `./deploy_wesher_key <cluster_name> <name_of_new_node>`
-### Generating and deploying a PKI for Consul and Nomad
-This is very similar to how we do for Wesher.
-First, if the PKI has not yet been created, create it with:
-```
-./gen_pki <cluster_name>
-```
-Then, deploy the PKI on all nodes with:
-```
-./deploy_pki <cluster_name>
-```
-**When adding a node to the cluster:** just do `./deploy_pki <cluster_name> <name_of_new_node>`
-### Adding administrators and password management
-Administrators are defined in the `cluster.nix` file for each cluster (they could also be defined in the site-specific Nix files if necessary).
-This is where their public SSH keys for remote access are put.
-Administrators will also need passwords to administrate the cluster, as we are not using passwordless sudo.
-To set the password for a new administrator, they must have a working `pass` installation as specified above.
-They must then run:
-```
-./passwd <cluster_name> <user_name>
-```
-to set their password in the `pass` database (the password is hashed, so other administrators cannot learn their password even if they have access to the `pass` db).
-Then, an administrator that already has root access must run the following (after syncing the `pass` db) to set the password correctly on all cluster nodes:
-```
-./deploy_passwords <cluster_name>
-```
-## Deploying stuff on Nomad
-### Connecting to Nomad
-Connect using SSH to one of the cluster nodes, forwarding port 14646 to port 4646 on localhost, and port 8501 to port 8501 on localhost.
-You can for instance use an entry in your `~/.ssh/config` that looks like this:
-```
-Host caribou
-  HostName 2a01:e0a:c:a720::23
-  LocalForward 14646 127.0.0.1:4646
-  LocalForward 8501 127.0.0.1:8501
-  LocalForward 1389 bottin.service.staging.consul:389
-```
-Then, in a separate window, launch `./tlsproxy <cluster_name>`: this will
-launch `socat` proxies that strip the TLS layer and allow you to simply access
-Nomad and Consul on the regular, unencrypted URLs: `http://localhost:4646` for
-Nomad and `http://localhost:8500` for Consul. Keep this terminal window for as
-long as you need to access Nomad and Consul on the cluster.
-### Launching services
-Stuff should be started in this order:
-1. `app/core`
-2. `app/frontend`
-3. `app/telemetry`
-4. `app/garage-staging`
-5. `app/directory`
-Then, other stuff can be started in any order:
-- `app/im` (cluster `staging` only)
-- `app/cryptpad` (cluster `prod` only)
-- `app/drone-ci`
+## How to welcome a new administrator
+See: https://guide.deuxfleurs.fr/operations/acces/pass/
+Basically:
+- The new administrator generates a GPG key and publishes it on Gitea
+- All existing administrators pull their key and sign it
+- An existing administrator re-encrypts the keystore with this new key and pushes it
+- The new administrator clones the repo and checks that they can decrypt the secrets
+- Finally, the new administrator must choose a password to operate over SSH with `./passwd prod rick`, where `rick` is the target username
+## How to create files for a new zone
+*The documentation is written for the production cluster; the same applies to other clusters.*
+Basically:
+- Create your `site` file in the `cluster/prod/site/` folder
+- Create your `node` files in the `cluster/prod/node/` folder
+- Add your WireGuard configuration to `cluster/prod/cluster.nix`
+  - You will have to edit your NAT configuration manually
+  - To get your node's WireGuard public key, run `./deploy_wg prod <node>`; see the next section for more information
+## How to deploy a Nix configuration on a fresh node
+We suppose that the node name is `datura`.
+Start by doing the deployment one node at a time; you will have plenty of time
+in your operator's life to break everything through automation.
+Run:
+- `./deploy_wg prod datura` - to generate WireGuard's keys
+- `./deploy_nixos prod datura` - to deploy the Nix configuration files (they must be redeployed on all nodes, as the new WireGuard configuration is needed everywhere)
+- `./deploy_passwords prod datura` - to deploy users' passwords
+- `./deploy_pki prod datura` - to deploy Nomad's and Consul's PKI
+## How to operate a node
+*To be written*
+## More
+Please read README.more.md for more detailed information
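The re-encryption step of this onboarding could look like the shell function below. It is a minimal sketch, not a script from this repository. The function name `reencrypt_with_new_key` and the `PASS` override are hypothetical. It also assumes that the current recipients are listed in `deuxfleurs/.gpg-id`, the per-folder recipient file that `pass init -p` maintains. The guide linked in the README remains the reference procedure.

```shell
# Hypothetical helper, not part of this repository: re-encrypt the shared
# store for all current recipients plus one new GPG key ID.
# ASSUMPTION: the recipients are listed in deuxfleurs/.gpg-id.
reencrypt_with_new_key() {
  new_key=$1
  store="${PASSWORD_STORE_DIR:-$HOME/.password-store}"
  ids_file="$store/deuxfleurs/.gpg-id"

  # Without the current list, re-initialising with only the new key would
  # lock every existing administrator out, so stop here.
  if [ ! -f "$ids_file" ]; then
    echo "missing $ids_file" >&2
    return 1
  fi

  # Current recipients followed by the new one, without blanks or duplicates.
  recipients=$( { cat "$ids_file"; printf '%s\n' "$new_key"; } | awk 'NF && !seen[$0]++' )

  # `pass init -p <subfolder> <gpg-id>...` re-encrypts the subfolder for the
  # given recipients; PASS=echo (hypothetical override) only prints the call.
  # $recipients is unquoted on purpose: one argument per key ID.
  ${PASS:-pass} init -p deuxfleurs $recipients
}
```

With `PASS=echo`, the function only prints the `pass` command it would run. After a real run, the re-encrypted files still have to be committed and pushed from the secrets repository clone.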

README.more.md (normal file, 129 lines changed)

@@ -0,0 +1,129 @@
# Additional README
## Configuring the OS
This repo contains a bunch of scripts to configure NixOS on all cluster nodes.
Most scripts are invoked with the following syntax:
- for scripts that generate secrets: `./gen_<something> <cluster_name>` generates the secrets to be used on cluster `<cluster_name>`
- for deployment scripts:
  - `./deploy_<something> <cluster_name>` runs the deployment script on all nodes of cluster `<cluster_name>`
  - `./deploy_<something> <cluster_name> <node1> <node2> ...` runs the deployment script only on nodes `node1, node2, ...` of cluster `<cluster_name>`

All deployment scripts accept the following parameters, passed as environment variables:
- `SUDO_PASS`: optionally, the password for `sudo` on cluster nodes. If not set, it is asked for at the beginning.
- `SSH_USER`: optionally, the user to log in as over SSH. If not set, the username from your local machine is used.
### Assumptions (how to set up your environment)
- you have SSH access to all of your cluster nodes (listed in `cluster/<cluster_name>/ssh_config`)
- your account is in group `wheel` and you know its password (you need it to become root using `sudo`);
  the password is the same on all cluster nodes (see below for password management tools)
- you have a clone of the secrets repository in your `pass` password store, for instance at `~/.password-store/deuxfleurs`
  (scripts in this repo read and write all secrets in `pass` under `deuxfleurs/cluster/<cluster_name>/`)
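A quick local check of the assumptions above might look like this. It is a hypothetical helper: `preflight` is not a script of this repository. It only inspects files and must be run from the repository root. It honours `PASSWORD_STORE_DIR`, the variable `pass` uses to locate its store (defaulting to `~/.password-store`). It cannot verify remote facts such as membership of `wheel`.

```shell
# Hypothetical local check of the assumptions above; run it from the
# repository root. It does not contact any node.
preflight() {
  cluster=$1
  store="${PASSWORD_STORE_DIR:-$HOME/.password-store}"
  rc=0
  # SSH configuration listing the cluster nodes.
  if [ ! -f "cluster/$cluster/ssh_config" ]; then
    echo "missing cluster/$cluster/ssh_config" >&2
    rc=1
  fi
  # Folder where the scripts read and write this cluster's secrets.
  if [ ! -d "$store/deuxfleurs/cluster/$cluster" ]; then
    echo "missing $store/deuxfleurs/cluster/$cluster" >&2
    rc=1
  fi
  return "$rc"
}
```

For example, `preflight prod && ./deploy_nixos prod` only starts the deployment when both checks pass.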
### Deploying the NixOS configuration
The NixOS configuration uses several files:
- files in `nix/`, which are the same for all deployments on all clusters
- the file `cluster/<cluster_name>/cluster.nix`, a Nix configuration file that is specific to the cluster but is copied identically to all cluster nodes
- files in `cluster/<cluster_name>/site/`, which are specific to the various sites on which Nix nodes are deployed
- files in `cluster/<cluster_name>/node/`, which are specific to each node

To deploy the NixOS configuration on the whole cluster, run:
```
./deploy_nixos <cluster_name>
```
or to deploy only on a single node:
```
./deploy_nixos <cluster_name> <node_name>
```
To upgrade NixOS, use the `./upgrade_nixos` script instead (it has the same syntax).
**When adding a node to the cluster:** just do `./deploy_nixos <cluster_name> <name_of_new_node>`
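Following the README's advice to deploy one node at a time, the per-node syntax above can be wrapped in a small loop. The loop asks for the `sudo` password once and exports it as `SUDO_PASS`, so the scripts do not prompt again for every node. `deploy_one_by_one` is a hypothetical name. The sketch assumes it is run from the repository root and stops at the first node that fails.

```shell
# Hypothetical wrapper: run one deployment script (e.g. deploy_nixos) on the
# given nodes one at a time, asking for the sudo password only once.
# Must be run from the repository root.
deploy_one_by_one() {
  script=$1; cluster=$2; shift 2
  if [ -z "${SUDO_PASS:-}" ]; then
    # Read the password without echoing it (requires a terminal).
    printf 'sudo password: ' >&2
    stty -echo; IFS= read -r SUDO_PASS; stty echo
    printf '\n' >&2
  fi
  # The deployment scripts read SUDO_PASS from the environment.
  export SUDO_PASS
  for node in "$@"; do
    echo "==> ./$script $cluster $node"
    if ! "./$script" "$cluster" "$node"; then
      echo "stopping: ./$script failed on $node" >&2
      return 1
    fi
  done
}
```

For example, `deploy_one_by_one deploy_nixos prod datura` runs a single node. `SSH_USER`, if exported, is passed through unchanged to each script.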
### Generating and deploying a PKI for Consul and Nomad
This follows the same two-step pattern as the other secrets: generate once with a `gen_` script, then deploy with a `deploy_` script.
First, if the PKI has not yet been created, create it with:
```
./gen_pki <cluster_name>
```
Then, deploy the PKI on all nodes with:
```
./deploy_pki <cluster_name>
```
**When adding a node to the cluster:** just do `./deploy_pki <cluster_name> <name_of_new_node>`
### Adding administrators and password management
Administrators are defined in the `cluster.nix` file of each cluster (they could also be defined in the site-specific Nix files if necessary).
Their public SSH keys for remote access are declared there as well.
Administrators also need passwords to administer the cluster, as we do not use passwordless sudo.
To set the password for a new administrator, they must have a working `pass` installation as specified above.
They must then run:
```
./passwd <cluster_name> <user_name>
```
to set their password in the `pass` database (the password is hashed, so other administrators cannot learn their password even if they have access to the `pass` db).
Then, an administrator that already has root access must run the following (after syncing the `pass` db) to set the password correctly on all cluster nodes:
```
./deploy_passwords <cluster_name>
```
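OpenSSL (1.1.1 or later) can illustrate why a hashed password can sit in the shared `pass` database without revealing it. This sketch rests on an assumption: it uses SHA-512 crypt (`$6$` hashes), a format NixOS accepts for hashed user passwords, but `./passwd` may use another tool or scheme. `hash_password` and `verify_password` are hypothetical helpers.

```shell
# Illustration only. ASSUMPTION: SHA-512 crypt ("$6$..." hashes), which
# NixOS accepts for hashed user passwords; ./passwd may differ.
# Requires OpenSSL 1.1.1 or later for `openssl passwd -6`.

# Hash a password with a fresh random salt. The password is fed on stdin,
# so it never appears on the openssl command line.
hash_password() {
  printf '%s\n' "$1" | openssl passwd -6 -stdin
}

# Check a candidate against a stored "$6$<salt>$<digest>" hash: re-hash it
# with the same salt and compare. Nothing is decrypted.
verify_password() {
  salt=$(printf '%s' "$2" | cut -d'$' -f3)
  [ "$(printf '%s\n' "$1" | openssl passwd -6 -salt "$salt" -stdin)" = "$2" ]
}
```

Anyone holding the hash can test a guess against it but cannot read the password back. The random salt makes identical passwords produce different hashes.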
## Deploying stuff on Nomad
### Connecting to Nomad
Connect using SSH to one of the cluster nodes, forwarding local port 14646 to port 4646 on the node's localhost, and local port 8501 to port 8501 on the node's localhost.
You can for instance use an entry in your `~/.ssh/config` that looks like this:
```
Host caribou
HostName 2a01:e0a:c:a720::23
LocalForward 14646 127.0.0.1:4646
LocalForward 8501 127.0.0.1:8501
LocalForward 1389 bottin.service.staging.consul:389
```
Then, in a separate window, launch `./tlsproxy <cluster_name>`: this will
launch `socat` proxies that strip the TLS layer and allow you to simply access
Nomad and Consul on the regular, unencrypted URLs: `http://localhost:4646` for
Nomad and `http://localhost:8500` for Consul. Keep this terminal window open for as
long as you need to access Nomad and Consul on the cluster.
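Once the SSH tunnel and `./tlsproxy` are running, the `nomad` and `consul` command-line tools can be pointed at the plain-HTTP endpoints through their standard environment variables. This sketch assumes the default ports described above.

```shell
# Point the Nomad and Consul command-line tools at the plain-HTTP endpoints
# exposed by ./tlsproxy (keep the SSH session and the proxy running).
export NOMAD_ADDR=http://localhost:4646
export CONSUL_HTTP_ADDR=http://localhost:8500

# Then, for example:
#   nomad job status
#   consul members
```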
### Launching services
Stuff should be started in this order:
1. `app/core`
2. `app/frontend`
3. `app/telemetry`
4. `app/garage-staging`
5. `app/directory`

Then, other stuff can be started in any order:
- `app/im` (cluster `staging` only)
- `app/cryptpad` (cluster `prod` only)
- `app/drone-ci`
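Submitting the first five services in the order above can be scripted. This is a minimal sketch. `start_base_services` and the `RUN` override are hypothetical, and the `app/<name>/deploy/*.hcl` location of the job files is an assumption to adapt to the repository's real layout. The sketch expects `NOMAD_ADDR` to point at the proxied Nomad API.

```shell
# Hypothetical helper: submit the base services in the documented order.
# ASSUMPTION: each app/<name>/ directory contains Nomad job files matching
# deploy/*.hcl; adjust the pattern to the repository's actual layout.
start_base_services() {
  for app in core frontend telemetry garage-staging directory; do
    found=0
    for job in app/"$app"/deploy/*.hcl; do
      [ -e "$job" ] || continue   # the pattern matched nothing
      found=1
      # Set RUN=echo for a dry run; NOMAD_ADDR should point at ./tlsproxy.
      ${RUN:-nomad job run} "$job" || return 1
    done
    if [ "$found" != 1 ]; then
      echo "no job file for app/$app" >&2
      return 1
    fi
  done
}
```

The sketch only enforces the submission order. It adds no health check of its own between two services.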

@@ -90,7 +90,7 @@ EOH
 }
 resources {
-  cpu = 2000
+  cpu = 500
   memory = 200
 }

@@ -7,8 +7,4 @@ copy cluster/$CLUSTER/cluster.nix /etc/nixos/cluster.nix
 copy cluster/$CLUSTER/node/$NIXHOST.nix /etc/nixos/node.nix
 copy cluster/$CLUSTER/node/$NIXHOST.site.nix /etc/nixos/site.nix
-cmd 'mkdir -p /var/lib/deuxfleurs/wireguard-keys'
-cmd 'test -f /var/lib/deuxfleurs/wireguard-keys/private || (wg genkey > /var/lib/deuxfleurs/wireguard-keys/private; chmod 600 /var/lib/deuxfleurs/wireguard-keys/private)'
-cmd 'echo "Public key: $(wg pubkey < /var/lib/deuxfleurs/wireguard-keys/private)"'
 cmd nixos-rebuild switch --show-trace

deploy_wg (executable file, 6 lines changed)

@@ -0,0 +1,6 @@
#!/usr/bin/env ./sshtool
cmd 'nix-env -i wireguard'
cmd 'mkdir -p /var/lib/deuxfleurs/wireguard-keys'
cmd 'test -f /var/lib/deuxfleurs/wireguard-keys/private || (wg genkey > /var/lib/deuxfleurs/wireguard-keys/private; chmod 600 /var/lib/deuxfleurs/wireguard-keys/private)'
cmd 'echo "Public key: $(wg pubkey < /var/lib/deuxfleurs/wireguard-keys/private)"'
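The key-generation step of `deploy_wg` can also be written so that the private key is never readable by other users, even briefly. The variant also keeps a failed `wg genkey` from leaving an empty key file behind. Below is a hedged local sketch of that variant, not the script as committed. `ensure_wg_key` and the `KEYGEN`/`PUBKEY` overrides are hypothetical. On a node, the same logic would run as root through `./sshtool`, like the other commands.

```shell
# Sketch of the same idempotent step as deploy_wg, as a local function.
# KEYGEN and PUBKEY are hypothetical overrides (default: the real wg tool).
ensure_wg_key() {
  dir=$1
  key="$dir/private"
  # A newly created directory gets mode 700 from umask 077.
  ( umask 077 && mkdir -p "$dir" ) || return 1
  if [ ! -f "$key" ]; then
    # Under umask 077 the file is created with mode 600 directly, instead of
    # being chmod-ed after the secret is written. Writing to a temporary file
    # first means a failed generation leaves no empty "private" file that
    # later runs would mistake for a real key.
    ( umask 077 && ${KEYGEN:-wg genkey} > "$key.tmp" && mv "$key.tmp" "$key" ) || return 1
  fi
  echo "Public key: $(${PUBKEY:-wg pubkey} < "$key")"
}
```

As in `deploy_wg`, running it again keeps the existing key and only prints the public key, which is the value needed for the WireGuard configuration in `cluster.nix`.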