Merge branch 'main' into add-bespin
commit
b42bf16f78
5 changed files with 165 additions and 147 deletions
171
README.md
|
@ -8,159 +8,46 @@ It sets up the following:
|
||||||
- Consul, with TLS
|
- Consul, with TLS
|
||||||
- Nomad, with TLS
|
- Nomad, with TLS
|
||||||
|
|
||||||
## Configuring the OS
|
|
||||||
|
|
||||||
This repo contains a bunch of scripts to configure NixOS on all cluster nodes.
|
## How to welcome a new administrator
|
||||||
Most scripts are invoked with the following syntax:
|
|
||||||
|
|
||||||
- for scripts that generate secrets: `./gen_<something> <cluster_name>` to generate the secrets to be used on cluster `<cluster_name>`
|
See: https://guide.deuxfleurs.fr/operations/acces/pass/
|
||||||
- for deployment scripts:
|
|
||||||
- `./deploy_<something> <cluster_name>` to run the deployment script on all nodes of the cluster `<cluster_name>`
|
|
||||||
- `./deploy_<something> <cluster_name> <node1> <node2> ...` to run the deployment script only on nodes `node1, node2, ...` of cluster `<cluster_name>`.
|
|
||||||
|
|
||||||
All deployment scripts can use the following parameters passed as environment variables:
|
Basically:
|
||||||
|
- The new administrator generates a GPG key and publishes it on Gitea
|
||||||
|
- All existing administrators pull their key and sign it
|
||||||
|
- An existing administrator re-encrypts the keystore with this new key and pushes it
|
||||||
|
- The new administrator clones the repo and checks that they can decrypt the secrets
|
||||||
|
- Finally, the new administrator must choose a password to operate over SSH with `./passwd prod rick` where `rick` is the target username
|
||||||
|
|
||||||
- `SUDO_PASS`: optionally, the password for `sudo` on cluster nodes. If not set, it will be asked for at the beginning.
|
|
||||||
- `SSH_USER`: optionally, the user to log in as over SSH. If not set, the username from your local machine will be used.
|
|
||||||
|
|
||||||
### Assumptions (how to set up your environment)
|
## How to create files for a new zone
|
||||||
|
|
||||||
- you have SSH access to all of your cluster nodes (listed in `cluster/<cluster_name>/ssh_config`)
|
*The documentation is written for the production cluster; the same applies to other clusters.*
|
||||||
|
|
||||||
- your account is in group `wheel` and you know its password (you need it to become root using `sudo`);
|
Basically:
|
||||||
the password is the same on all cluster nodes (see below for password management tools)
|
- Create your `site` file in `cluster/prod/site/` folder
|
||||||
|
- Create your `node` files in `cluster/prod/node/` folder
|
||||||
|
- Add your wireguard configuration to `cluster/prod/cluster.nix`
|
||||||
|
- You will have to edit your NAT config manually
|
||||||
|
- To get your node's wg public key, you must run `./deploy_prod prod <node>`, see the next section for more information
|
||||||
|
|
||||||
- you have a clone of the secrets repository in your `pass` password store, for instance at `~/.password-store/deuxfleurs`
|
## How to deploy a Nix configuration on a fresh node
|
||||||
(scripts in this repo will read and write all secrets in `pass` under `deuxfleurs/cluster/<cluster_name>/`)
|
|
||||||
|
|
||||||
### Deploying the NixOS configuration
|
We suppose that the node name is `datura`.
|
||||||
|
Start by doing the deployment one node at a time, you will have plenty of time
|
||||||
|
in your operator's life to break everything through automation.
|
||||||
|
|
||||||
The NixOS configuration makes use of a certain number of files:
|
Run:
|
||||||
|
- `./deploy_wg prod datura` - to generate the WireGuard keys
|
||||||
|
- `./deploy_nixos prod datura` - to deploy the Nix configuration files (they need to be redeployed on all nodes, as the new WireGuard configuration is needed everywhere)
|
||||||
|
- `./deploy_password prod datura` - to deploy users' passwords
|
||||||
|
- `./deploy_pki prod datura` - to deploy Nomad's and Consul's PKI
|
||||||
|
|
||||||
- files in `nix/` that are the same for all deployments on all clusters
|
## How to operate a node
|
||||||
- the file `cluster/<cluster_name>/cluster.nix`, a Nix configuration file that is specific to the cluster but is copied the same on all cluster nodes
|
|
||||||
- files in `cluster/<cluster_name>/site/`, which are specific to the various sites on which Nix nodes are deployed
|
|
||||||
- files in `cluster/<cluster_name>/node/` which are specific to each node
|
|
||||||
|
|
||||||
To deploy the NixOS configuration on the cluster, simply do:
|
*To be written*
|
||||||
|
|
||||||
```
|
## More
|
||||||
./deploy_nixos <cluster_name>
|
|
||||||
```
|
|
||||||
|
|
||||||
or to deploy only on a single node:
|
|
||||||
|
|
||||||
```
|
|
||||||
./deploy_nixos <cluster_name> <node_name>
|
|
||||||
```
|
|
||||||
|
|
||||||
To upgrade NixOS, use the `./upgrade_nixos` script instead (it has the same syntax).
|
|
||||||
|
|
||||||
**When adding a node to the cluster:** just do `./deploy_nixos <cluster_name> <name_of_new_node>`
|
|
||||||
|
|
||||||
### Deploying Wesher
|
|
||||||
|
|
||||||
We use Wesher to provide an encrypted overlay network between nodes in the cluster.
|
|
||||||
This is useful in particular for securing services that are not able to do mTLS,
|
|
||||||
but as a security-in-depth measure, we make all traffic go through Wesher even when
|
|
||||||
TLS is done correctly. It is thus mandatory to have a working Wesher installation
|
|
||||||
in the cluster for it to run correctly.
|
|
||||||
|
|
||||||
First, if no Wesher shared secret key has been generated for this cluster yet,
|
|
||||||
generate it with:
|
|
||||||
|
|
||||||
```
|
|
||||||
./gen_wesher_key <cluster_name>
|
|
||||||
```
|
|
||||||
|
|
||||||
This key will be stored in `pass`, so you must have a working `pass` installation
|
|
||||||
for this script to run correctly.
|
|
||||||
|
|
||||||
Then, deploy the key on all nodes with:
|
|
||||||
|
|
||||||
```
|
|
||||||
./deploy_wesher_key <cluster_name>
|
|
||||||
```
|
|
||||||
|
|
||||||
This should be done after `./deploy_nixos` has run successfully on all nodes.
|
|
||||||
You should now have a working Wesher network between all your nodes!
|
|
||||||
|
|
||||||
**When adding a node to the cluster:** just do `./deploy_wesher_key <cluster_name> <name_of_new_node>`
|
|
||||||
|
|
||||||
### Generating and deploying a PKI for Consul and Nomad
|
|
||||||
|
|
||||||
This is very similar to what we do for Wesher.
|
|
||||||
|
|
||||||
First, if the PKI has not yet been created, create it with:
|
|
||||||
|
|
||||||
```
|
|
||||||
./gen_pki <cluster_name>
|
|
||||||
```
|
|
||||||
|
|
||||||
Then, deploy the PKI on all nodes with:
|
|
||||||
|
|
||||||
```
|
|
||||||
./deploy_pki <cluster_name>
|
|
||||||
```
|
|
||||||
|
|
||||||
**When adding a node to the cluster:** just do `./deploy_pki <cluster_name> <name_of_new_node>`
|
|
||||||
|
|
||||||
### Adding administrators and password management
|
|
||||||
|
|
||||||
Administrators are defined in the `cluster.nix` file for each cluster (they could also be defined in the site-specific Nix files if necessary).
|
|
||||||
This is where their public SSH keys for remote access are put.
|
|
||||||
|
|
||||||
Administrators will also need passwords to administer the cluster, as we are not using passwordless sudo.
|
|
||||||
To set the password for a new administrator, they must have a working `pass` installation as specified above.
|
|
||||||
They must then run:
|
|
||||||
|
|
||||||
```
|
|
||||||
./passwd <cluster_name> <user_name>
|
|
||||||
```
|
|
||||||
|
|
||||||
to set their password in the `pass` database (the password is hashed, so other administrators cannot learn their password even if they have access to the `pass` db).
|
|
||||||
|
|
||||||
Then, an administrator that already has root access must run the following (after syncing the `pass` db) to set the password correctly on all cluster nodes:
|
|
||||||
|
|
||||||
```
|
|
||||||
./deploy_passwords <cluster_name>
|
|
||||||
```
|
|
||||||
|
|
||||||
## Deploying stuff on Nomad
|
|
||||||
|
|
||||||
### Connecting to Nomad
|
|
||||||
|
|
||||||
Connect using SSH to one of the cluster nodes, forwarding port 14646 to port 4646 on localhost, and port 8501 to port 8501 on localhost.
|
|
||||||
|
|
||||||
You can for instance use an entry in your `~/.ssh/config` that looks like this:
|
|
||||||
|
|
||||||
```
|
|
||||||
Host caribou
|
|
||||||
HostName 2a01:e0a:c:a720::23
|
|
||||||
LocalForward 14646 127.0.0.1:4646
|
|
||||||
LocalForward 8501 127.0.0.1:8501
|
|
||||||
LocalForward 1389 bottin.service.staging.consul:389
|
|
||||||
```
|
|
||||||
|
|
||||||
Then, in a separate window, launch `./tlsproxy <cluster_name>`: this will
|
|
||||||
launch `socat` proxies that strip the TLS layer and allow you to simply access
|
|
||||||
Nomad and Consul on the regular, unencrypted URLs: `http://localhost:4646` for
|
|
||||||
Nomad and `http://localhost:8500` for Consul. Keep this terminal window for as
|
|
||||||
long as you need to access Nomad and Consul on the cluster.
|
|
||||||
|
|
||||||
### Launching services
|
|
||||||
|
|
||||||
Stuff should be started in this order:
|
|
||||||
|
|
||||||
1. `app/core`
|
|
||||||
2. `app/frontend`
|
|
||||||
3. `app/telemetry`
|
|
||||||
4. `app/garage-staging`
|
|
||||||
5. `app/directory`
|
|
||||||
|
|
||||||
Then, other stuff can be started in any order:
|
|
||||||
|
|
||||||
- `app/im` (cluster `staging` only)
|
|
||||||
- `app/cryptpad` (cluster `prod` only)
|
|
||||||
- `app/drone-ci`
|
|
||||||
|
|
||||||
|
Please read README.more.md for more detailed information
|
||||||
|
|
129
README.more.md
Normal file
|
@ -0,0 +1,129 @@
|
||||||
|
# Additional README
|
||||||
|
|
||||||
|
## Configuring the OS
|
||||||
|
|
||||||
|
This repo contains a bunch of scripts to configure NixOS on all cluster nodes.
|
||||||
|
Most scripts are invoked with the following syntax:
|
||||||
|
|
||||||
|
- for scripts that generate secrets: `./gen_<something> <cluster_name>` to generate the secrets to be used on cluster `<cluster_name>`
|
||||||
|
- for deployment scripts:
|
||||||
|
- `./deploy_<something> <cluster_name>` to run the deployment script on all nodes of the cluster `<cluster_name>`
|
||||||
|
- `./deploy_<something> <cluster_name> <node1> <node2> ...` to run the deployment script only on nodes `node1, node2, ...` of cluster `<cluster_name>`.
|
||||||
|
|
||||||
|
All deployment scripts can use the following parameters passed as environment variables:
|
||||||
|
|
||||||
|
- `SUDO_PASS`: optionally, the password for `sudo` on cluster nodes. If not set, it will be asked for at the beginning.
|
||||||
|
- `SSH_USER`: optionally, the user to log in as over SSH. If not set, the username from your local machine will be used.
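
The fallback behaviour of these two variables can be sketched in a few lines of shell. This is only an illustration of the rules stated above, not the code the deployment scripts actually use; the function names `resolve_ssh_user` and `resolve_sudo_pass` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: pick the SSH login name.
# Uses $SSH_USER when it is set and non-empty, otherwise the local username.
resolve_ssh_user() {
    printf '%s\n' "${SSH_USER:-$(id -un)}"
}

# Hypothetical helper: print the sudo password, prompting on the terminal
# only when $SUDO_PASS is not already provided in the environment.
resolve_sudo_pass() {
    if [ -z "${SUDO_PASS:-}" ]; then
        printf 'sudo password: ' >&2
        stty -echo 2>/dev/null      # hide typing when a terminal is attached
        IFS= read -r SUDO_PASS
        stty echo 2>/dev/null
        printf '\n' >&2
    fi
    printf '%s\n' "$SUDO_PASS"
}
```

In practice the variables are simply set on the command line, for example `SSH_USER=rick ./deploy_nixos prod datura`.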
|
||||||
|
|
||||||
|
### Assumptions (how to set up your environment)
|
||||||
|
|
||||||
|
- you have SSH access to all of your cluster nodes (listed in `cluster/<cluster_name>/ssh_config`)
|
||||||
|
|
||||||
|
- your account is in group `wheel` and you know its password (you need it to become root using `sudo`);
|
||||||
|
the password is the same on all cluster nodes (see below for password management tools)
|
||||||
|
|
||||||
|
- you have a clone of the secrets repository in your `pass` password store, for instance at `~/.password-store/deuxfleurs`
|
||||||
|
(scripts in this repo will read and write all secrets in `pass` under `deuxfleurs/cluster/<cluster_name>/`)
|
||||||
|
|
||||||
|
### Deploying the NixOS configuration
|
||||||
|
|
||||||
|
The NixOS configuration makes use of a certain number of files:
|
||||||
|
|
||||||
|
- files in `nix/` that are the same for all deployments on all clusters
|
||||||
|
- the file `cluster/<cluster_name>/cluster.nix`, a Nix configuration file that is specific to the cluster but is copied the same on all cluster nodes
|
||||||
|
- files in `cluster/<cluster_name>/site/`, which are specific to the various sites on which Nix nodes are deployed
|
||||||
|
- files in `cluster/<cluster_name>/node/` which are specific to each node
|
||||||
|
|
||||||
|
To deploy the NixOS configuration on the cluster, simply do:
|
||||||
|
|
||||||
|
```
|
||||||
|
./deploy_nixos <cluster_name>
|
||||||
|
```
|
||||||
|
|
||||||
|
or to deploy only on a single node:
|
||||||
|
|
||||||
|
```
|
||||||
|
./deploy_nixos <cluster_name> <node_name>
|
||||||
|
```
|
||||||
|
|
||||||
|
To upgrade NixOS, use the `./upgrade_nixos` script instead (it has the same syntax).
|
||||||
|
|
||||||
|
**When adding a node to the cluster:** just do `./deploy_nixos <cluster_name> <name_of_new_node>`
|
||||||
|
|
||||||
|
### Generating and deploying a PKI for Consul and Nomad
|
||||||
|
|
||||||
|
This is very similar to what we do for Wesher.
|
||||||
|
|
||||||
|
First, if the PKI has not yet been created, create it with:
|
||||||
|
|
||||||
|
```
|
||||||
|
./gen_pki <cluster_name>
|
||||||
|
```
|
||||||
|
|
||||||
|
Then, deploy the PKI on all nodes with:
|
||||||
|
|
||||||
|
```
|
||||||
|
./deploy_pki <cluster_name>
|
||||||
|
```
|
||||||
|
|
||||||
|
**When adding a node to the cluster:** just do `./deploy_pki <cluster_name> <name_of_new_node>`
|
||||||
|
|
||||||
|
### Adding administrators and password management
|
||||||
|
|
||||||
|
Administrators are defined in the `cluster.nix` file for each cluster (they could also be defined in the site-specific Nix files if necessary).
|
||||||
|
This is where their public SSH keys for remote access are put.
|
||||||
|
|
||||||
|
Administrators will also need passwords to administer the cluster, as we are not using passwordless sudo.
|
||||||
|
To set the password for a new administrator, they must have a working `pass` installation as specified above.
|
||||||
|
They must then run:
|
||||||
|
|
||||||
|
```
|
||||||
|
./passwd <cluster_name> <user_name>
|
||||||
|
```
|
||||||
|
|
||||||
|
to set their password in the `pass` database (the password is hashed, so other administrators cannot learn their password even if they have access to the `pass` db).
|
||||||
|
|
||||||
|
Then, an administrator that already has root access must run the following (after syncing the `pass` db) to set the password correctly on all cluster nodes:
|
||||||
|
|
||||||
|
```
|
||||||
|
./deploy_passwords <cluster_name>
|
||||||
|
```
|
||||||
|
|
||||||
|
## Deploying stuff on Nomad
|
||||||
|
|
||||||
|
### Connecting to Nomad
|
||||||
|
|
||||||
|
Connect using SSH to one of the cluster nodes, forwarding port 14646 to port 4646 on localhost, and port 8501 to port 8501 on localhost.
|
||||||
|
|
||||||
|
You can for instance use an entry in your `~/.ssh/config` that looks like this:
|
||||||
|
|
||||||
|
```
|
||||||
|
Host caribou
|
||||||
|
HostName 2a01:e0a:c:a720::23
|
||||||
|
LocalForward 14646 127.0.0.1:4646
|
||||||
|
LocalForward 8501 127.0.0.1:8501
|
||||||
|
LocalForward 1389 bottin.service.staging.consul:389
|
||||||
|
```
|
||||||
|
|
||||||
|
Then, in a separate window, launch `./tlsproxy <cluster_name>`: this will
|
||||||
|
launch `socat` proxies that strip the TLS layer and allow you to simply access
|
||||||
|
Nomad and Consul on the regular, unencrypted URLs: `http://localhost:4646` for
|
||||||
|
Nomad and `http://localhost:8500` for Consul. Keep this terminal window for as
|
||||||
|
long as you need to access Nomad and Consul on the cluster.
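
To make the idea of a TLS-stripping proxy concrete, here is a sketch of what one such `socat` invocation could look like for each service. It is not the actual contents of `./tlsproxy`: the helper name `tls_strip_cmd` and the certificate file names are placeholders, and the function only prints the command so it can be inspected before being run.

```shell
#!/bin/sh
# tls_strip_cmd PLAIN_PORT TLS_PORT CERT KEY CA   (hypothetical helper)
# Prints (does not run) a socat command that listens for plain TCP on
# 127.0.0.1:PLAIN_PORT and relays each connection over TLS to
# 127.0.0.1:TLS_PORT, authenticating with the given client certificate.
tls_strip_cmd() {
    printf 'socat TCP-LISTEN:%s,bind=127.0.0.1,reuseaddr,fork OPENSSL:127.0.0.1:%s,cert=%s,key=%s,cafile=%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Nomad: plain HTTP on 4646, TLS endpoint reached through the SSH forward on 14646.
# The certificate file names below are placeholders.
tls_strip_cmd 4646 14646 nomad-client.crt nomad-client.key nomad-ca.crt
# Consul: plain HTTP on 8500, TLS endpoint reached through the SSH forward on 8501.
tls_strip_cmd 8500 8501 consul-client.crt consul-client.key consul-ca.crt
```

Because the proxy connects to `127.0.0.1` rather than to the node's real name, certificate name checking may need adjusting (socat has a `commonname=` option for this); whether that is required depends on how the cluster certificates were issued.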
|
||||||
|
|
||||||
|
### Launching services
|
||||||
|
|
||||||
|
Stuff should be started in this order:
|
||||||
|
|
||||||
|
1. `app/core`
|
||||||
|
2. `app/frontend`
|
||||||
|
3. `app/telemetry`
|
||||||
|
4. `app/garage-staging`
|
||||||
|
5. `app/directory`
|
||||||
|
|
||||||
|
Then, other stuff can be started in any order:
|
||||||
|
|
||||||
|
- `app/im` (cluster `staging` only)
|
||||||
|
- `app/cryptpad` (cluster `prod` only)
|
||||||
|
- `app/drone-ci`
|
||||||
|
|
|
@ -90,7 +90,7 @@ EOH
|
||||||
}
|
}
|
||||||
|
|
||||||
resources {
|
resources {
|
||||||
cpu = 2000
|
cpu = 500
|
||||||
memory = 200
|
memory = 200
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
|
@ -7,8 +7,4 @@ copy cluster/$CLUSTER/cluster.nix /etc/nixos/cluster.nix
|
||||||
copy cluster/$CLUSTER/node/$NIXHOST.nix /etc/nixos/node.nix
|
copy cluster/$CLUSTER/node/$NIXHOST.nix /etc/nixos/node.nix
|
||||||
copy cluster/$CLUSTER/node/$NIXHOST.site.nix /etc/nixos/site.nix
|
copy cluster/$CLUSTER/node/$NIXHOST.site.nix /etc/nixos/site.nix
|
||||||
|
|
||||||
cmd 'mkdir -p /var/lib/deuxfleurs/wireguard-keys'
|
|
||||||
cmd 'test -f /var/lib/deuxfleurs/wireguard-keys/private || (wg genkey > /var/lib/deuxfleurs/wireguard-keys/private; chmod 600 /var/lib/deuxfleurs/wireguard-keys/private)'
|
|
||||||
cmd 'echo "Public key: $(wg pubkey < /var/lib/deuxfleurs/wireguard-keys/private)"'
|
|
||||||
|
|
||||||
cmd nixos-rebuild switch --show-trace
|
cmd nixos-rebuild switch --show-trace
|
||||||
|
|
6
deploy_wg
Executable file
|
@ -0,0 +1,6 @@
|
||||||
|
#!/usr/bin/env ./sshtool
|
||||||
|
|
||||||
|
cmd 'nix-env -i wireguard'
|
||||||
|
cmd 'mkdir -p /var/lib/deuxfleurs/wireguard-keys'
|
||||||
|
cmd 'test -f /var/lib/deuxfleurs/wireguard-keys/private || (wg genkey > /var/lib/deuxfleurs/wireguard-keys/private; chmod 600 /var/lib/deuxfleurs/wireguard-keys/private)'
|
||||||
|
cmd 'echo "Public key: $(wg pubkey < /var/lib/deuxfleurs/wireguard-keys/private)"'
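
The `test -f ... || (...)` line above is a generate-once pattern: the key is created only if it does not exist yet, so rerunning the script never replaces a node's key. Below is a standalone sketch of the same pattern as a local shell function. The name `ensure_private_key` is hypothetical and this is not part of `sshtool`; it additionally writes through a temporary file so a failed generator does not leave an empty key behind.

```shell
#!/bin/sh
# ensure_private_key DIR GENERATOR...   (hypothetical helper)
# Creates DIR/private by running GENERATOR only when the file does not exist.
# An existing key is never overwritten.
ensure_private_key() {
    dir=$1; shift
    mkdir -p "$dir" || return 1
    if [ ! -f "$dir/private" ]; then
        # umask 077 inside a subshell: the file is owner-only from creation.
        ( umask 077; "$@" > "$dir/private.tmp" ) || {
            rm -f "$dir/private.tmp"
            return 1
        }
        mv "$dir/private.tmp" "$dir/private"
        chmod 600 "$dir/private"
    fi
}
```

On a node with WireGuard installed, `ensure_private_key /var/lib/deuxfleurs/wireguard-keys wg genkey` would correspond to the two `cmd` lines that create the directory and the key.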
|