forked from Deuxfleurs/nixcfg
More doc reorganization
parent
3e5e2d60cd
commit
0e1574a82b
5 changed files with 96 additions and 95 deletions
47
README.md
|
@ -12,54 +12,15 @@ It sets up the following:
|
||||||
|
|
||||||
See the following documentation topics:
|
See the following documentation topics:
|
||||||
|
|
||||||
- [Quick start for adding new nodes after NixOS install](doc/quick-start.md)
|
- [Quick start and onboarding for new administrators](doc/onboarding.md)
|
||||||
|
- [How to add new nodes to a cluster (rapid overview)](doc/adding-nodes.md)
|
||||||
- [Architecture of this repo, how the scripts work](doc/architecture.md)
|
- [Architecture of this repo, how the scripts work](doc/architecture.md)
|
||||||
- [List of TCP and UDP ports used by services](doc/ports)
|
- [List of TCP and UDP ports used by services](doc/ports)
|
||||||
|
|
||||||
Additional documentation topics:
|
Additional documentation topics:
|
||||||
|
|
||||||
- [Succinct guide for NixOS installation with LUKS full disk encryption](doc/nixos-install.md) (we don't do that in practice on our servers)
|
- [Succinct guide for NixOS installation with LUKS full disk encryption](doc/nixos-install-luks.md) (we don't do that in practice on our servers)
|
||||||
- [Example `hardware-config.nix` for a full disk encryption scenario](doc/example-hardware-configuration.nix)
|
- [Example `hardware-config.nix` for a full disk encryption scenario](doc/example-hardware-configuration.nix)
|
||||||
|
- [Why not Ansible?](doc/why-not-ansible.md)
|
||||||
|
|
||||||
|
|
||||||
## Why not Ansible?
|
|
||||||
|
|
||||||
I often get asked why not use Ansible to deploy to remote machines, as this
|
|
||||||
would look like a typical use case. There are many reasons, which basically
|
|
||||||
boil down to "I really don't like Ansible":
|
|
||||||
|
|
||||||
- Ansible tries to do declarative system configuration, but doesn't do it
|
|
||||||
correctly at all, like Nix does. Example: in NixOS, to undo something you've
|
|
||||||
done, just comment the corresponding lines and redeploy.
|
|
||||||
|
|
||||||
- Ansible is massive overkill for what we're trying to do here, we're just
|
|
||||||
copying a few small files and running some basic commands, leaving the rest
|
|
||||||
to NixOS.
|
|
||||||
|
|
||||||
- YAML is a pain to manipulate as soon as you have more than two or three
|
|
||||||
indentation levels. Also, why in hell would you want to write loops and
|
|
||||||
conditions in YAML when you could use a proper expression language?
|
|
||||||
|
|
||||||
- Ansible's vocabulary is not ours, and it imposes a rigid hierarchy of
|
|
||||||
directories and files which I don't want.
|
|
||||||
|
|
||||||
- Ansible is probably not flexible enough to do what we want, at least not
|
|
||||||
without getting a migraine when trying. For example, its inventory
|
|
||||||
management is too simple to account for the heterogeneity of our cluster
|
|
||||||
nodes while still retaining a level of organization (some configuration
|
|
||||||
options are defined cluster-wide, some are defined for each site - physical
|
|
||||||
location - we deploy on, and some are specific to each node).
|
|
||||||
|
|
||||||
- I never remember Ansible's command line flags.
|
|
||||||
|
|
||||||
- My distribution's package for Ansible takes almost 400MB once installed,
|
|
||||||
WTF??? By not depending on it, we're reducing the set of tools we need to
|
|
||||||
deploy to a bare minimum: Git, OpenSSH, OpenSSL, socat,
|
|
||||||
[pass](https://www.passwordstore.org/) (and the Consul and Nomad binaries
|
|
||||||
which are, I'll admit, not small).
|
|
||||||
|
|
||||||
|
|
||||||
## More
|
|
||||||
|
|
||||||
Please read README.more.md for more detailed information
|
|
||||||
|
|
||||||
|
|
|
@ -1,17 +1,5 @@
|
||||||
# Quick start
|
# Quick start
|
||||||
|
|
||||||
## How to welcome a new administrator
|
|
||||||
|
|
||||||
See: https://guide.deuxfleurs.fr/operations/acces/pass/
|
|
||||||
|
|
||||||
Basically:
|
|
||||||
- The new administrator generates a GPG key and publishes it on Gitea
|
|
||||||
- All existing administrators pull their key and sign it
|
|
||||||
- An existing administrator re-encrypts the keystore with this new key and pushes it
|
|
||||||
- The new administrator clones the repo and checks that they can decrypt the secrets
|
|
||||||
- Finally, the new administrator must choose a password to operate over SSH with `./passwd prod rick` where `rick` is the target username
|
|
||||||
|
|
||||||
|
|
||||||
## How to create files for a new zone
|
## How to create files for a new zone
|
||||||
|
|
||||||
*This documentation is written for the production cluster; the same applies to other clusters.*
|
*This documentation is written for the production cluster; the same applies to other clusters.*
|
||||||
|
@ -40,34 +28,3 @@ Run:
|
||||||
- if a user changes their password (using `./passwd`), needs to be redeployed on all nodes to setup the password on all nodes
|
- if a user changes their password (using `./passwd`), needs to be redeployed on all nodes to setup the password on all nodes
|
||||||
- `./deploy_pki prod datura` - to deploy Nomad's and Consul's PKI
|
- `./deploy_pki prod datura` - to deploy Nomad's and Consul's PKI
|
||||||
|
|
||||||
## How to operate a node
|
|
||||||
|
|
||||||
Edit your `~/.ssh/config` file:
|
|
||||||
|
|
||||||
```
|
|
||||||
Host dahlia
|
|
||||||
HostName dahlia.machine.deuxfleurs.fr
|
|
||||||
LocalForward 14646 127.0.0.1:4646
|
|
||||||
LocalForward 8501 127.0.0.1:8501
|
|
||||||
LocalForward 1389 bottin.service.prod.consul:389
|
|
||||||
LocalForward 5432 psql-proxy.service.prod.consul:5432
|
|
||||||
```
|
|
||||||
|
|
||||||
Then run the TLS proxy and leave it running:
|
|
||||||
|
|
||||||
```
|
|
||||||
./tlsproxy prod
|
|
||||||
```
|
|
||||||
|
|
||||||
SSH to a production machine (e.g. dahlia) and leave it running:
|
|
||||||
|
|
||||||
```
|
|
||||||
ssh dahlia
|
|
||||||
```
|
|
||||||
|
|
||||||
|
|
||||||
Finally, you should be able to access the production Nomad and Consul by browsing:
|
|
||||||
|
|
||||||
- Consul: http://localhost:8500
|
|
||||||
- Nomad: http://localhost:4646
|
|
||||||
|
|
|
@ -1,4 +1,4 @@
|
||||||
# Additional README
|
# Overall architecture
|
||||||
|
|
||||||
## Configuring the OS
|
## Configuring the OS
|
||||||
|
|
||||||
|
@ -15,6 +15,7 @@ All deployment scripts can use the following parameters passed as environment va
|
||||||
- `SUDO_PASS`: optionally, the password for `sudo` on cluster nodes. If not set, it will be asked for at the beginning.
|
- `SUDO_PASS`: optionally, the password for `sudo` on cluster nodes. If not set, it will be asked for at the beginning.
|
||||||
- `SSH_USER`: optionally, the user to log in as over SSH. If not set, the username from your local machine will be used.
|
- `SSH_USER`: optionally, the user to log in as over SSH. If not set, the username from your local machine will be used.
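
The fallback described for `SSH_USER` can be illustrated with a minimal sketch. This is not taken from the deployment scripts themselves: the function name `resolve_ssh_user` is hypothetical, and the real scripts may resolve the user differently.

```shell
# Hypothetical sketch (not the repo's actual code): print the SSH user,
# taking $SSH_USER when it is set and non-empty, and the local username
# otherwise.
resolve_ssh_user() {
  printf '%s\n' "${SSH_USER:-$(id -un)}"
}

# Show which user would be used in the current environment.
echo "would log in as: $(resolve_ssh_user)"
```

Under this assumption, prefixing a deployment command with `SSH_USER=alice` would override the local username for that single invocation only.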
|
||||||
|
|
||||||
|
|
||||||
### Assumptions (how to setup your environment)
|
### Assumptions (how to setup your environment)
|
||||||
|
|
||||||
- you have SSH access to all of your cluster nodes (listed in `cluster/<cluster_name>/ssh_config`)
|
- you have SSH access to all of your cluster nodes (listed in `cluster/<cluster_name>/ssh_config`)
|
||||||
|
@ -25,6 +26,7 @@ All deployment scripts can use the following parameters passed as environment va
|
||||||
- you have a clone of the secrets repository in your `pass` password store, for instance at `~/.password-store/deuxfleurs`
|
- you have a clone of the secrets repository in your `pass` password store, for instance at `~/.password-store/deuxfleurs`
|
||||||
(scripts in this repo will read and write all secrets in `pass` under `deuxfleurs/cluster/<cluster_name>/`)
|
(scripts in this repo will read and write all secrets in `pass` under `deuxfleurs/cluster/<cluster_name>/`)
|
||||||
|
|
||||||
|
|
||||||
### Deploying the NixOS configuration
|
### Deploying the NixOS configuration
|
||||||
|
|
||||||
The NixOS configuration makes use of a certain number of files:
|
The NixOS configuration makes use of a certain number of files:
|
||||||
|
@ -48,12 +50,9 @@ or to deploy only on a single node:
|
||||||
|
|
||||||
To upgrade NixOS, use the `./upgrade_nixos` script instead (it has the same syntax).
|
To upgrade NixOS, use the `./upgrade_nixos` script instead (it has the same syntax).
|
||||||
|
|
||||||
**When adding a node to the cluster:** just do `./deploy_nixos <cluster_name> <name_of_new_node>`
|
|
||||||
|
|
||||||
### Generating and deploying a PKI for Consul and Nomad
|
### Generating and deploying a PKI for Consul and Nomad
|
||||||
|
|
||||||
This is very similar to how we do for Wesher.
|
|
||||||
|
|
||||||
First, if the PKI has not yet been created, create it with:
|
First, if the PKI has not yet been created, create it with:
|
||||||
|
|
||||||
```
|
```
|
||||||
|
@ -66,7 +65,8 @@ Then, deploy the PKI on all nodes with:
|
||||||
./deploy_pki <cluster_name>
|
./deploy_pki <cluster_name>
|
||||||
```
|
```
|
||||||
|
|
||||||
**When adding a node to the cluster:** just do `./deploy_pki <cluster_name> <name_of_new_node>`
|
Note that certificates are valid for not much more than one year: every year in January, `gen_pki` and `deploy_pki` have to be re-run to generate certificates for the new year.
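
Since the renewal is a manual yearly step, it can help to check how long a given certificate remains valid. The following is a minimal sketch and not part of this repository: the function name and the path in the usage comment are hypothetical, and it relies only on OpenSSL's `x509 -checkend` option.

```shell
# Hypothetical helper (not part of this repo): succeeds if the certificate
# in file $1 expires within the next $2 days, or has already expired.
cert_expires_within_days() {
  cert_file=$1
  days=$2
  # Refuse to guess when the file cannot be read.
  [ -r "$cert_file" ] || { echo "cannot read $cert_file" >&2; return 2; }
  # `openssl x509 -checkend N` exits 0 when the certificate is still valid
  # N seconds from now, and non-zero when it will have expired by then
  # (or when the file cannot be parsed as a certificate).
  if openssl x509 -checkend "$((days * 86400))" -noout -in "$cert_file" >/dev/null 2>&1; then
    return 1
  fi
  return 0
}

# Example usage (the path is an assumption, adjust it to your setup):
#   cert_expires_within_days /path/to/some-cert.crt 30 && echo "renew soon"
```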
|
||||||
|
|
||||||
|
|
||||||
### Adding administrators and password management
|
### Adding administrators and password management
|
||||||
|
|
||||||
|
@ -89,6 +89,7 @@ Then, an administrator that already has root access must run the following (afte
|
||||||
./deploy_passwords <cluster_name>
|
./deploy_passwords <cluster_name>
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|
||||||
## Deploying stuff on Nomad
|
## Deploying stuff on Nomad
|
||||||
|
|
||||||
### Connecting to Nomad
|
### Connecting to Nomad
|
||||||
|
@ -118,12 +119,12 @@ Stuff should be started in this order:
|
||||||
1. `app/core`
|
1. `app/core`
|
||||||
2. `app/frontend`
|
2. `app/frontend`
|
||||||
3. `app/telemetry`
|
3. `app/telemetry`
|
||||||
4. `app/garage-staging`
|
4. `app/garage`
|
||||||
5. `app/directory`
|
5. `app/directory`
|
||||||
|
|
||||||
Then, other stuff can be started in any order:
|
Then, other stuff can be started in any order, e.g.:
|
||||||
|
|
||||||
- `app/im` (cluster `staging` only)
|
- `app/im`
|
||||||
- `app/cryptpad` (cluster `prod` only)
|
- `app/cryptpad`
|
||||||
- `app/drone-ci`
|
- `app/drone-ci`
|
||||||
|
|
||||||
|
|
45
doc/onboarding.md
Normal file
|
@ -0,0 +1,45 @@
|
||||||
|
# Onboarding / quick start for new administrators
|
||||||
|
|
||||||
|
## How to welcome a new administrator
|
||||||
|
|
||||||
|
See: https://guide.deuxfleurs.fr/operations/acces/pass/
|
||||||
|
|
||||||
|
Basically:
|
||||||
|
- The new administrator generates a GPG key and publishes it on Gitea
|
||||||
|
- All existing administrators pull their key and sign it
|
||||||
|
- An existing administrator re-encrypts the keystore with this new key and pushes it
|
||||||
|
- The new administrator clones the repo and checks that they can decrypt the secrets
|
||||||
|
- Finally, the new administrator must choose a password to operate over SSH with `./passwd prod rick` where `rick` is the target username
|
||||||
|
|
||||||
|
|
||||||
|
## How to operate a node (connect to Nomad and Consul)
|
||||||
|
|
||||||
|
Edit your `~/.ssh/config` file with content such as the following:
|
||||||
|
|
||||||
|
```
|
||||||
|
Host dahlia
|
||||||
|
HostName dahlia.machine.deuxfleurs.fr
|
||||||
|
LocalForward 14646 127.0.0.1:4646
|
||||||
|
LocalForward 8501 127.0.0.1:8501
|
||||||
|
LocalForward 1389 bottin.service.prod.consul:389
|
||||||
|
LocalForward 5432 psql-proxy.service.prod.consul:5432
|
||||||
|
```
|
||||||
|
|
||||||
|
Then run the TLS proxy and leave it running:
|
||||||
|
|
||||||
|
```
|
||||||
|
./tlsproxy prod
|
||||||
|
```
|
||||||
|
|
||||||
|
SSH to a production machine (e.g. dahlia) and leave it running:
|
||||||
|
|
||||||
|
```
|
||||||
|
ssh dahlia
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
Finally, you should be able to access the production Nomad and Consul by browsing:
|
||||||
|
|
||||||
|
- Consul: http://localhost:8500
|
||||||
|
- Nomad: http://localhost:4646
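
To confirm from a terminal that both local endpoints respond, something like the following could be used. It is a sketch under assumptions: `curl` is not among the tools this repository requires, the `probe` helper is hypothetical, and whether the endpoints answer without credentials depends on how the cluster is configured.

```shell
# Hypothetical helper: print "<name>: up" if the URL answers an HTTP request
# with a success status within 3 seconds, "<name>: down" otherwise.
probe() {
  name=$1
  url=$2
  if curl -fsS --max-time 3 -o /dev/null "$url" 2>/dev/null; then
    echo "$name: up"
  else
    echo "$name: down"
  fi
}

# The ports are the ones listed above; /v1/status/leader is a read-only
# status endpoint offered by both the Consul and the Nomad HTTP APIs.
probe consul http://localhost:8500/v1/status/leader
probe nomad  http://localhost:4646/v1/status/leader
```

A "down" result usually means that either `./tlsproxy prod` or the `ssh dahlia` session is no longer running.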
|
||||||
|
|
37
doc/why-not-ansible.md
Normal file
|
@ -0,0 +1,37 @@
|
||||||
|
# Why not Ansible?
|
||||||
|
|
||||||
|
I often get asked why not use Ansible to deploy to remote machines, as this
|
||||||
|
would look like a typical use case. There are many reasons, which basically
|
||||||
|
boil down to "I really don't like Ansible":
|
||||||
|
|
||||||
|
- Ansible tries to do declarative system configuration, but doesn't do it
|
||||||
|
correctly at all, like Nix does. Example: in NixOS, to undo something you've
|
||||||
|
done, just comment the corresponding lines and redeploy.
|
||||||
|
|
||||||
|
- Ansible is massive overkill for what we're trying to do here, we're just
|
||||||
|
copying a few small files and running some basic commands, leaving the rest
|
||||||
|
to NixOS.
|
||||||
|
|
||||||
|
- YAML is a pain to manipulate as soon as you have more than two or three
|
||||||
|
indentation levels. Also, why in hell would you want to write loops and
|
||||||
|
conditions in YAML when you could use a proper expression language?
|
||||||
|
|
||||||
|
- Ansible's vocabulary is not ours, and it imposes a rigid hierarchy of
|
||||||
|
directories and files which I don't want.
|
||||||
|
|
||||||
|
- Ansible is probably not flexible enough to do what we want, at least not
|
||||||
|
without getting a migraine when trying. For example, its inventory
|
||||||
|
management is too simple to account for the heterogeneity of our cluster
|
||||||
|
nodes while still retaining a level of organization (some configuration
|
||||||
|
options are defined cluster-wide, some are defined for each site - physical
|
||||||
|
location - we deploy on, and some are specific to each node).
|
||||||
|
|
||||||
|
- I never remember Ansible's command line flags.
|
||||||
|
|
||||||
|
- My distribution's package for Ansible takes almost 400MB once installed,
|
||||||
|
WTF??? By not depending on it, we're reducing the set of tools we need to
|
||||||
|
deploy to a bare minimum: Git, OpenSSH, OpenSSL, socat,
|
||||||
|
[pass](https://www.passwordstore.org/) (and the Consul and Nomad binaries
|
||||||
|
which are, I'll admit, not small).
|
||||||
|
|
||||||
|
|