Commit graph

802 commits

Author SHA1 Message Date
Baptiste Jonglez
66b2a88826 prod: Allow bespin for all services 2025-04-16 00:29:42 +02:00
Baptiste Jonglez
2d86cd239f email: move service to corrin (fiber outage in scorpio) 2025-04-15 22:13:56 +02:00
Baptiste Jonglez
b4d04e12fe prod: move cryptpad to bespin (fiber outage in scorpio) 2025-04-15 21:17:25 +02:00
Armaël Guéneau
1c9d01db87 staging: deploy tricot with jemalloc heap profiling 2025-04-13 11:03:20 +02:00
Armaël Guéneau
5fd5f72c81 garage job: tag tricot-site-lb has been renamed into tricot-local-lb 2025-04-12 21:05:21 +02:00
Armaël Guéneau
2dbfc3f796 grafana dashboard "tricot global": add in-flight requests 2025-04-12 18:49:54 +02:00
Armaël Guéneau
66278fd7fb upgrade tricot (now with log rate limiting and in-flight requests meter) 2025-04-12 18:29:13 +02:00
Armaël Guéneau
477840cf66 deploy tricot on bespin 2025-04-12 18:29:03 +02:00
Armaël Guéneau
f010eb1296 staging: upgrade tricot, now with log rate limiting
rate limit is set at a generous 30 log/s, since not much is happening
on staging
2025-04-12 18:10:02 +02:00
Armaël Guéneau
72647fa83a staging: upgrade tricot (new feature: meter for in-flight requests) 2025-04-12 18:10:02 +02:00
03ecd58c55 add a comment on how to put a node in maintenance mode 2025-04-12 15:02:19 +02:00
Baptiste Jonglez
fc7bf04a9c jitsi: health check videobridge 2025-04-12 13:31:25 +02:00
Baptiste Jonglez
1d817399eb coturn: Fix wrong DNS config for IPv6 (the CNAME pointed to the machine hosting Tricot, which is incorrect for TURN) 2025-04-12 12:34:26 +02:00
eb373986fb drop deuxfleurs redirect hack 2025-04-11 12:20:22 +02:00
be1d80c99d deploy update to prod 2025-04-11 11:15:23 +02:00
06b92742ab update matrix, fix cve 2025-04-11 11:11:25 +02:00
fa5b564771 rollback bespin 2025-04-11 10:46:39 +02:00
f7cbf12400 update tricot & garage deployment 2025-04-11 10:34:56 +02:00
455b568940 Remove dummy gitea d53 job, as the IPs are now fully static 2025-04-08 21:53:37 +02:00
4d07858941 add the domain of user "n" 2025-04-07 07:36:28 +02:00
8cd50a2a57 prevent plm search init from hanging 2025-04-04 16:00:23 +02:00
8d1ff96cbf set a 1 day default cache (instead of 120sec) 2025-04-04 15:26:27 +02:00
27293b19a2 upgrade plume 2025-04-04 15:09:19 +02:00
ef1735f781 update backups too! 2025-03-31 22:01:50 +02:00
6a4a2528f1 migrate emails from ananas to abricot 2025-03-31 21:48:13 +02:00
8c2c0cf949 email: dkim signing table 2025-03-31 21:11:58 +02:00
Baptiste Jonglez
fe68fdf54a plume: increase memory again 2025-03-26 20:21:57 +01:00
Baptiste Jonglez
187d36eb9b deploy_nixos: add help to apply changes without rebooting in production 2025-03-26 00:17:59 +01:00
Baptiste Jonglez
fd6275f5bc prod: Fix vim configuration syntax (different between staging and prod due to NixOS version difference) 2025-03-26 00:17:08 +01:00
Baptiste Jonglez
fc88a063b1 node_exporter: avoid using network mode host 2025-03-25 22:21:35 +01:00
Baptiste Jonglez
bb8c9db2ed telemetry: avoid network mode host, and poll less often 2025-03-25 22:12:42 +01:00
451068d716 Merge pull request 'prod: telemetry: Add smartctl_exporter based on staging work' (#53) from prod_smartctl_monitoring into main
Reviewed-on: #53
2025-03-25 21:09:08 +00:00
Baptiste Jonglez
797f946578 prod: telemetry: Add smartctl_exporter based on staging work 2025-03-24 17:53:17 +01:00
Baptiste Jonglez
596b7ab966 prod: telemetry: rename node-exporter job 2025-03-24 17:51:55 +01:00
Baptiste Jonglez
ec1fa3e540 staging: telemetry: Use an init task to create fake disk devices for smartctl_exporter 2025-03-24 17:47:05 +01:00
67230dd60c guichet now advertises the correct dxfl login command 2025-03-24 16:48:18 +01:00
305c160899 guichet upgrade 2025-03-21 00:27:05 +01:00
Baptiste Jonglez
8d9aa00de5 staging: harden config of smartctl exporter
It currently requires all nodes to have /dev/sda (the device passthrough is hardcoded for now)
2025-03-19 23:46:55 +01:00
Baptiste Jonglez
5790453ff1 nix: Allow all capabilities in Nomad
This will be necessary for the smartctl exporter since it needs Linux
capabilities that are not allowed by default in Nomad.

We only have trusted Nomad jobs, and we already allow privileged
containers anyway, so there is no security impact.
2025-03-19 23:39:04 +01:00
Baptiste Jonglez
a2a470ac3d staging: promote piranha to Nomad server (caribou is dead) 2025-03-19 23:08:49 +01:00
Baptiste Jonglez
2009572fea prod: telemetry: move storage from bespin/scorpio to bespin/corrin 2025-03-12 21:22:56 +01:00
Baptiste Jonglez
8f0a45f03e staging: telemetry: add smartctl exporter 2025-03-12 21:06:56 +01:00
Baptiste Jonglez
b98e72af96 staging: telemetry: Fix metric collection due to faulty Consul connection 2025-03-12 20:51:49 +01:00
Baptiste Jonglez
e805cf5cf6 Increase Prometheus storage
The current limit corresponds to about 2 months of Prometheus history,
which is sometimes too little to identify long-term trends.
2025-03-11 23:10:07 +01:00
6b52ccd374 Merge pull request 'upgrade garage to v1.99.1' (#49) from garage-1.99 into main
Reviewed-on: #49
2025-03-09 09:48:50 +00:00
Armaël Guéneau
c5a0577cbf upgrade garage to v1.99.1 2025-03-09 10:44:12 +01:00
Armaël Guéneau
40da5ccca2 nixos config: tweak 2025-03-07 11:43:49 +01:00
Armaël Guéneau
0051891ff0 staging: upgrade garage to v1.99-internal (support for redirections) 2025-03-07 11:43:06 +01:00
Armaël Guéneau
41961df583 woodpecker: change site neptune->corrin 2025-03-01 22:34:55 +01:00
Armaël Guéneau
e61c7449c1 matrix: allow running on site 'corrin' and remove 'neptune' (not a prod site anymore) 2025-03-01 22:32:10 +01:00