From 54567241cd3a22b1bc331f4f356759477fb8fcb0 Mon Sep 17 00:00:00 2001
From: Quentin Dufour
Date: Tue, 21 Sep 2021 12:29:39 +0200
Subject: [PATCH] add an article about matrix synapse and s3

---
 ...07-12-chroniques-administration-synapse.md |   2 +-
 ...21-09-14-synapse-media-storage-provider.md | 198 ++++++++++++++++++
 2 files changed, 199 insertions(+), 1 deletion(-)
 create mode 100644 _posts/2021-09-14-synapse-media-storage-provider.md

diff --git a/_posts/2021-07-12-chroniques-administration-synapse.md b/_posts/2021-07-12-chroniques-administration-synapse.md
index c677c29..ba1fdd7 100644
--- a/_posts/2021-07-12-chroniques-administration-synapse.md
+++ b/_posts/2021-07-12-chroniques-administration-synapse.md
@@ -2,7 +2,7 @@
 layout: post
 slug: chroniques-administration-synapse
 status: published
-sitemap: false
+sitemap: true
 title: Chroniques d'administration de Synapse
 description: Pour l'instant tout va bien, pour l'instant tout...
 category: operation
diff --git a/_posts/2021-09-14-synapse-media-storage-provider.md b/_posts/2021-09-14-synapse-media-storage-provider.md
new file mode 100644
index 0000000..ce8bb24
--- /dev/null
+++ b/_posts/2021-09-14-synapse-media-storage-provider.md
@@ -0,0 +1,198 @@
+---
+layout: post
+slug: matrix-synapse-s3-storage
+status: published
+sitemap: true
+title: Storing Matrix media on an S3 backend
+description: Matrix has multiple solutions to store its media on S3; we review them and point out their drawbacks
+category: operation
+tags:
+---
+
+By default, Matrix Synapse stores its media on the local filesystem, which raises many issues.
+It exposes your users to data loss and availability problems, but above all to scalability and sizing problems.
+We live in an era where users expect no resource limitation and where software is not designed to
+garbage-collect or even track resource usage, so it is really hard to plan ahead for the resources you will need.
+
+In practice, this leads to two common workarounds: resource overprovisioning and distributed filesystems.
+The first often wastes resources, while the second is hard to manage and requires expensive hardware and networking.
+
+Thankfully, since we are storing blob data, we do not need the full power of a filesystem: a more
+lightweight API like S3 is enough. In Matrix Synapse terminology, these solutions are referred to as
+*storage providers*. In this article, we will see how we migrated from GlusterFS to Matrix's S3
+storage provider backed by our [Garage](https://garagehq.deuxfleurs.fr/) object store.
+
+## Internals
+
+First, Matrix's developers make a distinction between a *media provider* and a *storage provider*.
+It appears that files are always stored in the *media provider*, even when a *storage provider* is
+registered, and there is no way to change this behavior in the code. Unfortunately, the *media
+provider* can only use the filesystem.
+
+For example, when fetching a media file, we can see [in the code](
+https://github.com/matrix-org/synapse/blob/b996782df51eaa5dd30635a7c59c93994d3a735e/synapse/rest/media/v1/media_storage.py#L185-L198) that the filesystem is always probed first, and only then our remote backend.
+
+We also see [in the code](
+https://github.com/matrix-org/synapse/blob/b996782df51eaa5dd30635a7c59c93994d3a735e/synapse/rest/media/v1/media_storage.py#L202-L211) that the *media provider* is referred to as the local cache, and that some parts of the code require a file to be present in this local cache.
+
+In conclusion, the best we can do is to treat the *media provider* as a local cache.
+But even in this case, it is our responsibility to garbage-collect that cache.
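+
+Once the migration described in the next section has completed, and because we set
+`store_synchronous: True` below so that new media is stored on S3 before the request completes,
+evicting stale local copies is reasonably safe: the storage provider fetches them back from S3
+on access. Here is a minimal eviction sketch, not from our actual setup: it assumes a media
+repository at `/var/lib/matrix-synapse/media` and a filesystem that records access times.
+
+```bash
+# Illustrative garbage collection: evict cached media files that have not
+# been read for 30 days; Synapse re-downloads them from S3 on demand.
+find /var/lib/matrix-synapse/media -type f -atime +30 -delete
+# Prune the empty directories left behind.
+find /var/lib/matrix-synapse/media -type d -empty -delete
+```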
+
+## Migration
+
+The S3 storage provider can easily be configured in our `homeserver.yaml`:
+
+```yaml
+media_storage_providers:
+- module: s3_storage_provider.S3StorageProviderBackend
+  store_local: True
+  store_remote: True
+  store_synchronous: True
+  config:
+    bucket: matrix
+    region_name: garage
+    endpoint_url: XXXXXXXXXXXXXX
+    access_key_id: XXXXXXXXXXXXXX
+    secret_access_key: XXXXXXXXXXX
+```
+
+But registering it like this only covers new media: because we activated `store_local` and
+`store_remote`, new local and remote content is automatically pushed to our S3 backend.
+
+Old media must be migrated with a script named `s3_media_upload`. First, we need some setup to use this tool:
+  - postgres credentials and endpoint must be stored in a `database.yaml` file
+  - s3 credentials must be configured as per the [boto convention](https://boto3.amazonaws.com/v1/documentation/api/1.9.46/guide/configuration.html), and the endpoint can be specified on the command line
+  - the path to the local cache/media repository is also passed on the command line
+
+This script needs to persist some state between command executions, so it creates an SQLite
+database named `cache.db` in your working directory. Do not delete it!
+
+In practice, your database configuration may be created as follows:
+
+```bash
+# Keys follow the format documented by s3_media_upload; the values below
+# are placeholders to replace with your own postgres settings.
+cat > database.yaml <<EOF
+user: synapse
+password: XXXXXXXXXXXXXX
+database: synapse
+host: localhost
+port: 5432
+EOF
+```
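+
+With the credentials and `database.yaml` in place, the migration itself boils down to three
+commands. The following is a sketch based on the tool's documented subcommands (`update-db`,
+`check-deleted`, `upload`); the media path and endpoint are placeholders for your own
+deployment, and `s3_media_upload --help` remains the authoritative reference:
+
+```bash
+# Register in cache.db all media not accessed for the last 0 days,
+# i.e. every media file, as candidates for migration.
+s3_media_upload update-db 0d
+# Mark as deleted the entries whose files have already vanished from the
+# local media repository.
+s3_media_upload check-deleted /var/lib/matrix-synapse/media
+# Upload the candidates to the "matrix" bucket, deleting each local copy
+# once it is safely stored on the S3 backend.
+s3_media_upload upload /var/lib/matrix-synapse/media matrix --delete \
+  --endpoint-url XXXXXXXXXXXXXX
+```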