---
layout: post
slug: matrix-synapse-s3-storage
status: published
sitemap: true
title: Storing Matrix media on a S3 backend
description: Matrix has multiple solutions for storing its media on S3; we review them and point out their drawbacks
category: operation
tags:
---
By default, Matrix Synapse stores its media on the local filesystem, which raises many issues.
It exposes your users to data loss and availability issues, but above all to scalability and sizing issues.
We live in an era where users expect no resource limitations and where software is rarely designed to garbage collect or even track resource usage, so it is really hard to plan ahead the resources you will need.
In practice, this leads to two common answers: resource overprovisioning and distributed filesystems.
The first often wastes resources, while the second is hard to manage and requires expensive hardware and network.
Thankfully, since we only store blob data, we do not need the full power of a filesystem: a more lightweight API like S3 is enough.
In Matrix Synapse terminology, these solutions are referred to as *storage providers*.
In this article, we will see how we migrated from GlusterFS to Matrix's S3 storage provider backed by our [Garage](https://garagehq.deuxfleurs.fr/) S3 server.
## Internals
First, Matrix's developers make a distinction between a *media provider* and a *storage provider*.
It turns out that files are always stored in the *media provider*, even when a *storage provider* is registered, and there is no way
to change this behavior in the code. Unfortunately, the *media provider* can only use the filesystem.
For example, when fetching a media, we can see [in the code](
https://github.com/matrix-org/synapse/blob/b996782df51eaa5dd30635a7c59c93994d3a735e/synapse/rest/media/v1/media_storage.py#L185-L198) that the filesystem is always probed first, and only then our remote backend.
We also see [in the code](
https://github.com/matrix-org/synapse/blob/b996782df51eaa5dd30635a7c59c93994d3a735e/synapse/rest/media/v1/media_storage.py#L202-L211) that the *media provider* can be referred to as the local cache, and that some parts of the code may require a file to be in the local cache.
In conclusion, the best we can do is keep the *media provider* as a local cache.
This concept of cache is quite artificial, as there is no integrated tool for cache eviction: garbage collecting the cache is our responsibility.
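Since eviction is left to us, a naive time-based sweep is the obvious sketch (a hypothetical illustration only: it deletes files without first checking that they exist on S3, so an S3-aware tool like the one presented later is the safer route):

```bash
#!/bin/bash
# Hypothetical cache-eviction sketch: delete cached media files that have
# not been accessed for more than 7 days, then prune empty directories.
# WARNING: this does not verify that the files were uploaded to S3 first.
evict_cache() {
  find "$1" -type f -atime +7 -delete
  find "$1" -mindepth 1 -type d -empty -delete
}

# Default path is where Synapse usually keeps its media repository.
MEDIA_PATH="${MEDIA_PATH:-/var/lib/matrix-synapse/media}"
[ -d "$MEDIA_PATH" ] && evict_cache "$MEDIA_PATH" || true
```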
## Migration
We can easily configure the S3 storage provider in our `homeserver.yaml`:
```yaml
media_storage_providers:
- module: s3_storage_provider.S3StorageProviderBackend
  store_local: True
  store_remote: True
  store_synchronous: True
  config:
    bucket: matrix
    region_name: garage
    endpoint_url: XXXXXXXXXXXXXX
    access_key_id: XXXXXXXXXXXXXX
    secret_access_key: XXXXXXXXXXX
```
Registering the module like that only handles new media: `store_local: True` and `store_remote: True` mean that newly uploaded local and remote media will be sent to our S3 target, and `store_synchronous: True` means that we want to check that the upload succeeded before notifying the user. The rationale for these store options is to let administrators handle the upload with a *pull approach* rather than our *push approach*: in that case, administrators regularly run a script (with a cron job, for example) that copies the files to the target. Such a script, named `s3_media_upload`, is provided by the extension developers.
This script is also the only way to migrate old media (which cannot be *pushed*), so we will have to use it anyway.
First, we need some setup to use this tool:
- Postgres credentials and endpoint must be stored in a `database.yaml` file
- S3 credentials must be configured as per the [boto convention](https://boto3.amazonaws.com/v1/documentation/api/1.9.46/guide/configuration.html), and the endpoint can be specified on the command line
- the path to the local cache/media repository is also passed on the command line

This script needs to store some state between executions, so it creates an SQLite database named `cache.db` in your working directory. Do not delete it!
In practice, your database configuration may be created as follows:
```bash
cat > database.yaml <<EOF
user: xxxxx
password: xxxxx
database: xxxxxx
host: xxxxxxxx
port: 5432
EOF
```
And S3 can be configured through environment variables:
```bash
export AWS_ACCESS_KEY_ID=""
export AWS_SECRET_ACCESS_KEY=""
export AWS_DEFAULT_REGION="garage"
```
We are now ready; the remaining parameters will be passed on the command line.
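As a sanity check (a hypothetical extra step, not part of the original procedure), you can verify your credentials and endpoint with the AWS CLI before migrating anything; the bucket name and endpoint below match the values used elsewhere in this article:

```bash
# List the bucket to verify that credentials, region and endpoint are valid
aws --endpoint-url https://garage.deuxfleurs.fr s3 ls s3://matrix
```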
## Use the tool
First, we must build the list of media that we want to send to S3.
I guess the developers designed this tool with the idea that S3 is an archive target and that recent data should stay local.
That's why a duration is required: only data older than that gets sent to S3.
Here, we will fetch media that are at least one day (`1d`) old, but you can set one month (`1m`) to keep more media locally, or zero days (`0d`) if you want close to no local cache. For more details, check [the source code](https://github.com/matrix-org/synapse-s3-storage-provider/blob/main/scripts/s3_media_upload#L140-L185).
```bash
./s3_media_upload update-db 1d
```
Next, the `check-deleted` subcommand filters out media that are no longer on the local filesystem, either because they were already uploaded to our S3 backend or because they are lost. [See the code](https://github.com/matrix-org/synapse-s3-storage-provider/blob/main/scripts/s3_media_upload#L188-L217).
*Please note that I deactivated the progress bar because it is buggy in my `docker exec` inside a `screen` inside an SSH session.*
```bash
./s3_media_upload --no-progress check-deleted /var/lib/matrix-synapse/media
```
If we want to combine `update-db` and `check-deleted`, we can run `update`.
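If I read the script's usage correctly, the combined invocation takes the media path and the duration at once (double-check with `./s3_media_upload --help`, as this exact argument order is my assumption):

```bash
./s3_media_upload --no-progress update /var/lib/matrix-synapse/media 1d
```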
Now, before taking any action, we might want to review our candidates.
Some of them may already be present on our S3 target, so you may end up uploading less data than listed.
```bash
./s3_media_upload write
```
The `upload` command does many things at once:
- checks again that the file is still on the local filesystem
- checks whether the file already exists on S3
- uploads it to S3 if needed
- optionally deletes the local file

Ideally, I would use only our S3 target and drop the local filesystem entirely.
Since this module does not allow it, I at least delete uploaded content from the local filesystem.
[See the source code for more details](https://github.com/matrix-org/synapse-s3-storage-provider/blob/main/scripts/s3_media_upload#L220-L282).
My final command looks like this:
```bash
./s3_media_upload --no-progress upload /var/lib/matrix-synapse/media matrix --delete --endpoint-url https://garage.deuxfleurs.fr
```
## GlusterFS again
Running this script one month after activating the main module, I observed that many files, around 60%, were missing from our S3 target.
Our setup was as follows:
- The media repository (the local filesystem) was on GlusterFS
- The storage provider (our S3 target) was handled by Garage

We know that our GlusterFS volume suffers from severe performance issues.
I manually migrated the missing files, then deployed a second setup:
- The media repository was now mounted in RAM (a tmpfs)
- The storage provider was still our S3 target
And now, all of our media are successfully sent to our S3 target.
My guess is that each media is first written to the local filesystem and only then sent to S3.
Because GlusterFS is slow and error-prone, some exceptions or timeouts may be raised before the file is uploaded to S3.
In any case, we now consider the problem solved.
Only one step remains: regularly cleaning up the local filesystem so it does not fill our RAM.
## Good old cron
Because there is no elegant solution and my time is limited, I chose to write a script that runs every 10 minutes.
It checks that the files are already in the S3 bucket and then deletes them from the filesystem.
```bash
#!/bin/bash

cat > database.yaml <<EOF
user: $PG_USER
password: $PG_PASS
database: $PG_DB
host: $PG_HOST
port: $PG_PORT
EOF

while true; do
  s3_media_upload update-db 0d
  s3_media_upload --no-progress check-deleted $MEDIA_PATH
  s3_media_upload --no-progress upload $MEDIA_PATH $BUCKET --delete --endpoint-url $ENDPOINT
  sleep 600
done
```
To use it, you must set the following environment variables:
- For AWS: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_DEFAULT_REGION`, `ENDPOINT`, `BUCKET`
- For Postgres: `PG_USER`, `PG_PASS`, `PG_DB`, `PG_HOST`, `PG_PORT`
- For the filesystem: `MEDIA_PATH`. We also assume `s3_media_upload` is in your `PATH`.
## matrix-media-repo
I presented the "native" way to handle media on Matrix Synapse, but there is also a community-managed project named [`matrix-media-repo`](https://docs.t2bot.io/matrix-media-repo) with a slightly different goal: its author wanted a common media repository shared between multiple servers to reduce storage costs.
matrix-media-repo is implementation independent: it shadows the Matrix endpoint used for media (`/_matrix/media`) and is thus compatible with any Matrix server, like Dendrite or Conduit. Its main advantage over our solution is that it does not have this mandatory cache: it can upload to and serve from an S3 backend directly, which simplifies management.
Depending on your reverse proxy, it is possible that when `matrix-media-repo` is down, users get routed back to the original endpoint that should no longer be used, leading to data loss and strange behaviors. It seems that [an option](https://github.com/matrix-org/synapse/blob/v1.42.0/synapse/config/server.py#L265-L269) in Synapse allows deactivating its built-in media repository; it might save you some trouble if it works.
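Based on the linked code, that option appears to be `enable_media_repo` in `homeserver.yaml` (untested on our side, so verify against your Synapse version):

```yaml
# Hand /_matrix/media over to matrix-media-repo instead of Synapse
enable_media_repo: false
```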
## Conclusion
Using an S3 target with Matrix is not trivial. `matrix-media-repo` looks like a better solution, but in practice it has its own drawbacks too. For now, even if not optimal, our deployed solution works well, and that's what matters.