Done with the article content

Quentin 2024-02-18 11:19:58 +01:00
parent 5adc7baf24
commit 718830a0b0
Signed by: quentin
GPG key ID: E9602264D639FF68


@@ -1,10 +1,11 @@
+++
title="Does Aerogramme use too much RAM?"
date=2024-02-15
+++

*"Will Aerogramme use too much RAM?" was the first question we asked ourselves
when designing email mailboxes as an encrypted event log, which is very different from
existing, heavily optimized designs. This blog post
tries to evaluate our design assumptions against the real-world implementation,
similarly to what we have done [on Garage](https://garagehq.deuxfleurs.fr/blog/2022-perf/).*
@@ -265,6 +266,12 @@ and 2) mutualizing the notification/wake up mechanism.
## Query Commands

Query commands are the most used commands in IMAP:
they are very expressive and allow clients to fetch only what they need,
a list of the emails without their content, an email body without
having to fetch its attachments, etc.
Of course, this expressivity creates complexity!

### Fetching emails

Often, IMAP clients are, at first, only interested in email metadata.
@@ -317,7 +324,7 @@ UID SEARCH BEFORE 2024-02-09
First, we start with a SEARCH command on the whole mailbox, inspired by what we have seen in the logs, one that can be run
without fetching the full email from the blob storage.

![Search meta](search-meta.png)

Spike order: 1) artifact, ignored, 2) login+select, 3) search, 4) logout

We load ~10MB in memory to make this request, which is quite fast.
@@ -325,7 +332,7 @@ We load ~10MB in memory to make our request that is quite fast.
But we also know that some SEARCH requests will require fetching some content
from the S3 object storage, and in this case, the profile is different.

![Search body](search-body.png)

We have the same profile as FETCH FULL: a huge memory allocation and a very CPU-intensive task.
The conclusion is similar to FETCH: while it's OK for these commands to be slow, it's not OK to allocate so much memory.
@@ -345,8 +352,66 @@ for the LIST command in itself.
## Discussion

At this level of maturity, the main goal for Aerogramme is predictable and stable resource usage on the server side.
Indeed, there is nothing more annoying than a user, honest or malicious, breaking the server while running
a resource-intensive command.

**Querying bodies** - They may allocate the full mailbox in RAM. For some parameters (e.g. FETCH BODY),
it might be possible to precompute data, but for some others (e.g. SEARCH TEXT "something") it is not, so precomputing
is not a definitive solution. Being slow is acceptable here: we just want to avoid being resource intensive. The solution I envision is to "stream"
the fetched emails and process them one by one. For some commands like FETCH 1:* BODY[], which are, to the best of my knowledge,
never run by real clients, this will not be enough, however: Aerogramme will drop the fetched emails, but it will keep an in-memory copy inside
the response object awaiting delivery. So we might want to implement response streaming too.
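
As an illustration, here is a minimal sketch of that "one by one" processing, written with the `futures` crate; `Uid`, `Email`, `fetch_one` and `matches` are hypothetical stand-ins for Aerogramme's actual types and query code, not its real API. The point is that only a bounded number of emails live in memory at any time, and each one is dropped as soon as it has been checked (response streaming, the second half of the problem, is not shown here).

```rust
use futures::stream::{self, StreamExt};

// Hypothetical stand-ins: `Uid` identifies a message, `fetch_one` downloads and
// parses a single email from the blob storage, `matches` evaluates the criterion.
type Uid = u32;
struct Email { uid: Uid, body: Vec<u8> }

async fn fetch_one(uid: Uid) -> Email {
    // ...download from S3 and parse; elided in this sketch.
    Email { uid, body: Vec::new() }
}

fn matches(_mail: &Email) -> bool {
    // ...evaluate the SEARCH/FETCH criterion; elided.
    true
}

// Instead of fetching the whole mailbox and keeping it in RAM, walk the UID
// list as a stream with a small concurrency budget: at most `PARALLELISM`
// emails are in memory at once, and each body is dropped right after the check.
async fn search_streaming(uids: Vec<Uid>) -> Vec<Uid> {
    const PARALLELISM: usize = 4;
    stream::iter(uids)
        .map(fetch_one)
        .buffered(PARALLELISM)
        .filter(|mail| futures::future::ready(matches(mail)))
        .map(|mail| mail.uid)
        .collect::<Vec<Uid>>()
        .await
}
```
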
**argon2** - Login resource usage is due to argon2, but compared to many other protocols, authentication in IMAP occurs way more often.
For example, if a user configures their email client to poll their mailbox every 5 minutes, this client will authenticate 288 times in a single day!
argon2's resource usage is a tradeoff with brute-forcing difficulty; reducing its resource usage is thus a possible performance mitigation,
at the cost of reduced security. Other options might reside in the evolution of our authentication system: even if I don't
know the current state of OAUTH support in existing IMAP clients, it could be considered as an option.
Indeed, the user enters their password once, during the configuration of their account, and then a token is generated and stored in the client.
The credentials could be stored in the token (think of a JSON Web Token for example), avoiding the expensive KDF on each connection.
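
To make the idea concrete, here is a minimal sketch using the `jsonwebtoken` crate; the `Claims` layout and the `storage_key` field are hypothetical, not Aerogramme's actual design, and since a plain JWT is only signed (not encrypted), a real implementation would have to encrypt the embedded credential:

```rust
use jsonwebtoken::{decode, encode, Algorithm, DecodingKey, EncodingKey, Header, Validation};
use serde::{Deserialize, Serialize};

// Hypothetical claims: the token carries the already-derived storage credential
// so the server does not need to re-run argon2 on every connection.
#[derive(Serialize, Deserialize)]
struct Claims {
    sub: String,         // user identifier
    exp: usize,          // expiration timestamp (seconds since the epoch)
    storage_key: String, // hypothetical derived credential (would need encryption)
}

// Issued once, after the single argon2 verification done at configuration time.
fn issue_token(user: &str, storage_key: &str, secret: &[u8]) -> jsonwebtoken::errors::Result<String> {
    let claims = Claims {
        sub: user.to_owned(),
        exp: 2_000_000_000, // placeholder expiry
        storage_key: storage_key.to_owned(),
    };
    encode(&Header::default(), &claims, &EncodingKey::from_secret(secret))
}

// Run on every IMAP connection: a signature check, orders of magnitude cheaper
// than a memory-hard KDF like argon2.
fn check_token(token: &str, secret: &[u8]) -> jsonwebtoken::errors::Result<Claims> {
    let data = decode::<Claims>(token, &DecodingKey::from_secret(secret), &Validation::new(Algorithm::HS256))?;
    Ok(data.claims)
}
```
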
**IDLE** - IDLE RAM usage is the same as other commands, but IDLE keeps the RAM allocated for way longer.
To illustrate my point, let's suppose an IMAP session uses 5MB of RAM,
and consider two populations of 1 000 users each. We suppose users monitor only one mailbox (which is not true in many cases).
The first population, aka *the poll population*, configures their clients to poll the server every 5 minutes; a poll takes 2 seconds (LOGIN + SELECT + LOGOUT).
The second population, aka *the push population*, configures their client to "receive push messages" - ie. using IMAP IDLE.

For the *push population*, the RAM usage will
be a stable 5GB (5MB * 1 000 users): all users will always be connected.
With a 1MB session, we could reduce the RAM usage to 1GB: any improvement on a session's base RAM
will be critical to IDLE with the current design.

For the *poll population*, we can split the time into 150 ticks (5 minutes / 2 seconds = 150 ticks).
It seems the problem can be mapped to a [Balls into bins problem](https://en.wikipedia.org/wiki/Balls_into_bins_problem)
with random allocation (yes, I know, assuming random allocation might not hold in many situations).
Based on the Wikipedia formula, if I am not wrong, we can suppose that, with high probability, at most 13 clients will be connected
at once, which means 65MB of RAM usage (5MB * 13 clients/tick = 65MB/tick).
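Plugging the numbers into that formula (and taking the hidden constant to be 1, which is an extra assumption on my part): with m = 1 000 connections spread over n = 150 ticks, the maximum load is, with high probability, roughly m/n + sqrt(m × ln(n) / n) = 1000/150 + sqrt(1000 × ln(150) / 150) ≈ 6.7 + 5.8 ≈ 12.5, so about 13 clients in the busiest tick.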
With these *back-of-the-envelope calculations*, we understand how crucial the IDLE RAM consumption is compared
to other commands, and how the base RAM consumption of a user will impact the service.

**Large email streaming** - Finally, email streaming could really improve RAM usage for large emails, especially on APPEND and LMTP delivery,
or even on FETCH when the body is required. But implementing such a feature would require an email parser that
can work on streams, which in turn [is not something trivial](https://github.com/rust-bakery/nom/issues/1160).

While it seems untimely to act now, these spots are great candidates for closer monitoring in future performance evaluations.
Fixing these points, beyond the simple mitigations, will involve important design changes in Aerogramme, which means,
in the end, writing a lot of code! That's why I think Aerogramme can live with these "limitations" for now,
and we will make decisions about these points when it is *really* required.

## Conclusion

Back to the question "Does Aerogramme use too much RAM?",
the answer of course depends on the context.
For now, we want to start with 1 000 users, a 1k-email INBOX, and a server with 8GB of RAM,
and Aerogramme seems ready for a first deployment in this context. Of course, in the long term,
we expect better resource usage.

Based on this benchmark, I identify 3 low-hanging fruits to improve performance that do not require major design changes:
1) in FETCH and SEARCH queries, handling emails one after another instead of loading the full mailbox in memory,
2) streaming FETCH responses instead of aggregating them in memory,
and 3) reducing the RAM usage of a base user by tweaking its Garage connector configuration.

Collecting production data will then help prioritize other, more ambitious works: on the authentication side,
on the idling side, and on the email streaming side.