+++
title="Aerogramme 0.2.2: predictability & user testing"
date=2024-02-22
+++

Let's review how Aerogramme's performance became more predictable, why it matters, and how user testing helped surface bugs.


This minor version of Aerogramme focuses on two aspects of the software: predictable performance and collecting user feedback. Below, I describe both aspects in detail.

More predictable performance

In the previous blog post, we asked ourselves "Does Aerogramme use too much memory?". From that discussion, it emerged that it was not acceptable for some specific queries (FETCH FULL or SEARCH TEXT) to load the full mailbox in memory. This is concerning because the mail server will be used by multiple users and has a limited amount of resources, so we don't want a single user allocating all the memory. In other words, we want per-user resource usage to remain as stable as possible. These ideas are developed in more depth in the Amazon article Reliability & Constant Work.

As part of the conclusion, we identified that streaming email content would solve our problem. In practice, I rewrote the relevant part of the code to return a futures::stream::Stream instead of a Vec. Then I reran the same benchmark for both commands to get a before/after comparison. Below, the first picture is for FETCH FULL, the second one for SEARCH TEXT.
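To make the difference between returning a Vec and returning a stream concrete, here is a minimal sketch. It is not Aerogramme's code: it uses a plain synchronous `Iterator` from the standard library as a stand-in for `futures::stream::Stream`, and `Email`, `load_email`, `search_eager` and `search_lazy` are hypothetical names invented for the illustration. Each function returns the number of matches together with a rough count of the body bytes it held at once.

```rust
/// A toy email: only the body matters for this sketch.
struct Email {
    body: String,
}

/// Hypothetical storage access: builds email number `i` on demand.
fn load_email(i: usize) -> Email {
    Email {
        body: format!("message {}: {}", i, "x".repeat(1_000)),
    }
}

/// Eager variant: every email is loaded before the search starts,
/// so the resident size grows with the size of the mailbox.
fn search_eager(count: usize, needle: &str) -> (usize, usize) {
    let all: Vec<Email> = (0..count).map(load_email).collect();
    let resident: usize = all.iter().map(|e| e.body.len()).sum();
    let hits = all.iter().filter(|e| e.body.contains(needle)).count();
    (hits, resident)
}

/// Lazy variant: emails are produced one at a time and dropped after use,
/// so at most one body is held at any point.
fn search_lazy(count: usize, needle: &str) -> (usize, usize) {
    let mut hits = 0;
    let mut peak = 0;
    for email in (0..count).map(load_email) {
        peak = peak.max(email.body.len());
        if email.body.contains(needle) {
            hits += 1;
        }
    }
    (hits, peak)
}
```

Both variants find the same emails, but only the eager one has a footprint proportional to the mailbox. With an asynchronous stream the idea is the same, with the extra benefit that the task can yield to other users between two emails.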

Fetch resource usage for Aerogramme 0.2.1 & 0.2.2

Search resource usage for Aerogramme 0.2.1 & 0.2.2

For both FETCH and SEARCH, the changes are identical. Before, the command was executed in ~5 seconds, allocated up to 300MB of RAM, and used up to 150% of CPU. After, the command took ~30 seconds to execute, used up to 40MB of RAM, and used up to 80% of CPU sporadically. Although the command takes longer, that's a positive thing, because the memory consumption of Aerogramme is now capped approximately by the biggest email accepted by your system multiplied by a small constant. It also has other benefits: it prevents the exhaustion of file descriptors and other network resources, and it adds fairness between users. Indeed, a user can no longer monopolize all the resources of the server (CPU, I/O, etc.), and requests from multiple users are interleaved by the server. It also leads to better predictability, as the completion time of a user's requests is less impacted by other requests.

TODO AWS SDK

User feedback

Dovecot AUTH continuation inlining - When the username and password are short, the Dovecot SASL Auth protocol allows the client (here Postfix) to send the base64 response inline, without having to wait for the continuation. Aerogramme did not support this, which prevented some users from authenticating.
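To illustrate the two cases, here is a small sketch of how a server could classify an incoming AUTH line. In the Dovecot authentication protocol, fields are tab-separated and an inlined initial response travels in a `resp=` parameter. The `AuthStep` type and `parse_auth_line` function are hypothetical names for this post, not Aerogramme's actual implementation, and the base64 payload is left undecoded to keep the example dependency-free.

```rust
/// What the server should do next for an incoming AUTH line.
#[derive(Debug, PartialEq)]
enum AuthStep {
    /// The client inlined its initial response (still base64-encoded).
    Inline { id: String, mechanism: String, resp_b64: String },
    /// No initial response: the server has to ask for it with a CONT line.
    NeedContinuation { id: String, mechanism: String },
}

/// Parse a line of the form `AUTH <id> <mechanism> [key=value ...]`,
/// where fields are separated by tabs.
fn parse_auth_line(line: &str) -> Option<AuthStep> {
    let line = line.trim_end_matches(|c: char| c == '\r' || c == '\n');
    let mut fields = line.split('\t');
    if fields.next()? != "AUTH" {
        return None;
    }
    let id = fields.next()?.to_string();
    let mechanism = fields.next()?.to_string();
    // The remaining fields are parameters; `resp=` carries the inlined response.
    let resp = fields.find_map(|f| f.strip_prefix("resp="));
    Some(match resp {
        Some(b64) => AuthStep::Inline { id, mechanism, resp_b64: b64.to_string() },
        None => AuthStep::NeedContinuation { id, mechanism },
    })
}
```

A server that only implements the `NeedContinuation` path works with clients that always wait for the continuation, but fails as soon as a client takes the inline shortcut.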

Pipelining limits (reported by Nicolas) - The pipelining limit was set to 3 to avoid resource-exhaustion DoS, but it made some honest clients like Mutt fail. It has been bumped to 64 and will be watched over the next months.
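The trade-off can be pictured with a bounded per-connection queue. This is only a sketch under assumptions: `Pipeline` and `refused_in_burst` are hypothetical names, and the burst size used to exercise it is made up, not a measurement of what Mutt actually sends.

```rust
use std::collections::VecDeque;

/// Commands received from one client but not yet executed.
struct Pipeline {
    limit: usize,
    pending: VecDeque<String>,
}

impl Pipeline {
    fn new(limit: usize) -> Self {
        Pipeline { limit, pending: VecDeque::new() }
    }

    /// Queue a command, or hand it back when `limit` commands are already waiting.
    fn push(&mut self, command: String) -> Result<(), String> {
        if self.pending.len() >= self.limit {
            return Err(command);
        }
        self.pending.push_back(command);
        Ok(())
    }

    /// Take the oldest queued command for execution, freeing one slot.
    fn pop(&mut self) -> Option<String> {
        self.pending.pop_front()
    }
}

/// Count how many commands of a burst are refused when the client sends
/// them all before the server has executed any of them.
fn refused_in_burst(limit: usize, burst: usize) -> usize {
    let mut pipeline = Pipeline::new(limit);
    (0..burst)
        .filter(|i| pipeline.push(format!("A{} NOOP", i)).is_err())
        .count()
}
```

With a limit of 3, a client sending a burst of 10 commands sees 7 of them refused, while a limit of 64 absorbs the whole burst and still bounds what a malicious client can queue.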

SASL Auth subtleties (reported by Nicolas) - The authorization identity can be empty, or can be set to the same value as the authentication identity. The second case was not handled, but it is required by FairEmail (thanks Nicolas).
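For reference, a decoded SASL PLAIN message has the form `[authzid] NUL authcid NUL passwd` (RFC 4616). The following sketch accepts both cases described above. `PlainCredentials` and `parse_plain` are hypothetical names, and rejecting any other authorization identity is a simplifying assumption of this example rather than a statement about Aerogramme's policy.

```rust
/// Credentials extracted from a SASL PLAIN message.
#[derive(Debug, PartialEq)]
struct PlainCredentials {
    user: String,
    password: String,
}

/// Parse an already base64-decoded SASL PLAIN message:
/// `[authzid] NUL authcid NUL passwd`.
fn parse_plain(decoded: &[u8]) -> Option<PlainCredentials> {
    let mut parts = decoded.split(|b| *b == 0);
    let authzid = parts.next()?;
    let authcid = parts.next()?;
    let passwd = parts.next()?;
    // Exactly three fields are expected.
    if parts.next().is_some() || authcid.is_empty() || passwd.is_empty() {
        return None;
    }
    // Accept an empty authorization identity, or one equal to the
    // authentication identity. Acting as another user is refused here.
    if !authzid.is_empty() && authzid != authcid {
        return None;
    }
    Some(PlainCredentials {
        user: String::from_utf8(authcid.to_vec()).ok()?,
        password: String::from_utf8(passwd.to_vec()).ok()?,
    })
}
```

Both `\0alice\0hunter2` and `alice\0alice\0hunter2` yield the same credentials, which is the behavior FairEmail relies on.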

Thunderbird Autodiscovery issues (reported by LX & Nicolas) - K9 stable does not support %EMAILLOCALPART%. K9 beta (6.714) does not support some values marked as obsolete in the authentication field: plain is no longer supported, and password-cleartext must be used instead. The Content-Type also matters: if a wrong one is sent, the content is silently ignored by some clients.

Broken LITERAL+ (reported by Maxime) - It was not possible to copy more than one email at once to an Aerogramme mailbox. This was because we were using an old version of imap-flow that did not correctly support LITERAL+. Upgrading imap-flow to the latest version fixed the problem.
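As a reminder of what LITERAL+ changes on the wire: a regular literal is announced as `{N}` and the client must wait for the server's continuation before sending the N bytes, while a non-synchronizing literal is announced as `{N+}` and the bytes follow immediately (RFC 7888). The sketch below only tells the two apart at the end of a command line. `Literal` and `trailing_literal` are hypothetical names, and this is not how imap-flow is implemented.

```rust
/// The literal announced at the end of an IMAP command line.
#[derive(Debug, PartialEq)]
enum Literal {
    /// `{N}`: the client waits for a continuation before sending N bytes.
    Sync(u32),
    /// `{N+}`: the client sends the N bytes right away (LITERAL+).
    NonSync(u32),
}

/// Parse a run of ASCII digits, refusing signs and empty input.
fn number(s: &str) -> Option<u32> {
    if s.is_empty() || !s.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    s.parse().ok()
}

/// Detect a literal announcement at the end of a command line, if any.
fn trailing_literal(line: &str) -> Option<Literal> {
    let line = line.strip_suffix("\r\n").unwrap_or(line);
    let inner = line.strip_suffix('}')?;
    let open = inner.rfind('{')?;
    let spec = &inner[open + 1..];
    match spec.strip_suffix('+') {
        Some(digits) => number(digits).map(Literal::NonSync),
        None => number(spec).map(Literal::Sync),
    }
}
```

A server that treats `{310+}` like `{310}` and waits before reading, or that fails to parse it at all, gets out of step with a client that is already sending several messages back to back.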

Broken IDLE (reported by Maxime) - After updating imap-flow, we started noticing some timeouts in Thunderbird due to IDLE bugs. When IDLE was implemented in Aerogramme, the code was not ready in imap-flow, so I used some hacks. Upgrading the library broke my hacks, but for the better: imap-flow now supports IDLE out of the box, and the Aerogramme code is cleaner and more maintainable.

Some other quality-of-life feedback not reported here came from MrFlos & Nicolas. Thanks to everyone who took part in this debugging adventure.

Conclusion: download and test the new version

Do not get me wrong: Aerogramme is still not ready for prime time. But by operating it for real, we are starting to better understand how it behaves, where the rough edges are, etc. If you are interested in Aerogramme and want to put your personal touch on its development, now might be a good time to set up a new cluster and try it, so that you are ready for its public beta, which will correspond to the moment when Deuxfleurs deploys it for its users.

In the meantime:

docker run -p 1143:1143 registry.deuxfleurs.org/aerogramme:0.2.2

Download - Changelog