WIP resource usage Aerogramme
Commit 03d58045ab (parent 378aebd3a5): 4 changed files with 134 additions and 61 deletions.
BIN content/blog/2024-ram-usage-encryption-s3/copy-move.png (new file, 32 KiB)
@@ -51,6 +51,8 @@ Below I plotted the empirical distribution for both my dataset and my personal inbox
![ECDF mailbox](ecdf_mbx.svg)
*[Get the 100 emails dataset](https://git.deuxfleurs.fr/Deuxfleurs/aerogramme/src/commit/0b20d726bbc75e0dfd2ba1900ca5ea697645a8f1/tests/emails/aero100.mbox.zstd) - [Get the CSV used to plot this graph](https://git.deuxfleurs.fr/Deuxfleurs/aerogramme/src/branch/perf/cpu-ram-bottleneck/tests/emails/mailbox_email_sizes.csv)*
We see that the curves are close together and follow the same pattern: most emails are between 1kB and 100kB, and then we have a long tail (up to 20MB in my inbox, up to 6MB in the dataset).
It's not that surprising: in many places on the Internet, the limit on emails is set to 25MB. Overall, I am quite satisfied with this simple dataset, even if having one or two bigger emails could make it even more representative of my real inbox...
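For reference, here is a minimal sketch of how such an ECDF can be re-plotted from the published CSV. The column name `size` is an assumption, not the file's documented header; adjust it to whatever the CSV actually contains.

```python
# Hypothetical sketch: re-plot the email-size ECDF from the published CSV.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# "size" is an assumed column name holding email sizes in bytes.
sizes = np.sort(pd.read_csv("mailbox_email_sizes.csv")["size"].to_numpy())
ecdf = np.arange(1, len(sizes) + 1) / len(sizes)  # P(X <= x) at each observed size

plt.semilogx(sizes, ecdf)  # log scale: sizes span ~1kB to ~20MB
plt.xlabel("email size (bytes)")
plt.ylabel("fraction of emails")
plt.savefig("ecdf_mbx.svg")
```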
@@ -65,6 +67,8 @@ The following bar plot depicts the command distribution per command name; top is
![Commands](command-run.svg)
*[Get the IMAP command log](https://git.deuxfleurs.fr/Deuxfleurs/aerogramme/src/branch/perf/cpu-ram-bottleneck/tests/emails/imap_commands_dataset.log) - [Get the CSV used to plot this graph](https://git.deuxfleurs.fr/Deuxfleurs/aerogramme/src/branch/perf/cpu-ram-bottleneck/tests/emails/imap_commands_summary.csv)*
First, we can handle some commands separately: LOGIN, CAPABILITY, ENABLE, SELECT, EXAMINE, CLOSE, UNSELECT, and LOGOUT, as they are part of a **connection workflow**.
We do not plan to study them directly, as they will be used in all other tests.
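To make that workflow concrete, here is what it looks like from a client's perspective, sketched with Python's standard `imaplib` module. Host, port, and credentials are placeholders, not Aerogramme defaults.

```python
# The connection workflow wrapping every test below (placeholder credentials).
import imaplib

with imaplib.IMAP4("localhost", 143) as imap:  # CAPABILITY is exchanged on connect
    imap.login("alice", "hunter2")             # LOGIN
    imap.select("INBOX")                       # SELECT (EXAMINE for read-only)
    # ... the commands under test run here ...
    imap.close()                               # CLOSE
# LOGOUT is sent automatically when the context manager exits
```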
@@ -104,6 +108,99 @@ UID SEARCH BEFORE 2024-02-09
```
-->
In the following, I will keep these 3 categories: **writing**, **notification**, and **query**, to evaluate Aerogramme's resource usage
based on command patterns observed in real IMAP command logs and on the provided dataset.
---
## Write Commands
We start with the write commands, as they will enable us to fill the mailboxes for the following evaluations.

I inserted the full dataset (100 emails) into 16 accounts with APPEND; in other words, in the end, the server handles 1 600 emails.
*[Get the Python script](https://git.deuxfleurs.fr/Deuxfleurs/aerogramme/src/branch/main/tests/instrumentation/mbox-to-imap.py)*
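The heart of such a script boils down to an APPEND loop. Below is a hypothetical minimal version, not the linked script's actual code: host, credentials, and the decompressed mbox path are made up.

```python
# Minimal sketch of the dataset injection: replay a mbox file with APPEND.
import imaplib
import mailbox

def fill_account(user: str, password: str) -> None:
    msgs = mailbox.mbox("aero100.mbox")  # the 100-email dataset, decompressed
    with imaplib.IMAP4("localhost", 143) as imap:
        imap.login(user, password)
        for msg in msgs:
            # APPEND uploads the raw RFC 5322 bytes into the mailbox
            imap.append("INBOX", None, None, msg.as_bytes())

for i in range(16):
    fill_account(f"user{i:02d}", "password")  # 16 accounts, sequentially
```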
### Filling a mailbox
![Append Custom Build](01-append-tokio-console-musl.png)
First, I observed this *scary* linear memory increase. It seems we are not releasing some memory,
and that's an issue! I quickly suspected tokio-console of being the culprit.
A quick search led me to an issue entitled [Continuous memory leak with console_subscriber #184](https://github.com/tokio-rs/console/issues/184)
that confirmed my intuition.
Instead of waiting for an hour or trying to tweak the retention time, I rebuilt Aerogramme without tokio-console.
*So in this first approach, we observed the impact of tokio-console instead of our code! Still, we want
performance that is as predictable as possible.*
![Append Cargo Release](02-append-glibc.png)
This got us to a second pattern: a stable but high memory usage compared to the previous run.
It appears I built the binary with `cargo release`, which creates a binary that dynamically links to the GNU libc,
while the previous binary was built with our custom Nix toolchain that statically links musl libc into the binary.
In the process, we changed the allocator: it seems the GNU libc allocator allocates bigger chunks at once.
*It would be wrong to conclude that the musl libc allocator is more efficient: allocating and deallocating
memory on the kernel side is costly, and thus it might be better for the allocator to keep some kernel-allocated memory
around for future allocations that will not require system calls. This is another example of why this benchmark is wrong: we observe
the memory claimed by the allocator, not the memory used by the program itself.*
For the next graph, I removed tokio-console and built Aerogramme with a static musl libc.
![Append Custom Build](03-append-musl.png)
The observed pattern matches my expectations much better.
We observe 16 spikes of memory allocation, of around 50MB each, followed by a stable 25MB memory usage.
We can assume these 25MB account for the base memory consumption plus the index of the user's mailbox.
In the end, once the last user has logged out, we drop to ~18MB.

In this scenario, we can say that a user needs between 7MB and 32MB of RAM.

*We will see later that some other use cases lead to a lower per-user RAM consumption.
One hypothesis: we make requests to S3 with the aws-sdk library, which is intended to be configured once
per process and to handle the threading logic internally. In our case, we instantiate it once per user;
tweaking its configuration might help. Again, we are not observing only our code!*
In the previous runs, we were doing the inserts sequentially. But in the real world, multiple users interact with the server
at the same time. In the next run, we run the same test, but in parallel.
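A sketch of the parallel variant, reusing the hypothetical `fill_account` helper from the earlier snippet: 16 clients now append the dataset concurrently instead of one after the other.

```python
# Run the same APPEND workload with 16 simultaneous clients.
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=16) as pool:
    futures = [pool.submit(fill_account, f"user{i:02d}", "password")
               for i in range(16)]
    for f in futures:
        f.result()  # propagate any IMAP error raised in a worker thread
```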
![Append Parallel](04-append-parallel.png)
We see 2 spikes: a short one at the beginning, and a longer one at the end.
The first spike is probably due to the argon2 password verification, a key derivation function
that is purposely built to be expensive in terms of RAM and CPU.
The second spike is due to the fact that the big emails (multiple MB) are at the end of the dataset,
and they are stored fully in RAM before being sent. However, our biggest email weighs 6MB,
and we are running 16 threads, so we should expect a memory usage of around 100MB,
not 400MB. This difference would be a good starting point for an investigation: we might be
copying the same email multiple times in RAM.
It seems from this first test that Aerogramme is particularly sensitive to 1) login commands, due to argon2, and 2) large emails.
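To give an idea of the orders of magnitude involved, here is argon2 driven from Python with the `argon2-cffi` package. The parameters shown are that library's defaults, not necessarily the ones Aerogramme uses.

```python
# Each LOGIN costs one argon2 verification: RAM/CPU expensive by design.
from argon2 import PasswordHasher

# memory_cost is expressed in KiB: 65536 KiB = 64 MiB allocated per hash.
ph = PasswordHasher(time_cost=3, memory_cost=65536, parallelism=4)
digest = ph.hash("hunter2")
ph.verify(digest, "hunter2")  # re-runs the KDF, re-allocating the 64 MiB
```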
### Copy and Move
You might need to organize your folders, copying or moving your emails across your mailboxes.
COPY is a standard IMAP command; MOVE is an extension.
I will focus on a brutal test: copying 1k emails from INBOX to Sent, then moving these 1k emails to Archive.
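Sketched with `imaplib`, which wraps COPY natively; `xatom` lets us send the MOVE verb it has no wrapper for (mailbox names as in the text, credentials made up):

```python
# The "brutal" test: 1k emails copied, then moved, in single commands.
import imaplib

with imaplib.IMAP4("localhost", 143) as imap:
    imap.login("alice", "hunter2")
    imap.select("INBOX")
    imap.copy("1:1000", "Sent")              # COPY (RFC 3501)
    imap.xatom("MOVE", "1:1000", "Archive")  # MOVE extension (RFC 6851)
    imap.close()
```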
Below is a graph depicting Aerogramme's resource usage during this test.
![Copy and move](copy-move.png)
Memory usage remains stable and low (below 25MB), but the operations are CPU-intensive (close to 100% for 40 seconds).
Both COPY and MOVE depict the same pattern: indeed, as emails are considered immutable, Aerogramme only handles pointers in both cases
and does not really copy their content.
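A toy model of that design, not Aerogramme's actual data structures: mailboxes only map UIDs to keys of immutable blobs, so COPY and MOVE never touch the email bodies themselves.

```python
# Emails are immutable blobs; mailboxes are just UID -> blob-key tables.
blobs = {"blob-42": b"<6MB of encrypted email, written once>"}
inbox = {1: "blob-42"}
archive: dict[int, str] = {}

archive[2] = inbox[1]  # COPY: allocate a new UID pointing at the same blob
del inbox[1]           # MOVE: same pointer copy, then drop the old entry
```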
Real-world clients would probably not send such brutal commands, but would instead proceed progressively, either one by one or in small batches,
to keep the UI responsive.
While CPU optimizations could probably be devised, I find this behavior satisfactory, especially as memory remains stable and low.
### Setting flags
Setting flags (Seen, Deleted, Answered, NonJunk, etc.) is done through the STORE command.
Our run will be made in 3 parts: 1) putting one flag on one email, 2) putting 16 flags on one email, and 3) putting one flag on 1k emails.
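Expressed with `imaplib`, the three parts look as follows (the 16 custom keywords are invented for the test):

```python
# The three STORE phases of the run.
import imaplib

with imaplib.IMAP4("localhost", 143) as imap:
    imap.login("alice", "hunter2")
    imap.select("INBOX")
    imap.store("1", "+FLAGS", "\\Seen")       # 1) one flag on one email
    keywords = "(" + " ".join(f"Tag{i}" for i in range(16)) + ")"
    imap.store("1", "+FLAGS", keywords)       # 2) 16 flags on one email
    imap.store("1:1000", "+FLAGS", "\\Seen")  # 3) one flag on 1k emails
    imap.close()
```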
The result is depicted in the graph below.
![Store flags](store.png)
<!--
`STORE` (19 unique commands).
UID, not uid, silent, not silent, add not set, standard flags mainly.
@@ -114,76 +211,52 @@ STORE 2 +FLAGS.SILENT \Answered
```
-->
The first and last spikes are due respectively to the LOGIN/SELECT and CLOSE/LOGOUT commands.
In between, we have 3 CPU spikes, one for each STORE command, while memory remains stable.
The last command is by far the most expensive, and indeed, it has to generate 1k events in our event log and rebuild many things in the index.
However, there is no reason for the 2nd command to be less expensive than the first one, except that it reuses some resources / cache entries
from the first request.
Interacting with the index is really efficient in terms of memory. Generating many changes
leads to high CPU usage (and possibly a lot of IO), but in our dataset we observe that most changes are made on one or two emails,
and never on the whole mailbox.

Interacting with flags should not be an issue for Aerogramme in the near future.
## Notification Commands
Notification commands are expected to be run regularly in the background by clients.
They are particularly sensitive as their cost is correlated to your number of users,
independently of the number of emails they receive. I split them in 2 parts:
the ones that are intermittent and, like HTTP, close the connection after being run,
and the ones that are continuous, where the socket is kept open forever.
### The cost of a refresh
NOOP, CHECK, and STATUS are commands that trigger a refresh of the IMAP
view, and are part of the "intermittent" commands. In some ways, the SELECT and/or EXAMINE
commands could also be interpreted as notification commands: a client that is configured
to poll a mailbox every 15 minutes will not use NOOP, as running EXAMINE will be enough.
In our case, all these commands are similar in the sense that they load or refresh the in-memory index
of the targeted mailbox. To illustrate my point, I will run SELECT, NOOP, CHECK, and STATUS (the latter on another mailbox) in a row.
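With `imaplib`, that run is simply (the second mailbox name is a placeholder):

```python
# The refresh sequence: each command reloads or refreshes an in-memory index.
import imaplib

with imaplib.IMAP4("localhost", 143) as imap:
    imap.login("alice", "hunter2")
    imap.select("INBOX")                         # loads the INBOX index
    imap.noop()                                  # NOOP: refresh the current view
    imap.check()                                 # CHECK: similar effect
    imap.status("Archive", "(MESSAGES UNSEEN)")  # STATUS: touches another mailbox
    imap.close()
```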
![Refresh plot](refresh.png)
The first CPU spike is LOGIN/SELECT, the second is NOOP, the third is CHECK, and the last one is STATUS.
CPU spikes are short, and memory usage is stable.
Refresh commands should not be an issue for Aerogramme in the near future.
### Continuously connected clients
IDLE (and NOTIFY, which is currently not implemented in Aerogramme) are commands
that keep a socket open. These commands are sensitive because, while many protocols
are one-shot, so that your users spread their requests over time, with these commands
all your users are continuously connected.
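For reference, here is one such continuously connected client, sketched by driving IDLE by hand on top of `imaplib` (stdlib versions before Python 3.13 have no IDLE wrapper; the tag and credentials are placeholders):

```python
# One client entering IDLE and waiting for a notification.
import imaplib

imap = imaplib.IMAP4("localhost", 143)
imap.login("user01", "password")
imap.select("INBOX")

imap.send(b"a42 IDLE\r\n")    # enter IDLE; the server answers "+ idling"
print(imap.readline())
update = imap.readline()      # blocks until e.g. "* 101 EXISTS" arrives
print("notified:", update)
imap.send(b"DONE\r\n")        # leave IDLE before sending other commands
imap.readline()               # tagged completion for a42
imap.logout()
```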
In the graph below, we plot the resource usage of 16 users that log into the system,
select their inbox, and switch to IDLE; then, one by one, they receive an email and are notified.
![Idle Parallel](05-idle-parallel.png)
BIN content/blog/2024-ram-usage-encryption-s3/refresh.png (new file, 23 KiB)
BIN content/blog/2024-ram-usage-encryption-s3/store.png (new file, 23 KiB)