Refine doc

Quentin 2024-01-25 09:57:13 +01:00
parent 6859de3b83
commit 53ef489bf8
Signed by: quentin
GPG key ID: E9602264D639FF68
5 changed files with 27 additions and 16 deletions

@ -5,4 +5,6 @@ sort_by = "weight"
template = "documentation.html"
+++
Test
To help you in the development, you might need:
- [Help debugging the protocol with socat](@/documentation/development/netcat.md)
- [Help finding datasets of email](@/documentation/development/dataset.md)

@ -1,5 +1,5 @@
+++
title = "Debug with netcat"
title = "Debug with socat"
weight = 10
+++
@ -9,14 +9,14 @@ that could help you quickly test/debug Aerogramme.
Start with:
```
nc localhost 1143
socat - tcp:localhost:1143,crlf
```
## Login
```
S: * OK Hello
C: A1 LOGIN alan p455w0rd
C: A1 LOGIN alice hunter2
S: A1 OK Completed
```
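The trace above follows the IMAP convention of prefixing each client command with a tag (`A1`) that the server echoes in its completion line, with every line terminated by CRLF (which is what the `crlf` option of `socat` takes care of). A minimal sketch of that convention, assuming hypothetical helper names `format_command` and `parse_tagged` that are not part of Aerogramme:

```python
def format_command(tag, name, *args):
    """Build one tagged IMAP command line, terminated by CRLF."""
    return " ".join([tag, name, *args]) + "\r\n"


def parse_tagged(line):
    """Split a tagged server completion line into (tag, status, text)."""
    tag, status, text = line.rstrip("\r\n").split(" ", 2)
    return tag, status, text


# Mirrors the Login exchange shown above.
login = format_command("A1", "LOGIN", "alice", "hunter2")
reply = parse_tagged("A1 OK Completed\r\n")
```

This only covers tagged completion lines; untagged responses (those starting with `*`) and literals such as `{117}` would need separate handling.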
@ -60,7 +60,7 @@ C: A6 FETCH 1 (RFC822)
S: * 1 FETCH (UID 1 RFC822 {117}
S: Subject: test
S: From: Alan Smith <alan@smith.me>
S: To: Alan Smith <alan@aerogramme.tld>
S: To: Alan Smith <alice@example.tld>
S:
S: Hello, world!
S: .
@ -78,11 +78,11 @@ S: A7 OK Logout completed
## Full trace
An IMAP trace extracted from Aerogramme:
An (old) IMAP trace extracted from Aerogramme:
```
S: * OK Hello
C: A1 LOGIN alan p455w0rd
C: A1 LOGIN alice hunter2
S: A1 OK Completed
C: A2 SELECT INBOX
S: * 0 EXISTS

@ -12,7 +12,7 @@ with Aerogramme.
The main specification of IMAP is defined in [RFC3501](https://datatracker.ietf.org/doc/html/rfc3501).
It defines 3 main objects: Mailboxes, Emails, and Flags. The following figure depicts how they work together:
![An IMAP mailbox schema](/documentation/design/mailbox.png)
![An IMAP mailbox schema](/documentation/internals/mailbox.png)
Emails are stored ordered inside the mailbox, and for legacy reasons, the mailbox assigns 2 identifiers to each email we name `uid` and `seq`.
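To make the difference between the two identifiers concrete, here is a minimal sketch, assuming a hypothetical in-memory `Mailbox` class (not Aerogramme's actual code): a `uid` is assigned once and never changes, while a `seq` is simply the current 1-based position of the email in the mailbox and shifts when an earlier email is removed.

```python
class Mailbox:
    """Toy mailbox: emails are kept in ascending uid order."""

    def __init__(self):
        self.uids = []       # uids of the emails currently in the mailbox
        self.next_uid = 1    # uids are never reused

    def append(self):
        uid = self.next_uid
        self.next_uid += 1
        self.uids.append(uid)
        return uid

    def seq_of(self, uid):
        # seq is the 1-based position in the current mailbox content
        return self.uids.index(uid) + 1

    def uid_of(self, seq):
        return self.uids[seq - 1]

    def expunge(self, uid):
        self.uids.remove(uid)


mb = Mailbox()
a, b, c = mb.append(), mb.append(), mb.append()
seq_before = mb.seq_of(c)   # position of c while a and b are present
mb.expunge(b)
seq_after = mb.seq_of(c)    # c keeps its uid but moves up one position
```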
@ -40,8 +40,8 @@ Immutable data can be stored directly on Garage, as we do not fear reading an ou
For mutable data, we cannot store them directly in Garage.
Instead, we choose to store a log of operations. Each client then applies this log of operation locally to rebuild its local state.
During this design phase, we noted that the S3 API semantic was too limited for us, so we introduced a second API, K2V, to have more flexibility.
K2V is designed to store and fetch small values in batches, it uses 2 different keys: one to spread the data on the cluster (`P`), and one to sort linked data on the same node (`S`).
During this internals phase, we noted that the S3 API semantic was too limited for us, so we introduced a second API, K2V, to have more flexibility.
K2V is internalsed to store and fetch small values in batches, it uses 2 different keys: one to spread the data on the cluster (`P`), and one to sort linked data on the same node (`S`).
Having data on the same node allows for more efficient queries among this data.
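As an illustration of the two-key model, the sketch below uses a hypothetical in-memory class `TinyK2V` (it is not the K2V API): the partition key `P` alone decides which node holds the data, and the sort key `S` orders the values inside that partition so that a range of related entries can be read from a single node.

```python
import hashlib


class TinyK2V:
    """Toy model of a store addressed by (partition key, sort key)."""

    def __init__(self, node_count):
        # one dict per node: partition key -> {sort key: value}
        self.nodes = [dict() for _ in range(node_count)]

    def _node(self, p):
        # the partition key alone selects the node
        digest = hashlib.sha256(p.encode()).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def insert(self, p, s, value):
        self._node(p).setdefault(p, {})[s] = value

    def read_range(self, p, start, end):
        # all entries of a partition live on one node, sorted by sort key
        items = self._node(p).get(p, {})
        return [(s, items[s]) for s in sorted(items) if start <= s < end]


kv = TinyK2V(node_count=3)
kv.insert("inbox", "0002", "b")
kv.insert("inbox", "0001", "a")
kv.insert("inbox", "0003", "c")
batch = kv.read_range("inbox", "0001", "0003")
```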
For performance reasons, we plan to introduce 2 optimizations.
@ -49,11 +49,11 @@ First, we store an email summary in K2V that allows fetching multiple entries at
Second, we also store checkpoints of the logs in S3 to avoid keeping and replaying all the logs each time a client starts a session.
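The interplay between the operation log and the checkpoints can be sketched as follows. This is only an illustration under stated assumptions: the `Op` record, its integer timestamp and the `add`/`del` operation kinds are hypothetical, and the state is reduced to a set of uids.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    ts: int      # hypothetical logical timestamp ordering the log
    kind: str    # "add" or "del"
    uid: int


def apply_op(state, op):
    if op.kind == "add":
        return state | {op.uid}
    if op.kind == "del":
        return state - {op.uid}
    raise ValueError(f"unknown operation: {op.kind}")


def rebuild(log, checkpoint=(0, frozenset())):
    """Replay, in timestamp order, the operations newer than the checkpoint."""
    ckpt_ts, state = checkpoint
    for op in sorted(log, key=lambda o: o.ts):
        if op.ts > ckpt_ts:
            state = apply_op(state, op)
    return state


log = [Op(1, "add", 1), Op(2, "add", 2), Op(3, "del", 1)]
full = rebuild(log)                       # replay everything
ckpt = (2, rebuild(log[:2]))              # state saved after the op with ts=2
resumed = rebuild(log[2:], ckpt)          # only newer operations are replayed
```

Starting from a checkpoint yields the same state as replaying the full log, which is why older log entries no longer need to be kept or replayed.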
We have the following data handled by Garage:
![Aerogramme Datatypes](/documentation/design/aero-states.png)
![Aerogramme Datatypes](/documentation/internals/aero-states.png)
In Garage, it is important to carefully choose the key(s) that are used to store data to have fast queries, we propose the following model:
![Aerogramme Key Choice](/documentation/design/aero-states2.png)
![Aerogramme Key Choice](/documentation/internals/aero-states2.png)

@ -6,7 +6,7 @@ weight = 10
Aerogramme stands at the interface between the Garage storage server, and the user's e-mail client. It provides regular IMAP access on the client-side, and stores encrypted e-mail data on the server-side. Aerogramme also provides an LMTP server interface through which incoming mail can be forwarded by the MTA (e.g. Postfix).
<center>
<img src="/documentation/design/aero-compo.png" alt="Aerogramme components"/>
<img src="/documentation/internals/aero-compo.png" alt="Aerogramme components"/>
<br>
<i>Figure 1: Aerogramme, our IMAP daemon, stores its data encrypted in Garage and provides regular IMAP access to mail clients</i></center>
@ -16,7 +16,7 @@ Aerogramme stands at the interface between the Garage storage server, and the us
Figure 2 below shows an overview of Aerogramme's architecture. Each user has a personal Garage bucket in which to store their mailbox contents. We will document below the details of the components that make up Aerogramme, but let us first provide a high-level overview. The two main classes, `User` and `Mailbox`, define how data is stored in this bucket, and provide a high-level interface with primitives such as reading the message index, loading a mail's content, copying, moving, and deleting messages, etc. This mail storage system is supported by two important primitives: a cryptography management system that provides encryption keys for user's data, and a simple log-like database system inspired by Bayou [1] which we have called Bay, that we use to store the index of messages in each mailbox. The mail storage system is made accessible to the outside world by two subsystems: an LMTP server that allows for incoming mail to be received and stored in a user's bucket, in a staging area, and the IMAP server itself which allows full-fledged manipulation of mailbox data by users.
<center>
<img src="/documentation/design/aero-schema.png" alt="Aerogramme internals"/>
<img src="/documentation/internals/aero-schema.png" alt="Aerogramme internals"/>
<i>Figure 2: Overview of Aerogramme's architecture and internal data structures for a given user, Alice</i></center>
@ -33,7 +33,7 @@ This module can use either of two data sources for user authentication:
The static authentication source can be used in a deployment scenario shown in Figure 3, where Aerogramme is not running on the side of the service provider, but on the user's device itself. In this case, the user can use any password to encrypt their data in the bucket; the only credentials they need for authentication against the service provider are the S3 and K2V API access keys.
<center>
<img src="/documentation/design/aero-paranoid.png" alt="user side encryption" />
<img src="/documentation/internals/aero-paranoid.png" alt="user side encryption" />
<br>
<i>Figure 3: alternative deployment of Aerogramme on the user's device: the service provider never gets access to the plaintext data.</i></center>
@ -57,7 +57,7 @@ To implement the LMTP server, we chose to make use of the `smtp-server` crate fr
The last part that remains to build Aerogramme is to implement the logic behind the IMAP protocol and to link it with the mail storage primitives. We started by implementing a state machine that handled the transitions between the different states in the IMAP protocol: ANONYMOUS (before login), AUTHENTICATED (after login), and SELECTED (once a mailbox has been selected for reading/writing). In the SELECTED state, the IMAP session is linked to a given mailbox of the user. In addition, the IMAP server has to keep track of which updates to the mailbox it has sent (or not) to the client so that it can produce IMAP messages consistent with what the client believes to be in the mailbox. In particular, many IMAP commands make use of mail sequence numbers to identify messages, which are indices in the sorted array of all of the messages in the mailbox. However, if messages are added or removed concurrently, these sequence numbers change: hence we must keep a snapshot of the mailbox's index *as the client knows it*, which is not necessarily the same as what is _actually_ in the mailbox, to generate messages that the client will understand correctly. This snapshot is called a *mailbox view* and is synced regularly with the actual mailbox, at which time the corresponding IMAP updates are sent. This can be done only at specific moments when permitted by the IMAP protocol.
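The two ideas of this paragraph, the session state machine and the mailbox view, can be sketched as below. This is a simplified illustration, not Aerogramme's implementation: the transition table only covers `LOGIN`, `SELECT` and `CLOSE`, and the `MailboxView` class and its `sync` method are hypothetical names.

```python
from enum import Enum, auto


class State(Enum):
    ANONYMOUS = auto()
    AUTHENTICATED = auto()
    SELECTED = auto()


# Allowed transitions, keyed by (current state, command).
TRANSITIONS = {
    (State.ANONYMOUS, "LOGIN"): State.AUTHENTICATED,
    (State.AUTHENTICATED, "SELECT"): State.SELECTED,
    (State.SELECTED, "SELECT"): State.SELECTED,
    (State.SELECTED, "CLOSE"): State.AUTHENTICATED,
}


def step(state, command):
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(f"{command} not allowed in state {state.name}")


class MailboxView:
    """Snapshot of the mailbox index as the client currently knows it."""

    def __init__(self, uids):
        self.snapshot = list(uids)

    def sync(self, actual_uids):
        """Catch up with the actual mailbox and return the updates to send."""
        actual = set(actual_uids)
        updates = []
        # Report removals from the highest sequence number down, so that an
        # earlier EXPUNGE never shifts the number used by a later one.
        for seq in range(len(self.snapshot), 0, -1):
            if self.snapshot[seq - 1] not in actual:
                updates.append(f"* {seq} EXPUNGE")
        self.snapshot = sorted(actual)
        updates.append(f"* {len(self.snapshot)} EXISTS")
        return updates


state = step(step(State.ANONYMOUS, "LOGIN"), "SELECT")
view = MailboxView([1, 2, 3])
updates = view.sync([1, 3, 4])   # uid 2 was removed, uid 4 was added
```

Until `sync` is called, sequence numbers sent by the client are resolved against `view.snapshot`, not against the actual mailbox content.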
The second part of this task consisted in implementing all of the IMAP protocol commands. Most are relatively straightforward, however, one command, in particular, needed special care: the FETCH command. The FETCH command in the IMAP protocol can return the contents of a message to the client. However, it must also understand precisely the semantics of the content of an e-mail message, as the client can specify very precisely how the message should be returned. For instance, in the case of a multipart message with attachments, the client can emit a FETCH command requesting only a certain attachment of the message to be returned, and not the whole message. To implement such semantics, we have based ourselves on the [`mail-parser`](https://docs.rs/mail-parser/latest/mail_parser/) crate, which can fully parse an RFC822-formatted e-mail message, and also supports some extensions such as MIME. To validate that we were correctly converting the parsed message structure to IMAP messages, we designed a test suite composed of several weirdly shaped e-mail messages, whose IMAP structure definition we extracted by taking Dovecot as a reference. We were then able to compare the output of Aerogramme on these messages with the reference consisting in what was returned by Dovecot.
The second part of this task consisted in implementing all of the IMAP protocol commands. Most are relatively straightforward, however, one command, in particular, needed special care: the FETCH command. The FETCH command in the IMAP protocol can return the contents of a message to the client. However, it must also understand precisely the semantics of the content of an e-mail message, as the client can specify very precisely how the message should be returned. For instance, in the case of a multipart message with attachments, the client can emit a FETCH command requesting only a certain attachment of the message to be returned, and not the whole message. To implement such semantics, we have based ourselves on the [`mail-parser`](https://docs.rs/mail-parser/latest/mail_parser/) crate, which can fully parse an RFC822-formatted e-mail message, and also supports some extensions such as MIME. To validate that we were correctly converting the parsed message structure to IMAP messages, we internalsed a test suite composed of several weirdly shaped e-mail messages, whose IMAP structure definition we extracted by taking Dovecot as a reference. We were then able to compare the output of Aerogramme on these messages with the reference consisting in what was returned by Dovecot.
## References

@ -23,3 +23,12 @@ by Rise Up
## Dovecot obox
*to be written*
## Apache JAMES
*to be written*
## Stalwart IMAP
*to be written*