From 53ef489bf854360a94da969f62e43444c43ece29 Mon Sep 17 00:00:00 2001
From: Quentin Dufour
Date: Thu, 25 Jan 2024 09:57:13 +0100
Subject: [PATCH] Refine doc

---
 content/documentation/development/_index.md     |  4 +++-
 content/documentation/development/netcat.md     | 12 ++++++------
 content/documentation/internals/mailbox.md      | 10 +++++-----
 content/documentation/internals/overview.md     |  8 ++++----
 content/documentation/internals/related-work.md |  9 +++++++++
 5 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/content/documentation/development/_index.md b/content/documentation/development/_index.md
index c954b77..1b7ad82 100644
--- a/content/documentation/development/_index.md
+++ b/content/documentation/development/_index.md
@@ -5,4 +5,6 @@ sort_by = "weight"
 template = "documentation.html"
 +++
 
-Test
+To help you in the development, you might need:
+ - [Help debugging the protocol with socat](@/documentation/development/netcat.md)
+ - [Help finding datasets of email](@/documentation/development/dataset.md)
diff --git a/content/documentation/development/netcat.md b/content/documentation/development/netcat.md
index 8d7a54b..4e42263 100644
--- a/content/documentation/development/netcat.md
+++ b/content/documentation/development/netcat.md
@@ -1,5 +1,5 @@
 +++
-title = "Debug with netcat"
+title = "Debug with socat"
 weight = 10
 +++
 
@@ -9,14 +9,14 @@ that could help you quickly test/debug Aerogramme.
 Start with:
 
 ```
-nc localhost 1143
+socat - tcp:localhost:1143,crlf
 ```
 
 ## Login
 
 ```
 S: * OK Hello
-C: A1 LOGIN alan p455w0rd
+C: A1 LOGIN alice hunter2
 S: A1 OK Completed
 ```
 
@@ -60,7 +60,7 @@ C: A6 FETCH 1 (RFC822)
 S: * 1 FETCH (UID 1 RFC822 {117}
 S: Subject: test
 S: From: Alan Smith
-S: To: Alan Smith
+S: To: Alan Smith
 S:
 S: Hello, world!
 S: .
@@ -78,11 +78,11 @@ S: A7 OK Logout completed
 
 ## Full trace
 
-An IMAP trace extracted from Aerogramme:
+An (old) IMAP trace extracted from Aerogramme:
 
 ```
 S: * OK Hello
-C: A1 LOGIN alan p455w0rd
+C: A1 LOGIN alice hunter2
 S: A1 OK Completed
 C: A2 SELECT INBOX
 S: * 0 EXISTS
diff --git a/content/documentation/internals/mailbox.md b/content/documentation/internals/mailbox.md
index d6c2c4a..4809fce 100644
--- a/content/documentation/internals/mailbox.md
+++ b/content/documentation/internals/mailbox.md
@@ -12,7 +12,7 @@ with Aerogramme.
 The main specification of IMAP is defined in [RFC3501](https://datatracker.ietf.org/doc/html/rfc3501).
 It defines 3 main objects: Mailboxes, Emails, and Flags. The following figure depicts how they work together:
 
-![An IMAP mailbox schema](/documentation/design/mailbox.png)
+![An IMAP mailbox schema](/documentation/internals/mailbox.png)
 
 Emails are stored ordered inside the mailbox, and for legacy reasons, the mailbox assigns 2 identifiers to each email we name `uid` and `seq`.
 
@@ -40,8 +40,8 @@ Immutable data can be stored directly on Garage, as we do not fear reading an ou
 For mutable data, we cannot store them directly in Garage.
 Instead, we choose to store a log of operations. Each client then applies this log of operation locally to rebuild its local state.
 
-During this design phase, we noted that the S3 API semantic was too limited for us, so we introduced a second API, K2V, to have more flexibility.
-K2V is designed to store and fetch small values in batches, it uses 2 different keys: one to spread the data on the cluster (`P`), and one to sort linked data on the same node (`S`).
+During this design phase, we noted that the S3 API semantic was too limited for us, so we introduced a second API, K2V, to have more flexibility.
+K2V is designed to store and fetch small values in batches, it uses 2 different keys: one to spread the data on the cluster (`P`), and one to sort linked data on the same node (`S`).
 Having data on the same node allows for more efficient queries among this data.
 
 For performance reasons, we plan to introduce 2 optimizations.
@@ -49,11 +49,11 @@ First, we store an email summary in K2V that allows fetching multiple entries at
 Second, we also store checkpoints of the logs in S3 to avoid keeping and replaying all the logs each time a client starts a session.
 We have the following data handled by Garage:
 
-![Aerogramme Datatypes](/documentation/design/aero-states.png)
+![Aerogramme Datatypes](/documentation/internals/aero-states.png)
 
 In Garage, it is important to carefully choose the key(s) that are used to store data to have fast queries, we propose the following model:
 
-![Aerogramme Key Choice](/documentation/design/aero-states2.png)
+![Aerogramme Key Choice](/documentation/internals/aero-states2.png)
 
 
 
diff --git a/content/documentation/internals/overview.md b/content/documentation/internals/overview.md
index cef045a..309ad6d 100644
--- a/content/documentation/internals/overview.md
+++ b/content/documentation/internals/overview.md
@@ -6,7 +6,7 @@ weight = 10
 Aerogramme stands at the interface between the Garage storage server, and the user's e-mail client. It provides regular IMAP access on the client-side, and stores encrypted e-mail data on the server-side. Aerogramme also provides an LMTP server interface through which incoming mail can be forwarded by the MTA (e.g. Postfix).
-Aerogramme components
+Aerogramme components
Figure 1: Aerogramme, our IMAP daemon, stores its data encrypted in Garage and provides regular IMAP access to mail clients
@@ -16,7 +16,7 @@ Aerogramme stands at the interface between the Garage storage server, and the us
 Figure 2 below shows an overview of Aerogramme's architecture. Each user has a personal Garage bucket in which to store their mailbox contents. We will document below the details of the components that make up Aerogramme, but let us first provide a high-level overview. The two main classes, `User` and `Mailbox`, define how data is stored in this bucket, and provide a high-level interface with primitives such as reading the message index, loading a mail's content, copying, moving, and deleting messages, etc. This mail storage system is supported by two important primitives: a cryptography management system that provides encryption keys for user's data, and a simple log-like database system inspired by Bayou [1] which we have called Bay, that we use to store the index of messages in each mailbox. The mail storage system is made accessible to the outside world by two subsystems: an LMTP server that allows for incoming mail to be received and stored in a user's bucket, in a staging area, and the IMAP server itself which allows full-fledged manipulation of mailbox data by users.
-Aerogramme internals
+Aerogramme internals
 Figure 2: Overview of Aerogramme's architecture and internal data structures for a given user, Alice
@@ -33,7 +33,7 @@ This module can use either of two data sources for user authentication:
 The static authentication source can be used in a deployment scenario shown in Figure 3, where Aerogramme is not running on the side of the service provider, but on the user's device itself. In this case, the user can use any password to encrypt their data in the bucket; the only credentials they need for authentication against the service provider are the S3 and K2V API access keys.
-user side encryption
+user side encryption
Figure 3: alternative deployment of Aerogramme on the user's device: the service provider never gets access to the plaintext data.
@@ -57,7 +57,7 @@ To implement the LMTP server, we chose to make use of the `smtp-server` crate fr
 
 The last part that remains to build Aerogramme is to implement the logic behind the IMAP protocol and to link it with the mail storage primitives. We started by implementing a state machine that handled the transitions between the different states in the IMAP protocol: ANONYMOUS (before login), AUTHENTICATED (after login), and SELECTED (once a mailbox has been selected for reading/writing). In the SELECTED state, the IMAP session is linked to a given mailbox of the user. In addition, the IMAP server has to keep track of which updates to the mailbox it has sent (or not) to the client so that it can produce IMAP messages consistent with what the client believes to be in the mailbox. In particular, many IMAP commands make use of mail sequence numbers to identify messages, which are indices in the sorted array of all of the messages in the mailbox. However, if messages are added or removed concurrently, these sequence numbers change: hence we must keep a snapshot of the mailbox's index *as the client knows it*, which is not necessarily the same as what is _actually_ in the mailbox, to generate messages that the client will understand correctly. This snapshot is called a *mailbox view* and is synced regularly with the actual mailbox, at which time the corresponding IMAP updates are sent. This can be done only at specific moments when permitted by the IMAP protocol.
 
-The second part of this task consisted in implementing all of the IMAP protocol commands. Most are relatively straightforward, however, one command, in particular, needed special care: the FETCH command. The FETCH command in the IMAP protocol can return the contents of a message to the client. However, it must also understand precisely the semantics of the content of an e-mail message, as the client can specify very precisely how the message should be returned. For instance, in the case of a multipart message with attachments, the client can emit a FECTH command requesting only a certain attachment of the message to be returned, and not the whole message. To implement such semantics, we have based ourselves on the [`mail-parser`](https://docs.rs/mail-parser/latest/mail_parser/) crate, which can fully parse an RFC822-formatted e-mail message, and also supports some extensions such as MIME. To validate that we were correctly converting the parsed message structure to IMAP messages, we designed a test suite composed of several weirdly shaped e-mail messages, whose IMAP structure definition we extracted by taking Dovecot as a reference. We were then able to compare the output of Aerogramme on these messages with the reference consisting in what was returned by Dovecot.
+The second part of this task consisted in implementing all of the IMAP protocol commands. Most are relatively straightforward, however, one command, in particular, needed special care: the FETCH command. The FETCH command in the IMAP protocol can return the contents of a message to the client. However, it must also understand precisely the semantics of the content of an e-mail message, as the client can specify very precisely how the message should be returned. For instance, in the case of a multipart message with attachments, the client can emit a FETCH command requesting only a certain attachment of the message to be returned, and not the whole message. To implement such semantics, we have based ourselves on the [`mail-parser`](https://docs.rs/mail-parser/latest/mail_parser/) crate, which can fully parse an RFC822-formatted e-mail message, and also supports some extensions such as MIME. To validate that we were correctly converting the parsed message structure to IMAP messages, we designed a test suite composed of several weirdly shaped e-mail messages, whose IMAP structure definition we extracted by taking Dovecot as a reference. We were then able to compare the output of Aerogramme on these messages with the reference consisting in what was returned by Dovecot.
 
 ## References
 
diff --git a/content/documentation/internals/related-work.md b/content/documentation/internals/related-work.md
index 9793126..33530c2 100644
--- a/content/documentation/internals/related-work.md
+++ b/content/documentation/internals/related-work.md
@@ -23,3 +23,12 @@ by Rise Up
 
 ## Dovecot obox
 
+*to be written*
+
+## Apache JAMES
+
+*to be written*
+
+## Stalwart IMAP
+
+*to be written*
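Note on the overview.md context above: the *mailbox view* paragraph describes keeping a snapshot of the index as the client knows it, then syncing it with the actual mailbox and emitting the matching IMAP updates. The following is only a minimal Python sketch of that idea, not Aerogramme's Rust implementation. The names `MailboxView`, `known_uids`, `seq_of` and `sync` are hypothetical, and it assumes that UIDs only ever grow, so new messages always land at the end of the sequence.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class MailboxView:
    """Hypothetical snapshot of a mailbox index as the client knows it."""

    known_uids: List[int] = field(default_factory=list)  # ascending UIDs

    def seq_of(self, uid: int) -> int:
        """Sequence number (1-based position) of `uid` in the client's view."""
        return self.known_uids.index(uid) + 1

    def sync(self, actual_uids: Iterable[int]) -> List[str]:
        """Catch up with the real mailbox; return the untagged updates to send."""
        actual = sorted(actual_uids)
        present = set(actual)
        updates: List[str] = []
        # Report removals from the highest sequence number down, so each
        # number is still valid in the client's view when it is announced.
        for seq in range(len(self.known_uids), 0, -1):
            if self.known_uids[seq - 1] not in present:
                updates.append(f"* {seq} EXPUNGE")
                del self.known_uids[seq - 1]
        # Assumption: UIDs only grow, so anything new is appended at the end.
        if len(actual) > len(self.known_uids):
            updates.append(f"* {len(actual)} EXISTS")
        self.known_uids = actual
        return updates


view = MailboxView([10, 11, 12])
print(view.sync([10, 12, 13, 14]))  # ['* 2 EXPUNGE', '* 4 EXISTS']
print(view.seq_of(12))              # 2
```

Between two calls to `sync`, sequence numbers are resolved against `known_uids` only, which is why a concurrent delivery or deletion does not change what a client-supplied sequence number refers to.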