aerogramme.deuxfleurs.fr/content/documentation/design/imap_uid.md
2023-06-02 12:39:25 +02:00

7.7 KiB

+++ title = "IMAP UID proof" weight = 40 +++

Aerogramme tolerates mailbox desynchronization between active sessions. Desynchronization means that different emails can be assigned the same IMAP UID.

Desynchronization are handled by increasing the IMAP UIDVALIDITY field. When a session learns an operation in a mailbox that conflicts with is current view, it rebuilds its view from the operation logs and compute a new UIDVALIDITY.

This document shows that, considering any mailbox state for a given session, the UIDVALIDITY is increased if a conflict occures. In other words, it proves that clients will never have a corrupted view.

Note that changing the IMAPUIDVALIDITY will trigger a full mailbox resynchronization for the user, so it must be used in last resort. It's a "correctness security net" to ease operations, but in normal conditions, such desynchronizations should not occure. It will be discussed later in the cookbook.

Formal definition of a mailbox

Notations

  • h: the hash of a message, \mathbb{H} is the set of hashes
  • i: the UID of a message (i \in \mathbb{N})
  • f: a flag attributed to a message (it's a string), we write \mathbb{F} the set of possible flags
  • if M is a map (aka a dictionnary), if x has no assigned value in M we write M [x] = \bot or equivalently x \not\in M. If x has a value in the map we write x \in M and M [x] \neq \bot

State

  • A map I such that I [h] is the UID of the message whose hash is h is the mailbox, or \bot if there is no such message

  • A map F such that F [h] is the set of flags attributed to the message whose hash is h

  • v: the UIDVALIDITY value

  • n: the UIDNEXT value

  • s: an internal sequence number that is mostly equal to UIDNEXT but also grows when mails are deleted

Operations

  • MAIL_ADD$(h, i)$: the value of i that is put in this operation is the value of s in the state resulting of all already known operations, i.e. s (O_{gen}) in the notation below where O_{gen} is the set of all operations known at the time when the MAIL_ADD is generated. Moreover, such an operation can only be generated if $I (O_{gen}) [h] = \bot$, i.e. for a mail h that is not already in the state at O_{gen}.

  • MAIL_DEL$(h)$

  • FLAG_ADD$(h, f)$

  • FLAG_DEL$(h, f)$

Mailbox handling algorithms

Add an email

apply MAIL_ADD$(h, i)$:
   if i < s:
     v \leftarrow v + s - i
   if F [h] = \bot:
     F [h] \leftarrow F_{initial}
  $I [h] \leftarrow s$
  $s \leftarrow s + 1$
  $n \leftarrow s$

Delete an email

apply MAIL_DEL$(h)$:
   I [h] \leftarrow \bot
  $F [h] \leftarrow \bot$
  $s \leftarrow s + 1$

Add a flag

apply FLAG_ADD$(h, f)$:
   if h \in F:
     F [h] \leftarrow F [h] \cup \{ f \}

Delete a flag

apply FLAG_DEL$(h, f)$:
   if h \in F:
     F [h] \leftarrow F [h] \backslash \{ f \}

Formal definition of the mailbox log

The mailbox log is a set of operation:

  • o is an operation such as MAIL_ADD, MAIL_DEL, etc. O is a set of operations. Operations embed a timestamp, so a set of operations O can be written as O = [o_1, o_2, \ldots, o_n] by ordering them by timestamp.

  • if o \in O, we write O_{\leqslant o}, O_{< o}, $O_{\geqslant o}$, O_{> o} the set of items of O that are respectively earlier or equal, strictly earlier, later or equal, or strictly later than o. In other words, if we write O = [o_1, \ldots, o_n], where o is a certain o_i in this sequence, then: \begin{aligned} O_{\leqslant o} &= { o_1, \ldots, o_i }\ O_{< o} &= { o_1, \ldots, o_{i - 1} }\ O_{\geqslant o} &= { o_i, \ldots, o_n }\ O_{> o} &= { o_{i + 1}, \ldots, o_n } \end{aligned}

  • If O is a set of operations, we write I (O), F (O), $n (O), s (O)$, and v (O) the values of I, F, n, s and v in the state that results of applying all of the operations in O in their sorted order. (we thus write I (O) [h] the value of I [h] in this state)

Proving that UIDValidity is changed for each conflict

Hypothesis: An operation o can only be in a set O if it was generated after applying operations of a set O_{gen} such that O_{gen} \subset O (because causality is respected in how we deliver operations). Sets of operations that do not respect this property are excluded from all of the properties, lemmas and proofs below.

Simplification: We will now exclude FLAG_ADD and FLAG_DEL operations, as they do not manipulate n, s and v, and adding them should have no impact on the properties below.

Small lemma: If there are no FLAG_ADD and FLAG_DEL operations, then s (O) = | O |. This is easy to see because the possible operations are only MAIL_ADD and MAIL_DEL, and both increment the value of s by 1.

Definition: If o is a MAIL_ADD$(h, i)$ operation, and O is a set of operations such that o \in O, then we define the following value: C (o, O) = s (O_{< o}) - i We say that C (o, O) is the number of conflicts of o in $O$: it corresponds to the number of operations that were added before o in O that were not in O_{gen}.

Property:

We have that:

v (O) = \sum_{o \in O} C (o, O)

Or in English: v (O) is the sum of the number of conflicts of all of the MAIL_ADD operations in O. This is easy to see because indeed v is incremented by C (o, O) for each operation o \in O that is applied.

Property: If O and O' are two sets of operations, and O \subseteq O', then:

\begin{aligned} \forall o \in O, \qquad C (o, O) \leqslant C (o, O') \end{aligned}

This is easy to see because O_{< o} \subseteq O'_{< o} and C (o, O') - C(o, O) = s (O'\_{< o}) - s (O\_{< o}) = | O'\_{< o} | - | O\_{< o} | \geqslant 0

Theorem:

If O and O' are two sets of operations:

\begin{aligned} O \subseteq O' & \Rightarrow & v (O) \leqslant v (O') \end{aligned}

Proof:

\begin{aligned} v (O') &= \sum_{o \in O'} C (o, O')\ & \geqslant \sum_{o \in O} C (o, O') \qquad \text{(because $O \subseteq O'$)}\ & \geqslant \sum_{o \in O} C (o, O) \qquad \text{(because $\forall o \in O, C (o, O) \leqslant C (o, O')$)}\ & \geqslant v (O) \end{aligned}

Theorem:

If O and O' are two sets of operations, such that O \subset O',

and if there are two different mails h and h' (h \neq h') such that $I (O) [h] = I (O') [h']$

then:

$$v (O) < v (O

Proof:

We already know that v (O) \leqslant v (O') because of the previous theorem. We will now look at the sum: v (O') = \sum_{o \in O'} C (o, O') and show that there is at least one term in this sum that is strictly larger than the corresponding term in the other sum: v (O) = \sum_{o \in O} C (o, O) Let o be the last MAIL_ADD{{ katex(body="(h, _)") }} operation in O, i.e. the operation that gives its definitive UID to mail h in O, and similarly o' be the last MAIL_ADD$(h', \_)$ operation in O'.

Let us write I = I (O) [h] = I (O') [h']

o is the operation at position I in O, and o' is the operation at position I in O'. But o \neq o', so if o is not the operation at position I in O' then it has to be at a later position I' > I in O', because no operations are removed between O and O', the only possibility is that some other operations (including o') are added before o. Therefore we have that C (o, O') > C (o, O), i.e. at least one term in the sum above is strictly larger in the first sum than in the second one. Since all other terms are greater or equal, we have v (O') > v (O).