7.7 KiB
+++ title = "IMAP UID proof" weight = 40 +++
Aerogramme tolerates mailbox desynchronization between active sessions. Desynchronization means that different emails can be assigned the same IMAP UID.
Desynchronization are handled by increasing the IMAP UIDVALIDITY field. When a session learns an operation in a mailbox that conflicts with is current view, it rebuilds its view from the operation logs and compute a new UIDVALIDITY.
This document shows that, considering any mailbox state for a given session, the UIDVALIDITY is increased if a conflict occures. In other words, it proves that clients will never have a corrupted view.
Note that changing the IMAPUIDVALIDITY will trigger a full mailbox resynchronization for the user, so it must be used in last resort. It's a "correctness security net" to ease operations, but in normal conditions, such desynchronizations should not occure. It will be discussed later in the cookbook.
Formal definition of a mailbox
Notations
h
: the hash of a message,\mathbb{H}
is the set of hashesi
: the UID of a message(i \in \mathbb{N})
f
: a flag attributed to a message (it's a string), we write\mathbb{F}
the set of possible flags- if
M
is a map (aka a dictionnary), ifx
has no assigned value inM
we writeM [x] = \bot
or equivalentlyx \not\in M
. Ifx
has a value in the map we writex \in M
andM [x] \neq \bot
State
-
A map
I
such thatI [h]
is the UID of the message whose hash ish
is the mailbox, or\bot
if there is no such message -
A map
F
such thatF [h]
is the set of flags attributed to the message whose hash ish
-
v
: the UIDVALIDITY value -
n
: the UIDNEXT value -
s
: an internal sequence number that is mostly equal to UIDNEXT but also grows when mails are deleted
Operations
-
MAIL_ADD$(h, i)$: the value of
i
that is put in this operation is the value ofs
in the state resulting of all already known operations, i.e.s (O_{gen})
in the notation below whereO_{gen}
is the set of all operations known at the time when the MAIL_ADD is generated. Moreover, such an operation can only be generated if $I (O_{gen}) [h] = \bot$, i.e. for a mailh
that is not already in the state atO_{gen}
. -
MAIL_DEL$(h)$
-
FLAG_ADD$(h, f)$
-
FLAG_DEL$(h, f)$
Mailbox handling algorithms
Add an email
apply MAIL_ADD$(h, i)$:
if i < s
:
v \leftarrow v + s - i
if F [h] = \bot
:
F [h] \leftarrow F_{initial}
$I [h] \leftarrow s$
$s \leftarrow s + 1$
$n \leftarrow s$
Delete an email
apply MAIL_DEL$(h)$:
I [h] \leftarrow \bot
$F [h] \leftarrow \bot$
$s \leftarrow s + 1$
Add a flag
apply FLAG_ADD$(h, f)$:
if h \in F
:
F [h] \leftarrow F [h] \cup \{ f \}
Delete a flag
apply FLAG_DEL$(h, f)$:
if h \in F
:
F [h] \leftarrow F [h] \backslash \{ f \}
Formal definition of the mailbox log
The mailbox log is a set of operation:
-
o
is an operation such as MAIL_ADD, MAIL_DEL, etc.O
is a set of operations. Operations embed a timestamp, so a set of operationsO
can be written asO = [o_1, o_2, \ldots, o_n]
by ordering them by timestamp. -
if
o \in O
, we writeO_{\leqslant o}
,O_{< o}
, $O_{\geqslant o}$,O_{> o}
the set of items ofO
that are respectively earlier or equal, strictly earlier, later or equal, or strictly later thano
. In other words, if we writeO = [o_1, \ldots, o_n]
, whereo
is a certaino_i
in this sequence, then:\begin{aligned} O_{\leqslant o} &= { o_1, \ldots, o_i }\ O_{< o} &= { o_1, \ldots, o_{i - 1} }\ O_{\geqslant o} &= { o_i, \ldots, o_n }\ O_{> o} &= { o_{i + 1}, \ldots, o_n } \end{aligned}
-
If
O
is a set of operations, we writeI (O)
,F (O)
, $n (O), s (O)$, andv (O)
the values ofI, F, n, s
andv
in the state that results of applying all of the operations inO
in their sorted order. (we thus writeI (O) [h]
the value ofI [h]
in this state)
Proving that UIDValidity is changed for each conflict
Hypothesis:
An operation o
can only be in a set O
if it was
generated after applying operations of a set O_{gen}
such that
O_{gen} \subset O
(because causality is respected in how we deliver
operations). Sets of operations that do not respect this property are excluded
from all of the properties, lemmas and proofs below.
Simplification: We will now exclude FLAG_ADD and FLAG_DEL
operations, as they do not manipulate n
, s
and v
, and adding them should
have no impact on the properties below.
Small lemma: If there are no FLAG_ADD and FLAG_DEL operations,
then s (O) = | O |
. This is easy to see because the possible operations are
only MAIL_ADD and MAIL_DEL, and both increment the value of s
by 1.
Definition: If o
is a MAIL_ADD$(h, i)$ operation, and O
is a
set of operations such that o \in O
, then we define the following value:
C (o, O) = s (O_{< o}) - i
We say that
C (o, O)
is the number of conflicts of o
in $O$: it
corresponds to the number of operations that were added before o
in O
that
were not in O_{gen}
.
Property:
We have that:
v (O) = \sum_{o \in O} C (o, O)
Or in English: v (O)
is the sum of the number of conflicts of all of the
MAIL_ADD operations in O
. This is easy to see because indeed v
is
incremented by C (o, O)
for each operation o \in O
that is applied.
Property:
If O
and O'
are two sets of operations, and O \subseteq O'
, then:
\begin{aligned}
\forall o \in O, \qquad C (o, O) \leqslant C (o, O')
\end{aligned}
This is easy to see because O_{< o} \subseteq O'_{< o}
and
C (o, O') - C(o, O) = s (O'\_{< o}) - s (O\_{< o}) = | O'\_{< o} | - | O\_{< o} | \geqslant 0
Theorem:
If O
and O'
are two sets of operations:
\begin{aligned}
O \subseteq O' & \Rightarrow & v (O) \leqslant v (O')
\end{aligned}
Proof:
\begin{aligned}
v (O') &= \sum_{o \in O'} C (o, O')\
& \geqslant \sum_{o \in O} C (o, O') \qquad \text{(because $O \subseteq
O'$)}\
& \geqslant \sum_{o \in O} C (o, O) \qquad \text{(because $\forall o \in
O, C (o, O) \leqslant C (o, O')$)}\
& \geqslant v (O)
\end{aligned}
Theorem:
If O
and O'
are two sets of operations, such that O \subset O'
,
and if there are two different mails h
and h'
(h \neq h')
such that $I
(O) [h] = I (O') [h']$
then:
$$v (O) < v (O
Proof:
We already know that v (O) \leqslant v (O')
because of the previous theorem.
We will now look at the sum:
v (O') = \sum_{o \in O'} C (o, O')
and show that there is at least one term in this sum that is strictly larger
than the corresponding term in the other sum:
v (O) = \sum_{o \in O} C (o, O)
Let
o
be the last MAIL_ADD{{ katex(body="(h, _)") }} operation in O
, i.e. the operation
that gives its definitive UID to mail h
in O
, and similarly o'
be the
last MAIL_ADD$(h', \_)$ operation in O'
.
Let us write I = I (O) [h] = I (O') [h']
o
is the operation at position I
in O
, and o'
is the operation at
position I
in O'
. But o \neq o'
, so if o
is not the operation at
position I
in O'
then it has to be at a later position I' > I
in O'
,
because no operations are removed between O
and O'
, the only possibility
is that some other operations (including o'
) are added before o
. Therefore
we have that C (o, O') > C (o, O)
, i.e. at least one term in the sum above
is strictly larger in the first sum than in the second one. Since all other
terms are greater or equal, we have v (O') > v (O)
.