lx/nettext

A text-based data format for cryptographic network protocols.

Alex Auvolat 31c20c3a03 reduce unnecessary re-exports		2023-11-15 01:46:48 +01:00

NetText

A text-based data format for cryptographic network protocols.

Principles

Only uses a limited subset of ASCII characters
Has a minimal set of fundamental data types
Retains the raw representation of complex data structures for hashing and cryptographic signing
Minimal value data type: a string type that can only be used to represent identifiers, numbers and base64-encoded byte strings.

Fundamental types

A term can be of any of the following kinds:

a string, which may contain only ASCII alphanumeric characters and a limited subset of other ASCII characters that may not include characters used to represent other kinds of terms
a dict, which maps strings (as defined above) to any term type
a list, which may contain any number of any kind of terms (can be mixed)
a sequence, constituted of at least two of the above (can be mixed), simply separated by whitespace; sequences cannot be nested

Dicts are represented as follows:

{
    key1 = value1;
    key2 = value2
}

Lists are represented as follows:

[ term1; term2 ]

Sequences are represented as follows:

term1 term2 term3
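The three representations above can be sketched as a small encoder. This is only an illustration: the `Term` enum and the `encode` function are hypothetical names, not the crate's API, and the exact whitespace layout (single-line output, spaces around `=` and after `;`) is an assumption.

```rust
/// Illustrative model of the four term kinds (not the crate's actual types).
pub enum Term {
    Str(String),
    Dict(Vec<(String, Term)>),
    List(Vec<Term>),
    Seq(Vec<Term>),
}

/// Encodes a term on a single line, following the syntax shown above.
pub fn encode(term: &Term) -> String {
    match term {
        // Strings are written as-is.
        Term::Str(s) => s.clone(),
        // Dicts: `key = value` pairs separated by `;`, wrapped in braces.
        Term::Dict(pairs) => {
            let body: Vec<String> = pairs
                .iter()
                .map(|(k, v)| format!("{} = {}", k, encode(v)))
                .collect();
            format!("{{ {} }}", body.join("; "))
        }
        // Lists: terms separated by `;`, wrapped in square brackets.
        Term::List(items) => {
            let body: Vec<String> = items.iter().map(encode).collect();
            format!("[ {} ]", body.join("; "))
        }
        // Sequences: terms separated by whitespace only.
        Term::Seq(items) => items.iter().map(encode).collect::<Vec<_>>().join(" "),
    }
}
```

With this sketch, a list of the strings `term1` and `term2` encodes to `[ term1; term2 ]`, matching the list example above.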

As a consequence, complex data structures can be defined as follows:

SEND MESSAGE {
    topic = blah;
    to = [
        TOPIC hello;
        USER john
    ];
    body = blah blah
}

The raw representation of a parsed dict, list or sequence is retained for hashing purposes. It is the sequence of bytes in the encoded string that represents that term, with whitespace trimmed at both ends.

In the complex example above, here are the sequences, dict and list and their raw representations:

the toplevel term is a sequence, whose raw representation is the entire encoded string (assuming no whitespace at beginning or end)
the third term of the sequence is a dict, whose raw representation starts at { and ends at }
the second mapping of the dict is a list, whose raw representation starts at [ and ends at ]
the third mapping of the dict is a sequence, whose raw representation is exactly blah blah.

Since strings cannot contain whitespace, they are always equivalent to their raw representation.
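As a rough sketch of what retaining the raw representation means, the helpers below return sub-slices of the encoded string rather than re-serialized values. Both function names are hypothetical, and `raw_delimited` is a simplification that only locates the first bracketed term by counting nesting depth; a real parser would track these spans while parsing.

```rust
/// Raw representation of the toplevel term: the encoded string with
/// whitespace trimmed at both ends.
pub fn raw_toplevel(encoded: &str) -> &str {
    encoded.trim()
}

/// Raw representation of the first term delimited by `open` and `close`
/// (for example `{`/`}` for a dict or `[`/`]` for a list), delimiters included.
pub fn raw_delimited(encoded: &str, open: char, close: char) -> Option<&str> {
    let start = encoded.find(open)?;
    let mut depth = 0usize;
    for (offset, c) in encoded[start..].char_indices() {
        if c == open {
            depth += 1;
        } else if c == close {
            // The scan starts on `open`, so depth is at least 1 here.
            depth -= 1;
            if depth == 0 {
                return Some(&encoded[start..start + offset + c.len_utf8()]);
            }
        }
    }
    // Unbalanced delimiters: no raw representation can be extracted.
    None
}
```

Because the returned slices borrow from the original input, hashing or signing them covers exactly the bytes that were received, whatever whitespace the sender used inside the term.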

Structural mappings

Terms can be interpreted in a number of different ways, depending on the context:

RAW: the term is interpreted as its raw encoding (see above)
STRING: if the term is a string or a sequence composed exclusively of strings, the term is interpreted as its raw encoding
VARIANT: if the term is a sequence whose first item is a string, it is interpreted as a variant with the following properties:
- a discriminator (the first item)
- a value, which is either the second item in case there are only two items, or the sequence composed of all items starting from the second if there are more than two
DICT: if the term is a dict, interpret it as such
LIST: if the term is a list, interpret it as such
SEQ: if the term is a string, a list, or a dict, interpret it as a sequence composed of that single term. Otherwise the term is already a sequence; interpret it as a sequence of terms.
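The VARIANT and SEQ mappings described above could look roughly like this. The `Term` enum and the `as_variant` and `as_seq` functions are hypothetical and only model the rules stated in this section; the other mappings are omitted.

```rust
/// Illustrative term model (not the crate's actual types).
#[derive(Debug, Clone, PartialEq)]
pub enum Term {
    Str(String),
    Dict(Vec<(String, Term)>),
    List(Vec<Term>),
    Seq(Vec<Term>),
}

/// VARIANT: a sequence whose first item is a string yields a discriminator
/// and a value. The value is the second item if there are exactly two items,
/// otherwise the sequence of all items starting from the second.
pub fn as_variant(term: &Term) -> Option<(&str, Term)> {
    let items = match term {
        Term::Seq(items) => items,
        _ => return None,
    };
    let tag = match items.first() {
        Some(Term::Str(tag)) => tag.as_str(),
        _ => return None,
    };
    match &items[1..] {
        // A well-formed sequence has at least two items.
        [] => None,
        [single] => Some((tag, single.clone())),
        rest => Some((tag, Term::Seq(rest.to_vec()))),
    }
}

/// SEQ: a string, list or dict becomes a one-item sequence;
/// a sequence is returned as its list of items.
pub fn as_seq(term: &Term) -> Vec<Term> {
    match term {
        Term::Seq(items) => items.clone(),
        other => vec![other.clone()],
    }
}
```

Under these rules, `USER john` maps to the discriminator `USER` with the value `john`, while `SEND MESSAGE now` maps to the discriminator `SEND` with the sequence `MESSAGE now` as its value.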

Data mappings

Terms further have mappings as different data types:

INT: if the term maps as a STRING, decode it as an integer written in decimal notation
BYTES: if the term maps as a STRING, decode it using base64. Since a STRING cannot be empty, the string - is used to represent an empty byte string.
Cryptographic data types (see below)
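A minimal sketch of the INT and BYTES mappings, written against the standard library only. Several points are assumptions not stated here: the integer width (`i64`), the base64 alphabet (URL-safe, with `-` and `_`) and the absence of `=` padding. The function names are hypothetical.

```rust
/// INT: decode a STRING as a decimal integer (the i64 width is an assumption).
pub fn decode_int(s: &str) -> Option<i64> {
    s.parse::<i64>().ok()
}

/// Maps one character of the assumed URL-safe base64 alphabet to its 6-bit value.
fn sextet(c: u8) -> Option<u32> {
    match c {
        b'A'..=b'Z' => Some((c - b'A') as u32),
        b'a'..=b'z' => Some((c - b'a') as u32 + 26),
        b'0'..=b'9' => Some((c - b'0') as u32 + 52),
        b'-' => Some(62),
        b'_' => Some(63),
        _ => None,
    }
}

/// BYTES: decode a STRING as unpadded base64; the string `-` is the empty byte string.
pub fn decode_bytes(s: &str) -> Option<Vec<u8>> {
    if s == "-" {
        return Some(Vec::new());
    }
    let mut out = Vec::new();
    let mut acc: u32 = 0; // bit accumulator
    let mut bits: u32 = 0; // number of pending bits in `acc`
    for &c in s.as_bytes() {
        acc = (acc << 6) | sextet(c)?;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8);
            acc &= (1 << bits) - 1; // keep only the bits not yet emitted
        }
    }
    // An empty string or a lone leftover character is never valid.
    // Leftover padding bits are not checked in this sketch.
    if s.is_empty() || s.len() % 4 == 1 {
        None
    } else {
        Some(out)
    }
}
```

For example, `decode_bytes("aGVsbG8")` yields the bytes of `hello`, and `decode_bytes("-")` yields an empty vector.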

Cryptographic data types

Cryptographic values such as keys, hashes, signatures, etc. are encoded as STRING with a prefix indicating the algorithm used, followed by ":", followed by the base64-encoded value.

Prefixes are as follows:

pk.box: public key for NaCl's box API
sk.box: secret key for NaCl's box API
sk.sbox: secret key for NaCl's secretbox API
h.sha256: sha256 hash
h.sha512: sha512 hash
h.sha3: sha3 hash
h.b2: blake2b hash
h.b3: blake3 hash
sig.ed25519: ed25519 signature
pk.ed25519: ed25519 public signing key
sk.ed25519: ed25519 secret signing key

More can be added.
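Splitting such a value into its parts might be done as follows. The `Kind` enum and `split_crypto` function are hypothetical; the sketch only inspects the family part of the prefix (`pk`, `sk`, `h`, `sig`), does not check the algorithm name against the list above, and leaves the payload as undecoded base64 text.

```rust
/// Family of a cryptographic value, taken from the first part of its prefix.
#[derive(Debug, PartialEq)]
pub enum Kind {
    PublicKey,
    SecretKey,
    Hash,
    Signature,
}

/// Splits `family.algo:payload` into (kind, algorithm name, base64 payload).
pub fn split_crypto(s: &str) -> Option<(Kind, &str, &str)> {
    // The prefix ends at the first ':'; the rest is the encoded value.
    let (prefix, payload) = s.split_once(':')?;
    // The prefix itself is `family.algo`, e.g. `h.sha256`.
    let (family, algo) = prefix.split_once('.')?;
    let kind = match family {
        "pk" => Kind::PublicKey,
        "sk" => Kind::SecretKey,
        "h" => Kind::Hash,
        "sig" => Kind::Signature,
        _ => return None,
    };
    Some((kind, algo, payload))
}
```

A caller would then match on the algorithm name (for instance `ed25519` or `sha256`) to pick the decoder and check the expected length of the decoded payload.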

HASH, PUBKEY, SECKEY, SIGNATURE, ENCKEY, DECKEY, SYMKEY: mappings that interpret BYTES as a specific kind of cryptographic item