# NetText

A text-based data format for cryptographic network protocols.

## Principles

- Only uses a limited subset of ASCII characters
- Has a minimal set of fundamental data types
- Retains the raw representation of complex data structures for hashing and cryptographic signing
- Has a minimal value data type: a single string type, used only to represent identifiers, numbers and base64-encoded byte strings

## Fundamental types

A term can be of any of the following kinds:

- a string, which may contain only ASCII alphanumeric characters and a limited subset of other ASCII characters; it may not contain whitespace or any character used to represent the other kinds of terms
- a dict, which maps strings (as defined above) to terms of any kind
- a list, which may contain any number of terms of any kind (kinds can be mixed)
- a sequence, constituted of at least two strings, dicts or lists (kinds can be mixed), separated by whitespace; sequences cannot be nested

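
The four kinds of terms can be modelled as a small set of Python classes. This is a minimal sketch, not part of the format: the class names (`Str`, `Dict`, `List`, `Seq`) are hypothetical, and so is the exact punctuation allowed in strings (`EXTRA_STRING_CHARS`), since the text above only says it is a limited subset of ASCII that excludes whitespace and the delimiters of the other kinds of terms. The sketch also assumes that empty strings are not valid terms.

```python
import string
from dataclasses import dataclass
from typing import Union

# Hypothetical alphabet: the spec only allows "ASCII alphanumerics plus a
# limited subset of other ASCII characters". The extra punctuation below is an
# assumption; whitespace and the delimiters { } [ ] ; = are left out.
EXTRA_STRING_CHARS = "+-/_.:@"
STRING_CHARS = frozenset(string.ascii_letters + string.digits + EXTRA_STRING_CHARS)


@dataclass
class Str:
    value: str

    def __post_init__(self):
        # assumption: empty strings are not valid terms
        if not self.value or any(c not in STRING_CHARS for c in self.value):
            raise ValueError(f"invalid string term: {self.value!r}")


@dataclass
class Dict:
    entries: dict  # maps string keys to terms

    def __post_init__(self):
        for key in self.entries:
            Str(key)  # keys must themselves be valid strings


@dataclass
class List:
    items: tuple  # any number of terms; kinds may be mixed


@dataclass
class Seq:
    items: tuple  # at least two strings, dicts or lists

    def __post_init__(self):
        if len(self.items) < 2:
            raise ValueError("a sequence needs at least two terms")
        if any(isinstance(t, Seq) for t in self.items):
            raise ValueError("sequences cannot be nested")


Term = Union[Str, Dict, List, Seq]
```

The nesting check is what keeps sequences unambiguous: since items are only separated by whitespace, a sequence directly inside another sequence could not be told apart from one longer sequence.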
Dicts are represented as follows:

```
{
  key1 = value1;
  key2 = value2
}
```

Lists are represented as follows:

```
[ term1; term2 ]
```

Sequences are represented as follows:

```
term1 term2 term3
```

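
The three syntaxes compose recursively. The hypothetical encoder below sketches that composition, assuming a plain in-memory representation in which a Python `str` is a string term, a `dict` is a dict, a `list` is a list and a `tuple` is a sequence. It does not validate string characters, and it never emits a trailing `;`; both choices are assumptions consistent with the examples rather than rules stated by the format.

```python
from typing import Union

# Hypothetical in-memory model: str = string, dict = dict, list = list,
# tuple = sequence.
Term = Union[str, dict, list, tuple]


def encode(term: Term) -> str:
    """Serialize a term using the dict, list and sequence syntax."""
    if isinstance(term, str):
        return term  # not validated against the string alphabet here
    if isinstance(term, dict):
        # `key = value` entries separated by `;` inside braces
        body = "; ".join(f"{key} = {encode(value)}" for key, value in term.items())
        return "{ " + body + " }" if body else "{}"
    if isinstance(term, list):
        # items separated by `;` inside brackets
        body = "; ".join(encode(item) for item in term)
        return "[ " + body + " ]" if body else "[]"
    if isinstance(term, tuple):
        if len(term) < 2:
            raise ValueError("a sequence needs at least two terms")
        if any(isinstance(item, tuple) for item in term):
            raise ValueError("sequences cannot be nested")
        # sequence items are separated by whitespace only
        return " ".join(encode(item) for item in term)
    raise TypeError(f"not a term: {term!r}")
```

For instance, `encode(("SEND", "MESSAGE", {"topic": "blah", "to": [("TOPIC", "hello"), ("USER", "john")], "body": ("blah", "blah")}))` returns `SEND MESSAGE { topic = blah; to = [ TOPIC hello; USER john ]; body = blah blah }`. Note that a sequence may appear as a dict value or list item, because `;`, `}` and `]` end it unambiguously.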
As a consequence, complex data structures can be defined as follows:

```
SEND MESSAGE {
  topic = blah;
  to = [
    TOPIC hello;
    USER john
  ];
  body = blah blah
}
```

The raw representation of a parsed dict, list or sequence is retained for hashing purposes. It is the sequence of bytes in the encoded string that represents the encoded dict, list or sequence, with leading and trailing whitespace trimmed.

In the complex example above, the sequences, dicts and lists have the following raw representations:

- the top-level term is a sequence, whose raw representation is the entire encoded string (assuming no whitespace at the beginning or end)
- the third term of the sequence is a dict, whose raw representation starts at `{` and ends at `}`
- the value of the dict's second entry (`to`) is a list, whose raw representation starts at `[` and ends at `]`
- the value of the dict's third entry (`body`) is a sequence, whose raw representation is exactly `blah blah`

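
Retaining raw representations falls out naturally from a recursive-descent parser that records, for every term, the offsets of its first and last characters in the input. The parser below is a sketch under stated assumptions: the grammar details it enforces (empty `{}` and `[]` allowed, no trailing `;`, the hypothetical `STRING_CHARS` alphabet) are not fixed by this document, and the names `Parser`, `Node` and `parse` are illustrative.

```python
import string
from dataclasses import dataclass

# Assumption: hypothetical string alphabet; the spec only says "a limited
# subset" of ASCII that excludes whitespace and the other terms' delimiters.
STRING_CHARS = frozenset(string.ascii_letters + string.digits + "+-/_.:@")


@dataclass
class Node:
    kind: str      # "string", "dict", "list" or "seq"
    value: object  # str, dict of str -> Node, or list of Node
    start: int     # offset of the first character of the raw representation
    end: int       # offset just past its last character


class Parser:
    def __init__(self, text: str):
        self.text, self.pos = text, 0

    def raw(self, node: Node) -> str:
        return self.text[node.start:node.end]

    def peek(self) -> str:
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def skip_ws(self) -> None:
        while self.peek().isspace():  # "".isspace() is False at end of input
            self.pos += 1

    def expect(self, ch: str) -> None:
        if self.peek() != ch:
            raise ValueError(f"expected {ch!r} at offset {self.pos}")
        self.pos += 1

    def parse_string(self) -> Node:
        start = self.pos
        while self.peek() in STRING_CHARS:
            self.pos += 1
        if self.pos == start:
            raise ValueError(f"expected a term at offset {start}")
        return Node("string", self.text[start:self.pos], start, self.pos)

    def parse_block(self, close: str, parse_entry) -> list:
        # shared by dicts and lists: `;`-separated entries up to `close`
        self.pos += 1  # consume the opening `{` or `[`
        entries = []
        self.skip_ws()
        while self.peek() != close:
            if entries:
                self.expect(";")
                self.skip_ws()
            entries.append(parse_entry())
            self.skip_ws()
        self.pos += 1  # consume the closing `}` or `]`
        return entries

    def parse_dict_entry(self):
        key = self.parse_string()
        self.skip_ws()
        self.expect("=")
        self.skip_ws()
        return key.value, self.parse_seq()

    def parse_item(self) -> Node:
        # one string, dict or list; never a bare sequence
        start = self.pos
        if self.peek() == "{":
            entries = dict(self.parse_block("}", self.parse_dict_entry))
            return Node("dict", entries, start, self.pos)
        if self.peek() == "[":
            items = self.parse_block("]", self.parse_seq)
            return Node("list", items, start, self.pos)
        return self.parse_string()

    def parse_seq(self) -> Node:
        # whitespace-separated items; a single item is returned as itself
        items = [self.parse_item()]
        while True:
            before = self.pos
            self.skip_ws()
            ch = self.peek()
            if ch not in STRING_CHARS and ch not in ("{", "["):
                break
            if self.pos == before:
                raise ValueError(f"missing whitespace at offset {self.pos}")
            items.append(self.parse_item())
        if len(items) == 1:
            return items[0]
        # spans first item to last item, so outer whitespace is trimmed
        return Node("seq", items, items[0].start, items[-1].end)


def parse(text: str):
    parser = Parser(text)
    parser.skip_ws()
    root = parser.parse_seq()
    parser.skip_ws()
    if parser.pos != len(text):
        raise ValueError(f"unexpected character at offset {parser.pos}")
    return parser, root
```

For the structure in the complex example, written with `;` between all dict entries, `parse` returns a top-level `seq` node whose raw text is the whole input, a `dict` node spanning from `{` to `}`, a `list` node for `to` spanning from `[` to `]`, and a `seq` node for `body` whose raw text is `blah blah`, matching the list above. Hashing a term then amounts to hashing `parser.raw(node).encode("ascii")` with whatever hash function the protocol uses; this document does not name one.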
Since strings cannot contain whitespace, they are always equivalent to their raw representation.

## Structural mappings

Terms can be interpreted in a number of different ways, depending on the context:

- RAW: the term is interpreted as its raw encoding (see above)
- STRING: if the term is a string or a sequence composed exclusively of strings, the term is interpreted as its raw encoding
- VARIANT: if the term is a sequence whose first item is a string, it is interpreted as a variant with the following properties:
  - a discriminator (the first item)
  - a value, which is the second item if there are exactly two items, or the sequence composed of all items from the second onwards if there are more than two
- DICT: if the term is a dict, it is interpreted as such
- LIST: if the term is a list, it is interpreted as such
- SEQ: if the term is a string, a list or a dict, it is interpreted as a sequence composed of that single term; otherwise the term is a sequence and is interpreted as its sequence of terms

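
The structural mappings can be sketched as plain functions over a parsed node. This assumes a hypothetical `Node` model (not defined by the format) that keeps its source text and offsets: under the VARIANT rule with more than two items, the value is a new sequence whose raw representation runs from the second item to the last, so the offsets are needed to recover it.

```python
from dataclasses import dataclass


@dataclass
class Node:
    kind: str      # "string", "dict", "list" or "seq"
    value: object  # str, dict of str -> Node, or list of Node
    source: str    # the encoded text the node was parsed from
    start: int     # raw representation is source[start:end]
    end: int

    @property
    def raw(self) -> str:
        return self.source[self.start:self.end]


def as_raw(t: Node) -> str:
    return t.raw


def as_string(t: Node) -> str:
    if t.kind == "string" or (
        t.kind == "seq" and all(item.kind == "string" for item in t.value)
    ):
        return t.raw
    raise ValueError("term does not map as STRING")


def as_variant(t: Node):
    """Return (discriminator, value) for a sequence starting with a string."""
    if t.kind != "seq" or t.value[0].kind != "string":
        raise ValueError("term does not map as VARIANT")
    head, rest = t.value[0], t.value[1:]
    if len(rest) == 1:
        return head.value, rest[0]
    # more than two items: the value is the sub-sequence from the second item
    return head.value, Node("seq", rest, t.source, rest[0].start, rest[-1].end)


def as_dict(t: Node) -> dict:
    if t.kind != "dict":
        raise ValueError("term does not map as DICT")
    return t.value


def as_list(t: Node) -> list:
    if t.kind != "list":
        raise ValueError("term does not map as LIST")
    return t.value


def as_seq(t: Node) -> list:
    # a single string, list or dict is viewed as a one-element sequence
    return t.value if t.kind == "seq" else [t]
```

With `SEND MESSAGE now` parsed as a three-string sequence, `as_string` returns the whole text, and `as_variant` returns the discriminator `SEND` together with a sequence whose raw representation is `MESSAGE now`.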
## Data mappings

Terms can further be mapped to the following data types:

- BYTES: if the term maps as a STRING, the resulting string is decoded as base64
- INT: if the term maps as a STRING, the resulting string is decoded as an integer written in decimal notation
- HASH, PUBKEY, SECKEY, SIGNATURE, ENCKEY, DECKEY, SYMKEY: mappings that interpret the term's BYTES value as a specific cryptographic item (hashes, keys, signatures)
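
A sketch of the BYTES and INT mappings, applied to the string produced by the STRING mapping. Two points are assumptions not settled by this document: since `=` also appears in dict syntax, base64 padding may have to be omitted inside strings, so the sketch restores it before decoding; and negative integers are assumed to be written with a leading `-`. The cryptographic mappings are left out because their encodings are not specified here.

```python
import base64
import binascii
import re


def as_bytes(s: str) -> bytes:
    """BYTES mapping: base64-decode the STRING value of a term."""
    # assumption: padding may be omitted, so restore it before decoding
    padded = s + "=" * (-len(s) % 4)
    try:
        return base64.b64decode(padded, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"invalid base64: {s!r}") from exc


def as_int(s: str) -> int:
    """INT mapping: decimal notation only."""
    # int() alone would also accept "+5", "1_000" and surrounding whitespace,
    # so check the strict decimal form first (leading "-" is an assumption)
    if not re.fullmatch(r"-?[0-9]+", s):
        raise ValueError(f"not a decimal integer: {s!r}")
    return int(s)
```

Passing `validate=True` makes `b64decode` reject characters outside the base64 alphabet instead of silently discarding them, which matters when the decoded bytes feed a signature check.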