# jepsen.garage
Jepsen checking of Garage consistency properties.
## Usage
Requirements:
- vagrant
- VirtualBox, configured so that nodes can get an IP address in the private network `192.168.56.0/24` (this is the default)
- a user that can create VirtualBox VMs
- leiningen
- gnuplot
Set up VMs before running tests:
```
vagrant up
```
Run tests: see the commands in the Results section below.
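
Most results below report about 10 runs per scenario. The following is a hypothetical Python helper (not part of this repository) that builds the `lein run test` command lines from the flags used in the Results section and repeats a run. The names `WORKLOAD_PARAMS`, `build_command` and `run_repeatedly` are invented for illustration; the helper does not interpret test outcomes, so each run's result still has to be checked under `store/`.

```python
import subprocess
from typing import List, Optional

# Per-workload parameters copied from the commands in the Results section.
WORKLOAD_PARAMS = {
    "reg1": {"rate": 100, "concurrency": 20},
    "reg2": {"rate": 100, "concurrency": 100},
    "set1": {"rate": 200, "concurrency": 200},
    "set2": {"rate": 100, "concurrency": 100},
}


def build_command(workload: str, scenario: Optional[str] = None,
                  patch: Optional[str] = None, time_limit: int = 60) -> List[str]:
    """Build the argv for one `lein run test` invocation (hypothetical helper)."""
    params = WORKLOAD_PARAMS[workload]
    cmd = [
        "lein", "run", "test",
        "--nodes-file", "nodes.vagrant",
        "--time-limit", str(time_limit),
        "--rate", str(params["rate"]),
        "--concurrency", str(params["concurrency"]),
        "--workload", workload,
        "--ops-per-key", "100",
    ]
    if scenario is not None:
        cmd += ["--scenario", scenario]
    if patch is not None:
        cmd += ["--patch", patch]
    return cmd


def run_repeatedly(workload: str, scenario: str, patch: str,
                   runs: int = 10, dry_run: bool = True) -> List[int]:
    """Run the same test `runs` times and collect the process exit codes.

    With dry_run=True the commands are only printed (and 0 is recorded),
    which is handy to check them before starting long runs.
    """
    codes = []
    for _ in range(runs):
        cmd = build_command(workload, scenario, patch)
        if dry_run:
            print(" ".join(cmd))
            codes.append(0)
        else:
            codes.append(subprocess.run(cmd).returncode)
    return codes
```

For example, `run_repeatedly("set2", "r", "task3c", dry_run=False)` would repeat the layout reconfiguration scenario 10 times on the `set2` workload.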
## Results
### Linearizable register model
Command: `lein run test --nodes-file nodes.vagrant --time-limit 60 --rate 100 --concurrency 20 --workload reg1 --ops-per-key 100`
Results without timestamp patch:
- Fails with a simple clock-scramble nemesis (`--scenario c`).
  Explanation: without the timestamp patch, nodes create objects using only their
  local clock as a timestamp, so the ordering of writes is all over the place when
  clocks are scrambled.
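
The clock-scrambling failure can be illustrated with a toy last-writer-wins (LWW) model. This Python sketch is not Garage's code: it assumes each object keeps the version with the largest timestamp, and `bumped_timestamp` models the idea we assume the timestamp patch follows (never give a new version a timestamp at or below the version already stored). `Version`, `LwwObject`, `local_clock_timestamp`, `bumped_timestamp` and `scenario` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Version:
    timestamp: int          # e.g. milliseconds
    value: Optional[str]    # None would model a deleted object


class LwwObject:
    """Toy last-writer-wins object: the version with the largest timestamp wins."""

    def __init__(self) -> None:
        self.current: Optional[Version] = None

    def merge(self, v: Version) -> None:
        if self.current is None or v.timestamp > self.current.timestamp:
            self.current = v

    def read(self) -> Optional[str]:
        return None if self.current is None else self.current.value


def local_clock_timestamp(obj: LwwObject, node_clock: int) -> int:
    # Without the patch: only the writing node's local clock is used.
    return node_clock


def bumped_timestamp(obj: LwwObject, node_clock: int) -> int:
    # Assumed patch behaviour: stay strictly above the stored version.
    if obj.current is None:
        return node_clock
    return max(node_clock, obj.current.timestamp + 1)


def scenario(ts_fn: Callable[[LwwObject, int], int]) -> Optional[str]:
    obj = LwwObject()
    # Node A's clock is scrambled 1000 ms into the future; it writes "a" first.
    obj.merge(Version(ts_fn(obj, 2000), "a"))
    # Node B has the correct clock and writes "b" after A's write completed.
    obj.merge(Version(ts_fn(obj, 1000), "b"))
    return obj.read()


print(scenario(local_clock_timestamp))  # a  (the later write of "b" is lost)
print(scenario(bumped_timestamp))       # b
```

In this model, a value carrying a future timestamp shadows later writes. The model ignores replication: in a real cluster, the node computing the timestamp might not have seen the latest version, for example during a partition.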
Results with timestamp patch (`--patch tsfix2`):
- No failures with clock-scramble nemesis (`--scenario c`).
- Fails with clock-scramble nemesis + partition nemesis (`--scenario cp`).
**This test is expected to fail.**
Indeed, S3 objects are not meant to behave like linearizable registers.

TODO: explain using a counter-example.
### Read-after-write CRDT register model
Command: `lein run test --nodes-file nodes.vagrant --time-limit 60 --rate 100 --concurrency 100 --workload reg2 --ops-per-key 100`
Results without timestamp patch:
- Fails with a simple clock-scramble nemesis (`--scenario c`).
  Explanation: old values are not overwritten correctly when their timestamps are in the future.
Results with timestamp patch (`--patch tsfix2`):
- No failures with clock-scramble nemesis + partition nemesis (`--scenario cp`).
  This shows that `tsfix2` (PR#543) does improve consistency.
- **Fails with layout reconfiguration nemesis** (`--scenario r`).
  Example of a failed run: `garage reg2/20231024T120806.899+0200`.
  This is the failure mode we are looking for and trying to fix for NLnet task 3.
Results with NLnet task 3 code (commit `707442f5de`, `--patch task3a`):
- No failures with `--scenario r` (0 of 10 runs), `--scenario pr` (0 of 10 runs),
  `--scenario cpr` (0 of 10 runs) and `--scenario dpr` (0 of 10 runs).
- Same with `--patch task3c` (commit `0041b013`, the final version).
### Set, basic test (write some items, then read)
Command: `lein run test --nodes-file nodes.vagrant --time-limit 60 --rate 200 --concurrency 200 --workload set1 --ops-per-key 100`
Results without NLnet task3 code (`--patch tsfix2`):
- No failures so far with clock-scramble nemesis + partition nemesis (`--scenario cp`); TODO: long test run.
- Does not seem to fail with only the layout reconfiguration nemesis (`--scenario r`, fewer than 10 runs), although in theory it could.
- **Fails with the partition + layout reconfiguration nemesis** (`--scenario pr`).
  Example of a failed run: `garage set1/20231024T172214.488+0200` (1 failure in 4 runs).
  This is the failure mode we are looking for and trying to fix for NLnet task 3.
Results with NLnet task 3 code (commit `707442f5de`, `--patch task3a`):
- The tests are buggy and often end with an "unknown" validity status, because
  some requests do not return results during network partitions or other
  nemesis-induced broken cluster states. However, when the tests were able to
  finish, there were no failures with scenarios `r`, `pr`, `cpr` and `dpr`.
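
The "unknown" status can be understood with a simplified set checker, sketched here in Python in the spirit of Jepsen's set checker. It is not the checker used by `set1`, and `check_set` is a hypothetical name: every acknowledged add must appear in the final read, no element that nobody tried to add may appear, and without a completed final read the validity cannot be decided.

```python
from typing import Iterable, Optional, Set


def check_set(attempted: Set[int], acknowledged: Set[int],
              final_read: Optional[Iterable[int]]) -> dict:
    """Simplified set checker.

    attempted:    elements for which an add was sent (whatever its outcome)
    acknowledged: elements whose add returned ok
    final_read:   elements returned by the final read, or None if it failed
    """
    if final_read is None:
        # No completed final read: validity cannot be decided.
        return {"valid": "unknown", "lost": set(), "unexpected": set()}
    seen = set(final_read)
    lost = acknowledged - seen      # acknowledged adds missing from the read
    unexpected = seen - attempted   # elements that nobody tried to add
    return {"valid": not lost and not unexpected,
            "lost": lost, "unexpected": unexpected}


print(check_set({1, 2, 3}, {1, 2}, None)["valid"])       # unknown
print(check_set({1, 2, 3}, {1, 2}, [1, 2, 3])["valid"])  # True (3 was not acknowledged, so it is optional)
print(check_set({1, 2, 3}, {1, 2}, [1])["lost"])         # {2}
```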
### Set, continuous test (interspersed reads and writes)
Command: `lein run test --nodes-file nodes.vagrant --time-limit 60 --rate 100 --concurrency 100 --workload set2 --ops-per-key 100`
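
One way to state the property the continuous test targets, sketched in Python under the assumption that the workload only adds elements and that operation start and end times are known: a read that starts after an add has completed must contain the added element, while adds concurrent with the read may or may not be visible. `Add`, `Read` and `check_read_after_add` are hypothetical names, not the code of the `set2` checker.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Add:
    element: int
    start: float   # invocation time
    end: float     # completion time of an acknowledged add


@dataclass(frozen=True)
class Read:
    start: float
    end: float
    elements: FrozenSet[int]


def check_read_after_add(adds: List[Add],
                         reads: List[Read]) -> List[Tuple[Read, Set[int]]]:
    """Return (read, missing elements) for every read that misses a completed add."""
    violations = []
    for r in reads:
        # Only adds that finished before the read started are required.
        required = {a.element for a in adds if a.end < r.start}
        missing = required - r.elements
        if missing:
            violations.append((r, missing))
    return violations


adds = [Add(1, 0.0, 1.0), Add(2, 2.0, 5.0)]
reads = [
    Read(3.0, 4.0, frozenset({1})),  # ok: the add of 2 is still in progress
    Read(6.0, 7.0, frozenset({1})),  # violation: the add of 2 finished at 5.0
]
print(check_read_after_add(adds, reads))
```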
Results without NLnet task3 code (`--patch tsfix2`):
- No failures with clock-scramble nemesis + db nemesis + partition nemesis (`--scenario cdp`) (0 failures in 10 runs).
- **Fails with just layout reconfiguration nemesis** (`--scenario r`).
  Example of a failed run: `garage set2/20231025T141940.198+0200` (10 failures in 10 runs).
  This is the failure mode we are looking for and trying to fix for NLnet task 3.
Results with NLnet task 3 code (commit `707442f5de`, `--patch task3a`):
- No failures with `--scenario r` (0 of 10 runs), `--scenario pr` (0 of 10 runs),
  `--scenario cpr` (0 of 10 runs) and `--scenario dpr` (0 of 10 runs).
- Same with `--patch task3c` (commit `0041b013`, the final version).
## NLnet task 3 final results
- With code from task3 (`--patch task3c`): [reg2 and set2](results/Results-2023-12-13-task3c.png), [set1](results/Results-2023-12-14-task3-set1.png).
- Without (`--patch tsfix2`): [reg2 and set2](results/Results-2023-12-13-tsfix2.png), set1 TBD.
## Investigating (and fixing) errors
### Segfaults
They happen when the download is interrupted in the middle (e.g. ^C during the first launch on clean VMs), which leaves the `garage` binary truncated.
Add `:force?` to the `cached-wget!` call in `daemon.clj` to re-download the binary,
or restart the VMs to clear temporary files.
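
To check whether a downloaded binary is truncated before re-downloading it, a rough heuristic is to compare the file size with the end of the ELF section header table, which linkers usually place at the end of the file. The Python sketch below assumes a 64-bit ELF binary (as on x86_64 VMs); `elf_looks_truncated` is a hypothetical helper, not part of this repository, and the path in the usage comment is only a placeholder.

```python
import struct


def elf_looks_truncated(data: bytes) -> bool:
    """Heuristic: does a 64-bit ELF file end before its section header table?

    Section headers usually sit at the very end of the file, so a download cut
    short typically ends before e_shoff + e_shnum * e_shentsize.
    """
    if len(data) < 64 or data[:4] != b"\x7fELF":
        return True  # shorter than an ELF64 header, or not an ELF file at all
    if data[4] != 2:  # EI_CLASS: 2 means 64-bit
        raise ValueError("this sketch only handles 64-bit ELF files")
    endian = "<" if data[5] == 1 else ">"  # EI_DATA: 1 means little-endian
    (e_shoff,) = struct.unpack_from(endian + "Q", data, 0x28)
    e_shentsize, e_shnum = struct.unpack_from(endian + "HH", data, 0x3A)
    return len(data) < e_shoff + e_shnum * e_shentsize


# Usage (placeholder path):
# with open("/path/to/garage", "rb") as f:
#     print(elf_looks_truncated(f.read()))
```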
### In `jepsen.garage`: prefix weirdness
In `store/garage set1/20231019T163358.615+0200`:
```
INFO [2023-10-19 16:35:20,977] clojure-agent-send-off-pool-207 - jepsen.garage.set list results for prefix set20/ : (set13/0 set13/1 set13/10 set13/11 set13/12 set13/13 set13/14 set13/15 set13/16 set13/17 set13/18 set13/19 set13/2 set13/20 set13/21 set13/22 set13/23 set13/24 set13/25 set13/26 set13/27 set13/28 set13/29 set13/3 set13/30 set13/31 set13/32 set13/33 set13/34 set13/35 set13/36 set13/37 set13/38 set13/39 set13/4 set13/40 set13/41 set13/42 set13/43 set13/44 set13/45 set13/46 set13/47 set13/48 set13/49 set13/5 set13/50 set13/51 set13/52 set13/53 set13/54 set13/55 set13/56 set13/57 set13/58 set13/59 set13/6 set13/60 set13/61 set13/62 set13/63 set13/64 set13/65 set13/66 set13/67 set13/68 set13/69 set13/7 set13/70 set13/71 set13/72 set13/73 set13/74 set13/75 set13/76 set13/77 set13/78 set13/79 set13/8 set13/80 set13/81 set13/82 set13/83 set13/84 set13/85 set13/86 set13/87 set13/88 set13/89 set13/9 set13/90 set13/91 set13/92 set13/93 set13/94 set13/95 set13/96 set13/97 set13/98 set13/99) (node: http://192.168.56.25:3900 )
```
After inspection, the actual S3 call was made with prefix `set13/`, so at least this is not an error in Garage itself but in the Jepsen code.

This finally turned out to be due to closures not correctly capturing their context in the list function in `s3api.clj` (wtf clojure?).
The exact cause is unclear, but it seems to have been fixed by making `list-inner` a separate top-level function instead of a nested one,
and passing all values that were previously taken from the context (creds and prefix) as additional arguments.
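
Two small safeguards against this class of bug, sketched in Python (the real code is Clojure, and all names here are hypothetical). First, a sanity check that every listed key starts with the requested prefix, since an S3 listing with a prefix only returns such keys. Second, a pagination loop that receives the credentials and prefix as explicit arguments on every call, mirroring the fix described above. `list_page` stands for whatever function performs one listing request; its signature is an assumption.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signature: list_page(creds, prefix, token) -> (keys, next_token).
ListPage = Callable[[object, str, Optional[str]], Tuple[List[str], Optional[str]]]


def keys_outside_prefix(prefix: str, keys: List[str]) -> List[str]:
    """Return keys that do not start with the requested prefix (should be empty)."""
    return [k for k in keys if not k.startswith(prefix)]


def list_all(list_page: ListPage, creds: object, prefix: str) -> List[str]:
    """Follow continuation tokens, passing creds and prefix explicitly each time."""
    keys: List[str] = []
    token: Optional[str] = None
    while True:
        page, token = list_page(creds, prefix, token)
        bad = keys_outside_prefix(prefix, page)
        if bad:
            # Fail loudly instead of silently mixing keys from another set.
            raise AssertionError(f"listing for {prefix!r} returned {bad[:3]}")
        keys.extend(page)
        if token is None:
            return keys
```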
### `reg2` test inconsistency, even with timestamp fix
The reg2 test is our custom checker for CRDT read-after-write semantics on individual object keys, which act as registers that can be updated.
The test fails without the timestamp fix, which is expected, as the clock scrambler prevents nodes from ordering object versions correctly.
With the timestamp fix (`--patch tsfix1`), the happened-before relationship should at least be respected: when a PutObject call starts
after another PutObject call has ended, the second call should overwrite the value of the first call, and the first value should no longer be
readable by subsequent GetObject calls.
However, we observed inconsistencies even with the timestamp fix.
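
The happened-before property above can be written as a small check over a history of timed operations. This Python sketch only illustrates the property and is not the `reg2` checker itself. It assumes every written value is unique, so that a read value identifies the write that produced it, and it skips reads of values that were never written (such as an initial nil). `Op` and `find_stale_reads` are hypothetical names.

```python
from collections.abc import Hashable
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Op:
    kind: str          # "write" (PutObject) or "read" (GetObject)
    value: Hashable    # value written, or value returned by the read
    start: float       # invocation time
    end: float         # completion time


def find_stale_reads(ops: List[Op]) -> List[Tuple[Op, Op, Op]]:
    """Return (read, w1, w2) triples where the read returned w1's value although
    w2 started after w1 ended and w2 ended before the read started."""
    writes = [o for o in ops if o.kind == "write"]
    write_of = {w.value: w for w in writes}   # assumes unique written values
    stale = []
    for r in (o for o in ops if o.kind == "read"):
        w1 = write_of.get(r.value)
        if w1 is None:
            continue  # value never written (e.g. initial state): not checked here
        for w2 in writes:
            if w2.start > w1.end and w2.end < r.start:
                stale.append((r, w1, w2))
                break
    return stale


history = [
    Op("write", "a", 0.0, 1.0),
    Op("write", "b", 2.0, 3.0),   # starts after the write of "a" ended
    Op("read", "b", 4.0, 5.0),    # fine
    Op("read", "a", 4.0, 5.0),    # stale: "a" was overwritten by "b"
]
print(len(find_stale_reads(history)))  # 1
```

In such a model, a nil write (DeleteObject) would be a write of `None`; with several nil writes in one history, the unique-value assumption no longer holds and the check would need to track writes by identity instead.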
The inconsistencies always seemed to happen after writing a nil value, which translates to a DeleteObject call
instead of a PutObject. When writing nil values is disabled, so that only PutObject calls are made,
the issue disappears. There is therefore an issue to fix in DeleteObject.
The issue in DeleteObject seems to have been fixed by commit `c82d91c6bccf307186332b6c5c6fc0b128b1b2b1`, which can be tested with `--patch tsfix2`.
## License
Copyright © 2023 Alex Auvolat
This program and the accompanying materials are made available under the
terms of the GNU Affero General Public License v3.0.