Jepsen testing (NLnet task 3 subtask 1) #544
5 changed files with 85 additions and 36 deletions
|
@@ -35,55 +35,74 @@ lein run test --nodes-file nodes.vagrant --time-limit 64 --rate 50 --concurrenc
|
||||||
|
|
||||||
### Register linear, without timestamp patch
|
### Register linear, without timestamp patch
|
||||||
|
|
||||||
Command: `lein run test --nodes-file nodes.vagrant --time-limit 60 --rate 20 --concurrency 20 --workload reg1 --ops-per-key 100`
|
Command: `lein run test --nodes-file nodes.vagrant --time-limit 60 --rate 100 --concurrency 20 --workload reg1 --ops-per-key 100`
|
||||||
|
|
||||||
Results: fails with a simple clock-scramble nemesis.
|
Results without timestamp patch:
|
||||||
|
|
||||||
Explanation: without the timestamp patch, nodes will create objects using their
|
- Fails with a simple clock-scramble nemesis (`--scenario c`).
|
||||||
local clock only as a timestamp, so the ordering will be all over the place if
|
Explanation: without the timestamp patch, nodes will create objects using their
|
||||||
clocks are scrambled.
|
local clock only as a timestamp, so the ordering will be all over the place if
|
||||||
|
clocks are scrambled.
|
||||||
|
|
||||||
### Register linear, with timestamp patch
|
Results with timestamp patch (`--patch tsfix2`):
|
||||||
|
|
||||||
Command: `lein run test --nodes-file nodes.vagrant --time-limit 60 --rate 20 --concurrency 20 --workload reg1 --ops-per-key 100 --patch tsfix1`
|
|
||||||
|
|
||||||
Results:
|
|
||||||
|
|
||||||
- No failure with clock-scramble nemesis
|
- No failure with clock-scramble nemesis
|
||||||
- Fails with clock-scramble nemesis + partition nemesis
|
|
||||||
|
|
||||||
Explanation: S3 objects are not meant to behave like linearizable registers. TODO explain using a counter-example
|
- Fails with clock-scramble nemesis + partition nemesis (`--scenario cp`).
|
||||||
|
|
||||||
### Read-after-write CRDT register model, without timestamp patch
|
**This test is expected to fail.**
|
||||||
|
Indeed, S3 objects are not meant to behave like linearizable registers.
|
||||||
|
TODO explain using a counter-example
|
||||||
|
|
||||||
|
|
||||||
|
### Read-after-write CRDT register model
|
||||||
|
|
||||||
Command: `lein run test --nodes-file nodes.vagrant --time-limit 60 --rate 100 --concurrency 100 --workload reg2 --ops-per-key 100`
|
Command: `lein run test --nodes-file nodes.vagrant --time-limit 60 --rate 100 --concurrency 100 --workload reg2 --ops-per-key 100`
|
||||||
|
|
||||||
Results: fails with a simple clock-scramble nemesis.
|
Results without timestamp patch:
|
||||||
|
|
||||||
Explanation: old values are not overwritten correctly when their timestamps are in the future.
|
- Fails with a simple clock-scramble nemesis (`--scenario c`).
|
||||||
|
Explanation: old values are not overwritten correctly when their timestamps are in the future.
|
||||||
|
|
||||||
### Read-after-write CRDT register model, with timestamp patch (v2 with DeleteObject fix as well)
|
Results with timestamp patch (`--patch tsfix2`):
|
||||||
|
|
||||||
Command: `lein run test --nodes-file nodes.vagrant --time-limit 60 --rate 100 --concurrency 100 --workload reg2 --ops-per-key 100 --patch tsfix2`
|
- No failures with clock-scramble nemesis + partition nemesis (`--scenario cp`).
|
||||||
|
This proves that `tsfix2` (PR#543) does improve consistency.
|
||||||
|
|
||||||
Results:
|
- **Fails with layout reconfiguration nemesis** (`--scenario r`)
|
||||||
|
(TODO: note down the run id of a failed run)
|
||||||
- No failures with clock-scramble nemesis + partition nemesis
|
(TODO: test more and investigate).
|
||||||
- Fails with layout reconfiguration nemesis (TODO: test more and investigate)
|
This is the failure mode we are looking for and trying to fix for NLnet task 3.
|
||||||
|
|
||||||
|
|
||||||
### Set, basic test (write some items, then read)
|
### Set, basic test (write some items, then read)
|
||||||
|
|
||||||
Command: `lein run test --nodes-file nodes.vagrant --time-limit 60 --rate 100 --concurrency 100 --workload set1 --ops-per-key 100`
|
Command: `lein run test --nodes-file nodes.vagrant --time-limit 60 --rate 100 --concurrency 100 --workload set1 --ops-per-key 100 --patch tsfix2`
|
||||||
|
|
||||||
Results:
|
Results:
|
||||||
|
|
||||||
- For now, no failures with clock-scramble nemesis + partition nemesis
|
- For now, no failures with clock-scramble nemesis + partition nemesis -> TODO long test run
|
||||||
- TODO: layout reconfiguration nemesis (does not fail yet! but it should)
|
|
||||||
|
- Failures were not yet achieved with only the layout reconfiguration nemesis, although they should be.
|
||||||
|
|
||||||
|
- **Fails with partition + layout reconfiguration nemesis** (`--scenario pr`)
|
||||||
|
(TODO: note down the run id of a failed run)
|
||||||
|
(TODO: test more and investigate).
|
||||||
|
This is the failure mode we are looking for and trying to fix for NLnet task 3.
|
||||||
|
|
||||||
|
|
||||||
### Set, continuous test (interspersed reads and writes)
|
### Set, continuous test (interspersed reads and writes)
|
||||||
|
|
||||||
TODO
|
Command: `lein run test --nodes-file nodes.vagrant --time-limit 60 --rate 100 --concurrency 100 --workload set2 --ops-per-key 100 --patch tsfix2`
|
||||||
|
|
||||||
|
Results:
|
||||||
|
|
||||||
|
- For now, no failures with clock-scramble nemesis + partition nemesis -> TODO long test run
|
||||||
|
|
||||||
|
- Failures were not yet achieved with only the layout reconfiguration nemesis, although they should be.
|
||||||
|
|
||||||
|
- TODO: failures should be achieved with `--scenario pr`? Even with 4 or 5 consecutive test runs, no failures were achieved, why?
|
||||||
|
(TODO: note down the run id of a failed run)
|
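The TODO items above ask for longer or repeated runs before concluding that a scenario does not fail. A minimal shell sketch for repeating one configuration until the first failing run is given here. It assumes that the `lein run test` process exits with a non-zero status when the checker reports an invalid history, which should be verified against the Jepsen version in use. The helper names `build_cmd` and `run_until_failure` are hypothetical; the flags are the ones used in the commands of this README.

```shell
# Hypothetical helper: print the test command line.
# $1 = workload, $2 = scenario, $3 = patch (flags as used in this README).
build_cmd() {
  printf 'lein run test --nodes-file nodes.vagrant --time-limit 60 --rate 100 --concurrency 100 --workload %s --ops-per-key 100 --scenario %s --patch %s\n' "$1" "$2" "$3"
}

# Hypothetical helper: run a command up to $1 times, stop at the first failure.
# Assumption: a non-zero exit status means the run failed.
run_until_failure() {
  ruf_max="$1"
  shift
  ruf_i=1
  while [ "$ruf_i" -le "$ruf_max" ]; do
    # Command output is discarded here; only the exit status is inspected.
    if ! "$@" >/dev/null 2>&1; then
      echo "run $ruf_i failed"
      return 1
    fi
    ruf_i=$((ruf_i + 1))
  done
  echo "no failure in $ruf_max runs"
  return 0
}
```

For example, `run_until_failure 20 $(build_cmd set2 pr tsfix2)` would repeat the `set2` workload under `--scenario pr` up to 20 times. The index printed for the failing run can then be matched against the corresponding results directory that Jepsen writes under `store/`, which is one way to record the run id of a failed run.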
||||||
|
|
||||||
|
|
||||||
## Investigating (and fixing) errors
|
## Investigating (and fixing) errors
|
||||||
|
@@ -112,7 +131,7 @@ and passing all values that were previously in the context (creds and prefix) as
|
||||||
The reg2 test is our custom checker for CRDT read-after-write on individual object keys, acting as registers which can be updated.
|
The reg2 test is our custom checker for CRDT read-after-write on individual object keys, acting as registers which can be updated.
|
||||||
The test fails without the timestamp fix, which is expected as the clock scrambler will prevent nodes from having a correct ordering of objects.
|
The test fails without the timestamp fix, which is expected as the clock scrambler will prevent nodes from having a correct ordering of objects.
|
||||||
|
|
||||||
With the timestamp fix, the happened-before relationship should at least be respected, meaning that when a PutObject call starts
|
With the timestamp fix (`--patch tsfix1`), the happened-before relationship should at least be respected, meaning that when a PutObject call starts
|
||||||
after another PutObject call has ended, the second call should overwrite the value of the first call, and that value should not be
|
after another PutObject call has ended, the second call should overwrite the value of the first call, and that value should not be
|
||||||
readable by future GetObject calls.
|
readable by future GetObject calls.
|
||||||
However, we observed inconsistencies even with the timestamp fix.
|
However, we observed inconsistencies even with the timestamp fix.
|
||||||
|
@@ -121,7 +140,7 @@ The inconsistencies seemed to always happen after writing a nil value, which
|
||||||
instead of a PutObject. By removing the possibility of writing nil values, therefore only doing
|
instead of a PutObject. By removing the possibility of writing nil values, therefore only doing
|
||||||
PutObject calls, the issue disappears. There is therefore an issue to fix in DeleteObject.
|
PutObject calls, the issue disappears. There is therefore an issue to fix in DeleteObject.
|
||||||
|
|
||||||
The issue in DeleteObject seems to have been fixed by commit `c82d91c6bccf307186332b6c5c6fc0b128b1b2b1`
|
The issue in DeleteObject seems to have been fixed by commit `c82d91c6bccf307186332b6c5c6fc0b128b1b2b1`, which can be selected with `--patch tsfix2`.
|
||||||
|
|
||||||
|
|
||||||
## License
|
## License
|
||||||
|
|
|
@@ -23,8 +23,10 @@
|
||||||
|
|
||||||
(def scenari
|
(def scenari
|
||||||
"A map of scenari to the associated nemesis"
|
"A map of scenari to the associated nemesis"
|
||||||
{"cp" grgNemesis/scenario-cp
|
{"c" grgNemesis/scenario-c
|
||||||
"r" grgNemesis/scenario-r})
|
"cp" grgNemesis/scenario-cp
|
||||||
|
"r" grgNemesis/scenario-r
|
||||||
|
"pr" grgNemesis/scenario-pr})
|
||||||
|
|
||||||
(def patches
|
(def patches
|
||||||
"A map of patch names to Garage builds"
|
"A map of patch names to Garage builds"
|
||||||
|
|
|
@@ -7,6 +7,8 @@
|
||||||
[jepsen.garage.daemon :as grg]
|
[jepsen.garage.daemon :as grg]
|
||||||
[jepsen.control.util :as cu]))
|
[jepsen.control.util :as cu]))
|
||||||
|
|
||||||
|
; ---- reconfiguration nemesis ----
|
||||||
|
|
||||||
(defn configure-present!
|
(defn configure-present!
|
||||||
"Configure node to be active in new cluster layout"
|
"Configure node to be active in new cluster layout"
|
||||||
[test node]
|
[test node]
|
||||||
|
@@ -61,8 +63,18 @@
|
||||||
|
|
||||||
(teardown! [this test] this)))
|
(teardown! [this test] this)))
|
||||||
|
|
||||||
|
; ---- nemesis scenari ----
|
||||||
|
|
||||||
|
(defn scenario-c
|
||||||
|
"Clock scramble scenario"
|
||||||
|
[opts]
|
||||||
|
{:generator (cycle [(gen/sleep 5)
|
||||||
|
{:type :info, :f :clock-scramble}])
|
||||||
|
:nemesis (nemesis/compose
|
||||||
|
{{:clock-scramble :scramble} (nemesis/clock-scrambler 20.0)})})
|
||||||
|
|
||||||
(defn scenario-cp
|
(defn scenario-cp
|
||||||
"Clock scramble + parittion scenario"
|
"Clock scramble + partition scenario"
|
||||||
[opts]
|
[opts]
|
||||||
{:generator (cycle [(gen/sleep 5)
|
{:generator (cycle [(gen/sleep 5)
|
||||||
{:type :info, :f :partition-start}
|
{:type :info, :f :partition-start}
|
||||||
|
@@ -91,3 +103,23 @@
|
||||||
:nemesis (nemesis/compose
|
:nemesis (nemesis/compose
|
||||||
{{:reconfigure-start :start
|
{{:reconfigure-start :start
|
||||||
:reconfigure-stop :stop} (reconfigure-subset 3)})})
|
:reconfigure-stop :stop} (reconfigure-subset 3)})})
|
||||||
|
|
||||||
|
(defn scenario-pr
|
||||||
|
"Partition + cluster reconfiguration scenario"
|
||||||
|
[opts]
|
||||||
|
{:generator (cycle [(gen/sleep 3)
|
||||||
|
{:type :info, :f :reconfigure-start}
|
||||||
|
(gen/sleep 3)
|
||||||
|
{:type :info, :f :partition-start}
|
||||||
|
(gen/sleep 3)
|
||||||
|
{:type :info, :f :reconfigure-start}
|
||||||
|
(gen/sleep 3)
|
||||||
|
{:type :info, :f :partition-stop}
|
||||||
|
(gen/sleep 3)
|
||||||
|
{:type :info, :f :reconfigure-stop}])
|
||||||
|
:final-generator (gen/once {:type :info, :f :partition-stop})
|
||||||
|
:nemesis (nemesis/compose
|
||||||
|
{{:partition-start :start
|
||||||
|
:partition-stop :stop} (nemesis/partition-random-halves)
|
||||||
|
{:reconfigure-start :start
|
||||||
|
:reconfigure-stop :stop} (reconfigure-subset 3)})})
|
||||||
|
|
|
@@ -39,12 +39,10 @@
|
||||||
new-object-summaries (:object-summaries list-result)
|
new-object-summaries (:object-summaries list-result)
|
||||||
new-objects (map (fn [d] (:key d)) new-object-summaries)
|
new-objects (map (fn [d] (:key d)) new-object-summaries)
|
||||||
objects (concat new-objects accum)]
|
objects (concat new-objects accum)]
|
||||||
(info (:endpoint creds) "ListObjectsV2 prefix(" prefix "), ct(" ct "): " new-objects)
|
|
||||||
(if (:truncated? list-result)
|
(if (:truncated? list-result)
|
||||||
(list-inner creds prefix (:next-continuation-token list-result) objects)
|
(list-inner creds prefix (:next-continuation-token list-result) objects)
|
||||||
objects)))
|
objects)))
|
||||||
(defn list
|
(defn list
|
||||||
"Helper for ListObjects -- just lists everything in the bucket"
|
"Helper for ListObjects -- just lists everything in the bucket"
|
||||||
[creds prefix]
|
[creds prefix]
|
||||||
(info "in s3/list creds:" creds ", prefix:" prefix)
|
|
||||||
(list-inner creds prefix nil []))
|
(list-inner creds prefix nil []))
|
||||||
|
|
|
@@ -45,9 +45,7 @@
|
||||||
10000
|
10000
|
||||||
(assoc op :type :fail, :error ::timeout)
|
(assoc op :type :fail, :error ::timeout)
|
||||||
(do
|
(do
|
||||||
(info "call s3/list creds: " (:creds this) ", prefix:" prefix)
|
|
||||||
(let [items (s3/list (:creds this) prefix)]
|
(let [items (s3/list (:creds this) prefix)]
|
||||||
(info "list results for prefix" prefix ":" items " (node:" (:endpoint (:creds this)) ")")
|
|
||||||
(let [items-stripped (map (fn [o]
|
(let [items-stripped (map (fn [o]
|
||||||
(assert (str/starts-with? o prefix))
|
(assert (str/starts-with? o prefix))
|
||||||
(str/replace-first o prefix "")) items)
|
(str/replace-first o prefix "")) items)
|
||||||
|
@@ -115,8 +113,8 @@
|
||||||
{:client (SetClient. nil)
|
{:client (SetClient. nil)
|
||||||
:checker (independent/checker
|
:checker (independent/checker
|
||||||
(checker/compose
|
(checker/compose
|
||||||
{:set-full (checker/set-full {:linearizable? false})
|
{:set-read-after-write (set-read-after-write)
|
||||||
:set-read-after-write (set-read-after-write)
|
; :set-full (checker/set-full {:linearizable? false})
|
||||||
:timeline (timeline/html)}))
|
:timeline (timeline/html)}))
|
||||||
:generator (independent/concurrent-generator
|
:generator (independent/concurrent-generator
|
||||||
10
|
10
|
||||||
|
|