diff --git a/liveness.md b/liveness.md
index bec4ea4..c33bd05 100644
--- a/liveness.md
+++ b/liveness.md
@@ -42,9 +42,30 @@ It starts to really look like a congestion control/control flow error/scheduler
 
 ## Write a custom client exhibiting the issue
 
-We know how to trigger the issue with `warp`, Minio's benchmark tool but we don't yet understand well what kind of load it puts on the cluster except that it sends concurrently PUT and Multipart requests. So, before investigating the issue more in depth, we want to know:
+We know how to trigger the issue with `warp`, Minio's benchmark tool, but we don't yet understand well what kind of load it puts on the cluster, except that it sends Multipart and PutObject requests concurrently. So, before investigating the issue in more depth, we want to know:
 - If a single large PUT request can trigger this issue or not?
 - How many parallel requests are needed to trigger this issue?
 - Does Multipart transfer are more impacted by this issue?
 
 To get answer to our questions, we will write a specific benchmark.
+Named s3concurrent, it is available here: https://git.deuxfleurs.fr/quentin/s3concurrent
+The benchmark starts by sending 1 file, then 2 files concurrently,
+then 3, then 4, up to 16 (this is hardcoded for now).
+
+When run on our mknet cluster, we start triggering issues as soon as we send 2 files at once:
+
+```
+$ ./s3concurrent
+2022/08/11 20:35:28 created bucket 3ffd6798-bdab-4218-b6d0-973a07e46ea9
+2022/08/11 20:35:28 start concurrent loop with 1 coroutines
+2022/08/11 20:35:55 done, 1 coroutines returned
+2022/08/11 20:35:55 start concurrent loop with 2 coroutines
+2022/08/11 20:36:34 1/2 failed with Internal error: Could not reach quorum of 2. 1 of 3 request succeeded, others returned errors: ["Timeout", "Timeout"]
+2022/08/11 20:36:37 done, 2 coroutines returned
+2022/08/11 20:36:37 start concurrent loop with 3 coroutines
+2022/08/11 20:37:13 1/3 failed with Internal error: Could not reach quorum of 2. 1 of 3 request succeeded, others returned errors: ["Netapp error: Not connected: 92c7fb74ed89f289", "Netapp error: Not connected: 3cb7ed98f7c66a55"]
+2022/08/11 20:37:51 2/3 failed with Internal error: Could not reach quorum of 2. 1 of 3 request succeeded, others returned errors: ["Netapp error: Not connected: 92c7fb74ed89f289", "Netapp error: Not connected: 3cb7ed98f7c66a55"]
+2022/08/11 20:37:51 3/3 failed with Internal error: Could not reach quorum of 2. 1 of 3 request succeeded, others returned errors: ["Netapp error: Not connected: 92c7fb74ed89f289", "Netapp error: Not connected: 3cb7ed98f7c66a55"]
+2022/08/11 20:37:51 done, 3 coroutines returned
+2022/08/11 20:37:51 start concurrent loop with 4 coroutines
+```