Add some doc about our own bench tool

Quentin 2022-08-11 22:16:01 +02:00
parent 86ab9d7c00
commit 430259d050
Signed by: quentin
GPG key ID: E9602264D639FF68

## Write a custom client exhibiting the issue
We know how to trigger the issue with `warp`, Minio's benchmark tool, but we do not yet fully understand what kind of load it puts on the cluster, except that it sends Multipart and PutObject requests concurrently. So, before investigating the issue in more depth, we want to know:
- Can a single large PUT request trigger this issue?
- How many parallel requests are needed to trigger this issue?
- Are Multipart transfers more impacted by this issue?

To answer these questions, we wrote a dedicated benchmark named s3concurrent, available here: https://git.deuxfleurs.fr/quentin/s3concurrent

The benchmark starts by sending 1 file, then 2 files concurrently, then 3, then 4, and so on up to 16 (this is hardcoded for now).

When run on our mknet cluster, the benchmark starts triggering errors as soon as we send 2 files at once:
```
$ ./s3concurrent
2022/08/11 20:35:28 created bucket 3ffd6798-bdab-4218-b6d0-973a07e46ea9
2022/08/11 20:35:28 start concurrent loop with 1 coroutines
2022/08/11 20:35:55 done, 1 coroutines returned
2022/08/11 20:35:55 start concurrent loop with 2 coroutines
2022/08/11 20:36:34 1/2 failed with Internal error: Could not reach quorum of 2. 1 of 3 request succeeded, others returned errors: ["Timeout", "Timeout"]
2022/08/11 20:36:37 done, 2 coroutines returned
2022/08/11 20:36:37 start concurrent loop with 3 coroutines
2022/08/11 20:37:13 1/3 failed with Internal error: Could not reach quorum of 2. 1 of 3 request succeeded, others returned errors: ["Netapp error: Not connected: 92c7fb74ed89f289", "Netapp error: Not connected: 3cb7ed98f7c66a55"]
2022/08/11 20:37:51 2/3 failed with Internal error: Could not reach quorum of 2. 1 of 3 request succeeded, others returned errors: ["Netapp error: Not connected: 92c7fb74ed89f289", "Netapp error: Not connected: 3cb7ed98f7c66a55"]
2022/08/11 20:37:51 3/3 failed with Internal error: Could not reach quorum of 2. 1 of 3 request succeeded, others returned errors: ["Netapp error: Not connected: 92c7fb74ed89f289", "Netapp error: Not connected: 3cb7ed98f7c66a55"]
2022/08/11 20:37:51 done, 3 coroutines returned
2022/08/11 20:37:51 start concurrent loop with 4 coroutines
```