
+++
title="Bringing theoretical design and real-world performance face to face"
date=2022-09-26
+++

Over the past years, we have extensively analyzed possible design decisions for Garage and their theoretical tradeoffs, whether on the network, data structure, or scheduling side. This worked well enough for our production cluster at Deuxfleurs, but we also knew that people had started encountering unexpected behaviors. We thus started a round of benchmarks and performance improvements to make Garage more versatile and to better understand what we can expect from it.


⚠️ Disclaimer

The following results must be taken with a grain of salt due to some limitations that are inherent to any benchmark. We try to list them in this section, but some may be missing.

Most of our tests were run on simulated networks, which cannot represent all the diversity of real networks (dynamic packet drop, jitter, latency, all of which may be correlated with throughput or with other external events). We also limited ourselves to very small workloads that are not representative of a production cluster.
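One common way to emulate such links on Linux is the netem queueing discipline, configured with `tc`. The sketch below only builds the command line; the helper name `netem_cmd`, the interface name and the values are hypothetical, and we do not claim this is how our own scripts work. It also illustrates the limitation above: these parameters stay fixed for the whole run, whereas real network conditions drift over time.

```python
from typing import List, Optional


def netem_cmd(dev: str, delay_ms: int, jitter_ms: int = 0,
              loss_pct: float = 0.0, rate: Optional[str] = None) -> List[str]:
    """Build a `tc` argv attaching a netem qdisc to `dev` (hypothetical helper)."""
    cmd = ["tc", "qdisc", "add", "dev", dev, "root", "netem",
           "delay", f"{delay_ms}ms"]
    if jitter_ms:
        # Random per-packet variation around the base delay.
        cmd.append(f"{jitter_ms}ms")
    if loss_pct:
        # Random packet loss; no correlation with other events is modeled.
        cmd += ["loss", f"{loss_pct}%"]
    if rate:
        # Bandwidth cap such as "10mbit".
        cmd += ["rate", rate]
    return cmd


# The argv could be passed to subprocess.run() on a host with root privileges.
print(" ".join(netem_cmd("eth0", 50, 10, 0.5, "10mbit")))
# prints: tc qdisc add dev eth0 root netem delay 50ms 10ms loss 0.5% rate 10mbit
```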

For some benchmarks, we used Minio as a reference. Note that we did not try to optimize its configuration as we did for Garage and, more generally, we know Minio much less well than Garage, which may lead us to underestimate its performance. Garage and Minio also have different feature sets: e.g. Minio supports erasure coding for better data density while Garage does not, Minio implements many more S3 endpoints than Garage, etc. Such features necessarily have a cost that you must keep in mind when reading the plots.
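To give an order of magnitude of what "data density" means, the sketch below compares the raw storage consumed per byte of user data by full replication and by erasure coding with k data shards and m parity shards. The 3-copy and 4+2 settings are illustrative assumptions, not the configurations used in our benchmarks; both survive the loss of any two pieces of an object.

```python
def replication_overhead(copies: int) -> float:
    """Raw bytes stored per byte of user data when keeping full copies."""
    return float(copies)


def erasure_overhead(data_shards: int, parity_shards: int) -> float:
    """Raw bytes stored per byte of user data with k data + m parity shards."""
    return (data_shards + parity_shards) / data_shards


# Illustrative settings (assumptions): both tolerate losing any 2 pieces.
print(f"3 copies: {replication_overhead(3)}x")  # 3.0x
print(f"EC 4+2:   {erasure_overhead(4, 2)}x")   # 1.5x
```

The density gain comes at the price of encoding work and of reading shards from several nodes to rebuild an object, which is part of the cost mentioned above.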

We also did not evaluate the impact of the testing environment (kernel patches, configuration, parameters, filesystem, hardware, etc.); some of these settings could favor one configuration or piece of software over another. In particular, most of the tests were run only on a consumer-grade computer with an SSD, which differs from most production setups. Finally, our results are provided without statistical tests to check their significance, so some of the observed differences might not be statistically significant.
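For readers unfamiliar with such tests, here is one simple option: a percentile bootstrap confidence interval on the difference of mean values between two sets of measurements (for instance, request latencies of two configurations). The function name and defaults are our own choices for illustration; we did not apply this to the results in this post. If the resulting interval contains 0, the observed difference may well be noise.

```python
import random
import statistics
from typing import Sequence, Tuple


def bootstrap_diff_ci(a: Sequence[float], b: Sequence[float],
                      n_resamples: int = 10_000, alpha: float = 0.05,
                      seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for mean(a) - mean(b)."""
    rng = random.Random(seed)  # fixed seed: reproducible interval
    diffs = []
    for _ in range(n_resamples):
        # Resample each set of measurements with replacement, independently.
        ra = rng.choices(a, k=len(a))
        rb = rng.choices(b, k=len(b))
        diffs.append(statistics.fmean(ra) - statistics.fmean(rb))
    diffs.sort()
    # Keep the central (1 - alpha) fraction of the resampled differences.
    cut = int(n_resamples * alpha / 2)
    return diffs[cut], diffs[n_resamples - 1 - cut]
```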

When reading this post, please keep in mind that we are not making any business or technical recommendation here; we are only sharing bits of our development process. Read about benchmarking crimes, run your own tests if you need to make a decision, and remain supportive and caring with your peers...

About our testing environment

We started with a batch of tests on Grid5000, a large-scale and flexible testbed for experiment-driven research in all areas of computer science, under its Open Access program. During these tests, we used parts of the following clusters to build a geo-distributed topology: nova, paravance, and econome. We used Grid5000 only for our preliminary tests, to identify issues when running Garage on many powerful servers; we then reproduced these issues in a controlled environment. Don't be surprised, then, if Grid5000 is rarely mentioned on our plots.

To reproduce some environments locally, we have a small set of Python scripts named mknet, tailored to our needs[1]. Most of the following tests were thus run locally with mknet on a single computer: a Dell Inspiron 27" 7775 AIO with a Ryzen 5 1400, 16 GB of RAM, and a 512 GB SSD. On the software side, we run NixOS 22.05 with the 5.15.50 kernel on an encrypted ext4 filesystem. vm.dirty_background_ratio and vm.dirty_ratio have been reduced to 2 and 1 respectively, as otherwise the system tends to freeze under heavy I/O load.
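To put these ratios in perspective, the sketch below converts them into approximate byte thresholds. It relies on two assumptions. First, the kernel applies these percentages to "dirtyable" memory (free plus reclaimable pages), not to total RAM, so using the full 16 GB only gives an upper bound. Second, from our reading of the kernel's mm/page-writeback.c, when the background threshold is not below the dirty threshold, it is reset to half of the latter, which is the case with the values above. The function name is hypothetical. Small thresholds presumably help because writeback starts earlier and writers get throttled before a large backlog of dirty pages accumulates.

```python
from typing import Tuple

GIB = 1024 ** 3
MIB = 1024 ** 2


def dirty_thresholds(dirtyable_bytes: int, background_ratio: int,
                     dirty_ratio: int) -> Tuple[int, int]:
    """Approximate (background, dirty) writeback thresholds in bytes (hypothetical helper)."""
    dirty = dirtyable_bytes * dirty_ratio // 100
    background = dirtyable_bytes * background_ratio // 100
    if background >= dirty:
        # Assumption from our reading of mm/page-writeback.c: the kernel
        # falls back to half of the dirty threshold in this case.
        background = dirty // 2
    return background, dirty


# Upper bound: pretend all 16 GiB are dirtyable, which is never the case.
bg, hard = dirty_thresholds(16 * GIB, background_ratio=2, dirty_ratio=1)
print(f"background ~ {bg / MIB:.0f} MiB, dirty ~ {hard / MIB:.0f} MiB")
# prints: background ~ 82 MiB, dirty ~ 164 MiB
```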

Efficient I/O

  • streaming

  • fsync, semaphore, timeouts, etc.
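The items above can be illustrated with a small Python sketch (Garage itself is written in Rust and its actual code differs): an object is streamed in fixed-size blocks instead of being buffered whole; each block is written to a temporary file, fsynced, then atomically renamed; and a semaphore with a timeout caps how many blocks are written at once when several requests run in parallel threads. The block size, hash function, number of write slots and timeout are arbitrary assumptions, and the function names are hypothetical.

```python
import hashlib
import os
import threading
from typing import BinaryIO, Iterator, List

BLOCK_SIZE = 1024 * 1024  # assumption: 1 MiB blocks, chosen for illustration

# Assumption: at most 4 blocks written to disk at the same time, across threads.
WRITE_SLOTS = threading.BoundedSemaphore(4)


def stream_blocks(src: BinaryIO, block_size: int = BLOCK_SIZE) -> Iterator[bytes]:
    """Yield fixed-size blocks so the whole object never sits in memory."""
    while True:
        block = src.read(block_size)
        if not block:
            return
        yield block


def write_block_durably(directory: str, block: bytes, timeout: float = 10.0) -> str:
    """Write one block under its hash, fsync it, then atomically publish it."""
    name = hashlib.blake2b(block, digest_size=32).hexdigest()
    # Wait for a write slot, but give up instead of queueing forever.
    if not WRITE_SLOTS.acquire(timeout=timeout):
        raise TimeoutError("no write slot available")
    try:
        final = os.path.join(directory, name)
        tmp = final + ".tmp"
        with open(tmp, "wb") as f:
            f.write(block)
            f.flush()
            os.fsync(f.fileno())  # data reaches stable storage before publishing
        os.replace(tmp, final)  # atomic rename on POSIX filesystems
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)  # persist the directory entry created by the rename
        finally:
            os.close(dir_fd)
        return name
    finally:
        WRITE_SLOTS.release()


def store_object(src: BinaryIO, directory: str) -> List[str]:
    """Store an object block by block; return the list of block hashes."""
    return [write_block_durably(directory, b) for b in stream_blocks(src)]
```

Failing with a timeout rather than waiting indefinitely for a slot is one way to let an overloaded node report an error to the client instead of silently piling up requests.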

Millions of objects

  • metadata engine

  • storing metadata at scale
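Choosing the metadata engine is the open question here, so the sketch below does not answer it; it only illustrates one property an engine needs once a bucket holds millions of objects: keys kept in order, so that an S3-style listing (which returns keys in lexicographic order, page by page) can seek to its starting point rather than scan and sort everything. Python's built-in sqlite3 is used purely as a convenient stand-in; the schema and function names are hypothetical and do not reflect Garage's data model.

```python
import sqlite3
from typing import List


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a toy metadata store (hypothetical schema, for illustration only)."""
    db = sqlite3.connect(path)
    # The composite primary key keeps rows ordered by (bucket, key), so a
    # listing can seek to its starting point instead of sorting all rows.
    db.execute(
        "CREATE TABLE IF NOT EXISTS objects ("
        " bucket TEXT NOT NULL, key TEXT NOT NULL, size INTEGER NOT NULL,"
        " PRIMARY KEY (bucket, key)) WITHOUT ROWID"
    )
    return db


def put_meta(db: sqlite3.Connection, bucket: str, key: str, size: int) -> None:
    """Insert or overwrite the metadata of one object, in its own transaction."""
    with db:
        db.execute("INSERT OR REPLACE INTO objects VALUES (?, ?, ?)",
                   (bucket, key, size))


def list_keys(db: sqlite3.Connection, bucket: str, prefix: str = "",
              start_after: str = "", limit: int = 1000) -> List[str]:
    """Return one page of keys in lexicographic order (ListObjects-like)."""
    rows = db.execute(
        "SELECT key FROM objects"
        " WHERE bucket = ? AND key >= ? AND key > ? AND substr(key, 1, ?) = ?"
        " ORDER BY key LIMIT ?",
        (bucket, prefix, start_after, len(prefix), prefix, limit),
    )
    return [k for (k,) in rows]
```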

Topology versatility

  • low bandwidth

  • high network latency and the phenomenon we name amplification

  • complexity (constant time)
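A back-of-the-envelope model helps to see why latency and bandwidth hurt differently. It assumes (our assumption, not a measurement) that amplification means one client request triggers several round trips between nodes that must happen one after the other: the latency term then scales with the number of sequential round trips times the RTT, while bandwidth only affects the transfer term. The function name and the number of sequential RPCs are hypothetical parameters.

```python
def request_time_ms(rtt_ms: float, sequential_rpcs: int,
                    payload_bytes: int = 0, bandwidth_mbit_s: float = 0.0) -> float:
    """Back-of-the-envelope duration of one client request, in milliseconds.

    Hypothetical model: `sequential_rpcs` round trips that cannot overlap,
    plus the time needed to push the payload through the slowest link.
    """
    latency_ms = sequential_rpcs * rtt_ms  # amplified by the chain of RPCs
    transfer_ms = 0.0
    if payload_bytes and bandwidth_mbit_s:
        transfer_ms = payload_bytes * 8 * 1000 / (bandwidth_mbit_s * 1e6)
    return latency_ms + transfer_ms


# Same request with 1 ms vs 50 ms of RTT: the latency term grows 50x.
for rtt in (1, 50):
    print(rtt, request_time_ms(rtt, sequential_rpcs=4, payload_bytes=1_000_000,
                               bandwidth_mbit_s=100))
```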

Future work

  • SRPT (Shortest Remaining Processing Time) scheduling

  • better analysis of the fsync / data reliability impact

  • analysis and comparison of Garage at scale

  • better understand the ecosystem (Riak CS, Minio, Ceph, Swift): there is knowledge to gain there
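For reference, SRPT, listed above, is a classic scheduling policy: at any moment, serve the pending job with the least remaining work, preempting the current one when a shorter job arrives; on a single server it minimizes the mean response time. How it would apply to Garage is precisely future work, so the sketch below is only a generic, idealized single-server simulation with a hypothetical function name, not Garage code.

```python
import heapq
from typing import Dict, List, Tuple


def srpt_completion_times(jobs: List[Tuple[float, float]]) -> Dict[int, float]:
    """Simulate preemptive SRPT on one server (hypothetical helper).

    jobs: list of (arrival_time, size); the server does one unit of work
    per unit of time. Returns {job_index: completion_time}.
    """
    order = sorted(range(len(jobs)), key=lambda i: jobs[i][0])
    ready: List[Tuple[float, int]] = []  # min-heap of (remaining_work, job_index)
    done: Dict[int, float] = {}
    t = 0.0
    nxt = 0
    while len(done) < len(jobs):
        # Admit every job that has arrived by time t.
        while nxt < len(order) and jobs[order[nxt]][0] <= t:
            i = order[nxt]
            heapq.heappush(ready, (jobs[i][1], i))
            nxt += 1
        if not ready:
            t = jobs[order[nxt]][0]  # idle until the next arrival
            continue
        remaining, i = heapq.heappop(ready)
        # Run the shortest job until it finishes or the next job arrives,
        # whichever comes first; the newcomer may then preempt it.
        next_arrival = jobs[order[nxt]][0] if nxt < len(order) else float("inf")
        run = min(remaining, next_arrival - t)
        t += run
        if run == remaining:
            done[i] = t
        else:
            heapq.heappush(ready, (remaining - run, i))
    return done
```

With jobs `[(0, 10), (1, 2), (2, 1)]`, the two short jobs finish at times 3 and 4 and the long one at 13, whereas first-come-first-served would finish them at 12, 13 and 10 respectively.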


  1. Yes, we are aware of Jepsen's existence. It is far more complex than our set of scripts, but also far more versatile.