More proofreading
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing

This commit is contained in:
Maximilien 2022-07-07 19:22:06 +02:00
parent 046bef611c
commit 2cfc65befc

View file

@ -46,7 +46,7 @@ kubernetes_skip_crd = true
If you want to try Garage on K8S, we currently only provide some basic [example files](https://git.deuxfleurs.fr/Deuxfleurs/garage/src/commit/7e1ac51b580afa8e900206e7cc49791ed0a00d94/script/k8s). These files register a [ConfigMap](https://kubernetes.io/docs/concepts/configuration/configmap/), a [ClusterRoleBinding](https://kubernetes.io/docs/reference/access-authn-authz/rbac/#rolebinding-and-clusterrolebinding), and a [StatefulSet](https://kubernetes.io/fr/docs/concepts/workloads/controllers/statefulset/) with a [Persistent Volumes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/).
Once these files deployed, you will be able to interact with Garage as follow:
Once these files are deployed, you will be able to interact with Garage as follow:
```bash
kubectl exec -it garage-0 --container garage -- /garage status
@ -59,14 +59,14 @@ kubectl exec -it garage-0 --container garage -- /garage status
You can then follow the [regular documentation](https://garagehq.deuxfleurs.fr/documentation/cookbook/real-world/#creating-a-cluster-layout) to complete the configuration of your cluster.
If you target a production deployment, you should avoid binding admin rights to your cluster to create Garage's CRD. You will also need to expose some [Services](https://kubernetes.io/docs/concepts/services-networking/service/) to make your cluster reachable. Keep also in mind that Garage is a stateful service, so you must be very careful of how you handle your data in Kubernetes in order not to lose it. In the near future, we plan to release a proper Helm chart and write "best practises" in our documentation.
If you target a production deployment, you should avoid binding admin rights to your cluster to create Garage's CRD. You will also need to expose some [Services](https://kubernetes.io/docs/concepts/services-networking/service/) to make your cluster reachable. Keep also in mind that Garage is a stateful service, so you must be very careful of how you handle your data in Kubernetes in order not to lose it. In the near future, we plan to release a proper Helm chart and write "best practices" in our documentation.
If Kubernetes is not your thing, know that we are running Garage on a Nomad+Consul cluster, which is also well supported.
We have not documented it yet but you can get a look at [our Nomad service](https://git.deuxfleurs.fr/Deuxfleurs/infrastructure/src/commit/1e5e4af35c073d04698bb10dd4ad1330d6c62a0d/app/garage/deploy/garage.hcl).
## OpenTelemetry support
[OpenTelemetry](https://opentelemetry.io/) standardizes how software generates and collects system telemetry information, namely metrics, logs and traces.
[OpenTelemetry](https://opentelemetry.io/) standardizes how software generates and collects system telemetry information, namely metrics, logs, and traces.
By implementing this standard in Garage, we hope that it will help you to better monitor, manage and tune your cluster.
Note that to fully leverage this feature, you must be already familiar with monitoring stacks like [Prometheus](https://prometheus.io/)+[Grafana](https://grafana.com/) or [ElasticSearch](https://www.elastic.co/elasticsearch/)+[Kibana](https://www.elastic.co/kibana/).
@ -87,7 +87,7 @@ It includes a docker-compose file and a pre-configured Grafana dashboard.
You can use them if you want to reproduce the following examples.
Grafana is particularly adapted to understand how your cluster is performing from a "bird's eye view".
For example, the following graph shows S3 API calls sent to your node per time-unit.
For example, the following graph shows S3 API calls sent to your node per time unit.
You can use it to better understand how your users are interacting with your cluster.
![A screenshot of a plot made by Grafana depicting the number of requests per time units grouped by endpoints](api_rate.png)
@ -95,21 +95,21 @@ You can use it to better understand how your users are interacting with your clu
Thanks to this graph, we know that starting at 14:55, an important upload has been started.
This upload is made of many small files, as we see many PutObject calls that are often used for small files.
It also has some large objects, as we observe some multipart uploads requests.
Conversely, at this time, no read are done as the corresponding read enpoints (ListBuckets, ListObjectsV2, etc.) receive 0 request per time unit.
Conversely, at this time, no reads are done as the corresponding read endpoints (ListBuckets, ListObjectsV2, etc.) receive 0 request per time unit.
Garage also collects metrics from lower level parts of the system.
Garage also collects metrics from lower-level parts of the system.
You can use them to better understand how Garage is interacting with your OS and your hardware.
![A screenshot of a plot made by Grafana depicting the write speed (in MB/s) during time.](writes.png)
![A screenshot of a plot made by Grafana depicting the write speed (in MB/s) during the test time.](writes.png)
This plot has been captured at the same moment than the previous one.
This plot has been captured at the same moment as the previous one.
We do not see a correlation between the writes and the API requests for the full upload but only for its beginning.
More precisely, it maps well to multipart upload requests, and this is expected.
Large files (of the multipart uploads) will saturate the writes of your disk but the uploading of small files (via the PutObject endpoint) will be limited by other parts of the system.
This simple example covers only 2 metrics over the 20+ ones that we already defined, but it still allowed us to precisely describe our cluster usage and identify where bottlenecks could be.
We are confident that cleverly using these metrics on a production cluster will give you many more valuable insights on your cluster.
We are confident that cleverly using these metrics on a production cluster will give you many more valuable insights into your cluster.
While metrics are good for having a large, general overview of your system, they are however not adapted for digging and pinpointing a specific performance issue on a specific code path.
Thankfully, we also have a solution for this problem: tracing.
@ -128,9 +128,9 @@ Consequently, this request probably corresponds to a very small file.
Below this first histogram, you can select the request you want to inspect, and then see its trace on the bottom part.
The trace shown above can be broken down in 4 parts: fetching the API key to check authentication (`key get`), fetching the bucket identifier from its name (`bucket_alias get`), fetching the bucket configuration to check authorizations (`bucket_v2 get`), and finally inserting the object in the storage (`object insert`).
With this example, we demonstrated that we can inspect Garage internals to find slow requests, then see which codepath has been taken by a request, and finally to identify which part of the code took time.
With this example, we demonstrated that we can inspect Garage internals to find slow requests, then see which codepath has been taken by a request, and finally identify which part of the code took time.
Keep in mind that this is our first iteration on telemetry for Garage, so things are a bit rough around the edges (step by step documentation is missing, our Grafana dashboard is a work in a progress, etc.).
Keep in mind that this is our first iteration on telemetry for Garage, so things are a bit rough around the edges (step-by-step documentation is missing, our Grafana dashboard is a work in progress, etc.).
In all cases, your feedback is welcome on our Matrix channel.