*We just published Garage v0.7, our second public beta release. In this post, we do a quick tour of its 2 new features: Kubernetes integration and OpenTelemetry support.*
Two months ago, we were impressed by the success of our open beta launch at FOSDEM and on Hacker News: [our initial post](https://garagehq.deuxfleurs.fr/blog/2022-introducing-garage/) led to more than 40k views in 10 days, peaking at 100 views/minute, and all requests were served by Garage, without even using a caching frontend!
Since this event, we continued to improve Garage, and — 2 months after the initial release — we are happy to announce version 0.7.0.
But first, we would like to thank the contributors that made this new release possible: Alex, Jill, Max Audron, Maximilien, Quentin, Rune Henrisken, Steam, and trinity-1686a.
This is also the first time we have welcomed contributors from outside the core team, and as we wish for Garage to be a community-driven project, we warmly encourage such contributions!
We ship [statically compiled binaries](https://garagehq.deuxfleurs.fr/download/) for most common Linux architectures (amd64, i386, aarch64 and armv6) and associated [Docker containers](https://hub.docker.com/u/dxflrs).
Garage is now also packaged by third parties for some operating systems and distributions. We are currently aware of packages for [FreeBSD](https://cgit.freebsd.org/ports/tree/www/garage/Makefile) and [Arch Linux (AUR)](https://aur.archlinux.org/packages/garage).
Feel free to [reach out to us](mailto:garagehq@deuxfleurs.fr) if you are packaging (or planning to package) Garage; we welcome maintainers and will upstream specific patches if that can help. If you have already packaged Garage, please let us know and we'll add it to the documentation.
All the bug fixes and smaller improvements are listed in our [changelogs](https://git.deuxfleurs.fr/Deuxfleurs/garage/releases), so take a look; we might have fixed some issues you were having!
Besides bug fixes, there are two new major features in this release: better integration with Kubernetes, and support for observability via OpenTelemetry.
Before Garage v0.7.0, you had to deploy a Consul cluster or spawn a "coordinating" pod to deploy Garage on [Kubernetes](https://kubernetes.io) (K8S).
In this new version, Garage integrates a method to discover other peers by using Kubernetes [Custom Resources](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) (CR) to simplify cluster discovery.
CR discovery can be quickly enabled in Garage by configuring which namespace to look in (`kubernetes_namespace`) and the name of the desired service (`kubernetes_service_name`) in your Garage configuration file:
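A minimal sketch of what this looks like (the namespace and service name below are placeholders; adapt them to your own deployment):

```toml
# Kubernetes discovery options, at the top level of the Garage configuration file.
# Namespace in which Garage registers and looks up its custom resources:
kubernetes_namespace = "garage"
# Service name under which this Garage instance announces itself:
kubernetes_service_name = "garage-daemon"
```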
Custom Resources must be defined *a priori* with [Custom Resource Definition](https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/) (CRD).
If the CRD does not exist, Garage will create it for you. Automatic CRD creation is enabled by default, but it requires giving additional permissions to Garage to work.
If you prefer strictly controlling access to your K8S cluster, you can create the resource manually and prevent Garage from automatically creating it:
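As a sketch, this boils down to a single option (we assume the option name `kubernetes_skip_crd` here; check the configuration reference of your version for the exact spelling):

```toml
# Do not create the CRD automatically; an administrator must apply it
# to the cluster before Garage starts.
kubernetes_skip_crd = true
```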
If you want to try Garage on K8S, we currently only provide some basic [example files](https://git.deuxfleurs.fr/Deuxfleurs/garage/src/commit/7e1ac51b580afa8e900206e7cc49791ed0a00d94/script/k8s). These files register a [ConfigMap](https://kubernetes.io/docs/concepts/configuration/configmap/), a [ClusterRoleBinding](https://kubernetes.io/docs/reference/access-authn-authz/rbac/#rolebinding-and-clusterrolebinding), and a [StatefulSet](https://kubernetes.io/fr/docs/concepts/workloads/controllers/statefulset/) with [Persistent Volumes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/).
You can then follow the [regular documentation](https://garagehq.deuxfleurs.fr/documentation/cookbook/real-world/#creating-a-cluster-layout) to complete the configuration of your cluster.
If you target a production deployment, you should avoid granting Garage admin rights on your cluster just to let it create its CRD; create the resource manually instead, as described above. You will also need to expose some [Services](https://kubernetes.io/docs/concepts/services-networking/service/) to make your cluster reachable. Also keep in mind that Garage is a stateful service, so you must be very careful about how you handle your data in Kubernetes in order not to lose it. In the near future, we plan to release a proper Helm chart and document some "best practices".
We have not documented deploying Garage with Nomad yet, but you can take a look at [our Nomad service](https://git.deuxfleurs.fr/Deuxfleurs/infrastructure/src/commit/1e5e4af35c073d04698bb10dd4ad1330d6c62a0d/app/garage/deploy/garage.hcl) for a working example.
[OpenTelemetry](https://opentelemetry.io/) standardizes how software generates and collects system telemetry information, namely metrics, logs, and traces.
By implementing this standard in Garage, we hope that it will help you to better monitor, manage and tune your cluster.
Note that to fully leverage this feature, you must already be familiar with monitoring stacks like [Prometheus](https://prometheus.io/)+[Grafana](https://grafana.com/) or [ElasticSearch](https://www.elastic.co/elasticsearch/)+[Kibana](https://www.elastic.co/kibana/).
To activate OpenTelemetry on Garage, add the following entries to your configuration file (supposing that your collector also runs on localhost):
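A minimal sketch, with an OTLP-compatible collector (such as the OpenTelemetry Collector) listening on its default gRPC port:

```toml
[admin]
# Bind address of the administration API; Prometheus-compatible metrics
# are exposed here as well.
api_bind_addr = "127.0.0.1:3903"
# Endpoint of the OpenTelemetry collector to which traces are exported.
trace_sink = "http://localhost:4317"
```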
We provide [some files](https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/main/script/telemetry) to help you quickly bootstrap a testing monitoring stack.
It includes a docker-compose file and a pre-configured Grafana dashboard.
You can use them if you want to reproduce the following examples.
Grafana is particularly well suited to understanding how your cluster is performing from a "bird's eye view".
More precisely, when comparing disk-write activity with API request rates, the writes map well to multipart upload requests, and this is expected: large files (sent as multipart uploads) will saturate your disk's writes, while uploads of small files (via the PutObject endpoint) will be limited by other parts of the system.
This simple example covers only two of the 20+ metrics that we have already defined, but it still allowed us to precisely describe our cluster usage and identify where bottlenecks could be.
While metrics are good for getting a broad, general overview of your system, they are not, however, well suited to digging into and pinpointing a specific performance issue on a specific code path.
The tracing view starts with a latency histogram of the requests; below this first histogram, you can select the request you want to inspect, and then see its trace in the bottom part.
The trace shown above can be broken down into four parts: fetching the API key to check authentication (`key get`), fetching the bucket identifier from its name (`bucket_alias get`), fetching the bucket configuration to check authorizations (`bucket_v2 get`), and finally inserting the object into the storage (`object insert`).
With this example, we demonstrated that we can inspect Garage's internals to find slow requests, see which code path a request took, and finally identify which part of the code took time.
Keep in mind that this is our first iteration on telemetry for Garage, so things are a bit rough around the edges (step-by-step documentation is missing, our Grafana dashboard is a work in progress, etc.).