*We just published Garage v0.7, our second public beta release. In this post, we do a quick tour of its 2 new features: Kubernetes integration and OpenTelemetry support.*
Two months ago, we were impressed by the success of our open beta launch at FOSDEM and on Hacker News: [our initial post](https://garagehq.deuxfleurs.fr/blog/2022-introducing-garage/) led to more than 40k views in 10 days, peaking at 100 views/minute.
Since this event, we have continued to improve Garage and, 2 months after the initial release, we are happy to announce version 0.7.0.
But first, we would like to thank the contributors that made this new release possible: Alex, Jill, Max Audron, Maximilien, Quentin, Rune Henrisken, Steam, and trinity-1686a.
This is also the first time we have welcomed contributors from outside the core team, and as we wish for Garage to be a community-driven project, we encourage this!
We ship [statically compiled binaries](https://garagehq.deuxfleurs.fr/download/) for most Linux architectures (amd64, i386, aarch64 and armv6) and associated [Docker containers](https://hub.docker.com/u/dxflrs).
Garage is now also packaged by third parties for some OSes/distributions. We are currently aware of [FreeBSD](https://cgit.freebsd.org/ports/tree/www/garage/Makefile) and [AUR for Arch Linux](https://aur.archlinux.org/packages/garage).
Feel free to [reach out to us](mailto:garagehq@deuxfleurs.fr) if you are packaging (or planning to package) Garage; we welcome maintainers and will upstream specific patches if that can help. If you have already packaged Garage, tell us and we'll add it to the documentation.
Before Garage v0.7.0, you had to deploy a Consul cluster or spawn a "coordinating" pod to deploy Garage on Kubernetes.
In this new version, Garage can discover other peers by using Kubernetes [Custom Resources](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/), which simplifies cluster discovery.
Garage can self-apply the [Custom Resource Definition](https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/) (CRD) to your cluster, or you can manage it manually.
Start by creating a [ConfigMap](https://kubernetes.io/docs/concepts/configuration/configmap/) containing Garage's configuration; let's name the file `config.yaml`.
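The exact contents depend on your deployment, so treat the following as a minimal sketch for Garage v0.7 rather than a reference: the `garage-config` name, the `garage.toml` key, and the mount paths match the StatefulSet shown later, while `rpc_secret`, the domains, and the service name are placeholders to adapt:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: garage-config
  namespace: default
data:
  garage.toml: |-
    # Minimal sketch of a Garage v0.7 configuration; adapt to your needs.
    metadata_dir = "/mnt/fast"  # matches the "fast" volume of the StatefulSet
    data_dir = "/mnt/slow"      # matches the "slow" volume of the StatefulSet
    replication_mode = "3"

    rpc_bind_addr = "[::]:3901"
    rpc_secret = "<replace-with-a-random-secret>"
    bootstrap_peers = []

    # Kubernetes peer discovery (new in v0.7)
    kubernetes_namespace = "default"
    kubernetes_service_name = "garage"
    kubernetes_skip_crd = false

    [s3_api]
    s3_region = "garage"
    api_bind_addr = "[::]:3900"
    root_domain = ".s3.garage.tld"

    [s3_web]
    bind_addr = "[::]:3902"
    root_domain = ".web.garage.tld"
    index = "index.html"
```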
In this example, we keep `kubernetes_skip_crd` to `false`, which means we allow Garage to automatically create the CRD.
Apply this configuration to your cluster:
```bash
kubectl apply -f config.yaml
```
Allowing Garage to create the CRD is not enough: the process must also have the permissions to do so.
A quick but insecure way to grant them is to create a [ClusterRoleBinding](https://kubernetes.io/docs/reference/access-authn-authz/rbac/#rolebinding-and-clusterrolebinding) giving admin rights to our local user, effectively breaking Kubernetes' security model (we name this file `admin.yaml`):
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: garage-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: system:serviceaccount:default:default
```
Apply it:
```bash
kubectl apply -f admin.yaml
```
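If you do not want to grant full admin rights, a more scoped alternative is to bind a ClusterRole limited to CRD management and to Garage's custom resources. The sketch below is untested, and the `deuxfleurs.fr` group and `garagenodes` resource names are assumptions to check against the CRD that Garage actually creates:

```yaml
# Sketch of a least-privilege alternative to cluster-admin (untested).
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: garage-crd
rules:
- apiGroups: ["apiextensions.k8s.io"]
  resources: ["customresourcedefinitions"]
  verbs: ["get", "list", "watch", "create"]
# Assumed group/resource for Garage's peer-discovery objects;
# verify with `kubectl get crd` after a first run.
- apiGroups: ["deuxfleurs.fr"]
  resources: ["garagenodes"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
```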
Finally, we create a [StatefulSet](https://kubernetes.io/fr/docs/concepts/workloads/controllers/statefulset/) to run our service (`service.yaml`):
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: garage
spec:
  selector:
    matchLabels:
      app: garage
  serviceName: "garage"
  replicas: 3
  template:
    metadata:
      labels:
        app: garage
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: garage
        image: dxflrs/amd64_garage:v0.7.0
        ports:
        - containerPort: 3900
          name: s3-api
        - containerPort: 3902
          name: web-api
        volumeMounts:
        - name: fast
          mountPath: /mnt/fast
        - name: slow
          mountPath: /mnt/slow
        - name: etc
          mountPath: /etc/garage.toml
          subPath: garage.toml
      volumes:
      - name: etc
        configMap:
          name: garage-config
  volumeClaimTemplates:
  - metadata:
      name: fast
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 100Mi
  - metadata:
      name: slow
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 100Mi
```
Garage is a stateful program, so it needs a stable place to store its data and metadata.
This need is covered by Kubernetes' [Persistent Volumes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/): a [StatefulSet](https://kubernetes.io/fr/docs/concepts/workloads/controllers/statefulset/) is the only workload object that can claim one stable volume per replica (through `volumeClaimTemplates`), hence the choice of this K8S object to deploy our service.
Kubernetes has many "drivers" for Persistent Volumes; for production use, we recommend **only** the `local` driver.
Using other drivers may lead to huge performance issues or data corruption, probably both in practice.
In the example, we are claiming 2 volumes of 100Mi each.
We use 2 volumes instead of 1 because Garage stores its metadata separately from its data.
By having 2 volumes, you can reserve a smaller capacity on an SSD for the metadata and a larger capacity on a regular HDD for the data.
Do not forget to change the reserved capacity: 100Mi is only suitable for testing.
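For instance, if your cluster defines separate storage classes for SSDs and HDDs (the `ssd-local` and `hdd-local` names below are hypothetical), the claims could become:

```yaml
volumeClaimTemplates:
- metadata:
    name: fast
  spec:
    # "ssd-local" is a hypothetical storage class; use one defined on your cluster.
    storageClassName: ssd-local
    accessModes: [ "ReadWriteOnce" ]
    resources:
      requests:
        storage: 10Gi
- metadata:
    name: slow
  spec:
    # "hdd-local" is also hypothetical; data needs more room than metadata.
    storageClassName: hdd-local
    accessModes: [ "ReadWriteOnce" ]
    resources:
      requests:
        storage: 1Ti
```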
*Note how we are mounting our ConfigMap: we set the `subPath` property to mount only the `garage.toml` file and not the whole `/etc` folder, which would prevent K8S from writing its own files in `/etc` and make the pod fail.*
You can apply this file with:
```bash
kubectl apply -f service.yaml
```
Now you are ready to interact with your cluster; each instance should have discovered the others:
```bash
kubectl exec -it garage-0 --container garage -- /garage status
# ==== HEALTHY NODES ====
# ID Hostname Address Tags Zone Capacity
# e6284331c321a23c garage-0 172.17.0.5:3901 NO ROLE ASSIGNED
# 570ff9b0ed3648a7 garage-2 [::ffff:172.17.0.7]:3901 NO ROLE ASSIGNED
# e1990a2069429428 garage-1 [::ffff:172.17.0.6]:3901 NO ROLE ASSIGNED
```
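The next step is to assign a role (a zone and a capacity) to each node. As a sketch, assuming the `garage layout` subcommands of v0.7 (the zone name and capacity values below are placeholders):

```bash
# Assign a zone and a relative capacity to each node
# (node IDs come from the `garage status` output above).
kubectl exec -it garage-0 --container garage -- /garage layout assign e6284331c321a23c -z dc1 -c 1
kubectl exec -it garage-0 --container garage -- /garage layout assign 570ff9b0ed3648a7 -z dc1 -c 1
kubectl exec -it garage-0 --container garage -- /garage layout assign e1990a2069429428 -z dc1 -c 1
# Review the staged layout, then apply it.
kubectl exec -it garage-0 --container garage -- /garage layout show
kubectl exec -it garage-0 --container garage -- /garage layout apply --version 1
```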
Of course, to have a full deployment, you will probably want to deploy a [Service](https://kubernetes.io/docs/concepts/services-networking/service/) in front of your cluster and/or a reverse proxy.
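As an illustration, a minimal Service exposing the S3 API inside the cluster could look like this (the `garage-s3` name is our own choice):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: garage-s3
spec:
  selector:
    app: garage  # matches the pod labels of the StatefulSet above
  ports:
  - name: s3-api
    port: 3900
    targetPort: 3900
```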
If Kubernetes is not your thing, know that we are running Garage on a Nomad+Consul cluster.
We have not documented it yet, but you can take a look at [our Nomad service](https://git.deuxfleurs.fr/Deuxfleurs/infrastructure/src/commit/1e5e4af35c073d04698bb10dd4ad1330d6c62a0d/app/garage/deploy/garage.hcl).
[OpenTelemetry](https://opentelemetry.io/) standardizes how software generates and collects system telemetry, namely metrics, logs, and traces.
By implementing this standard in Garage, we hope that it will help you to better monitor, manage and tune your cluster.
Note that to fully leverage this feature, you should already be familiar with monitoring stacks like [Prometheus](https://prometheus.io/)+[Grafana](https://grafana.com/) or [ElasticSearch](https://www.elastic.co/elasticsearch/)+[Kibana](https://www.elastic.co/kibana/).
To activate OpenTelemetry on Garage, add the following entries to your configuration file (assuming your collector also runs on localhost):
```toml
[admin]
api_bind_addr = "127.0.0.1:3903"
trace_sink = "http://localhost:4317"
```
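After restarting Garage, you can quickly check that the admin endpoint is up; for instance, metrics should be exposed in the Prometheus format (assuming the default `/metrics` path):

```bash
# Sanity check: dump the Prometheus metrics exposed by the admin API.
curl http://127.0.0.1:3903/metrics
```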
We provide [some files](https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/main/script/telemetry) to help you quickly bootstrap a testing monitoring stack.
It includes a docker-compose file and a pre-configured Grafana dashboard.
You can use them if you want to reproduce the following examples.
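If you prefer to wire things up manually, a minimal Prometheus scrape configuration for the admin endpoint could look like this (a sketch, assuming Prometheus can reach Garage's admin port):

```yaml
scrape_configs:
  - job_name: garage
    static_configs:
      # Garage's admin API address, as set in api_bind_addr above.
      - targets: ['127.0.0.1:3903']
```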
Now that your telemetry data is collected and stored, you can visualize it.
Grafana is particularly suited to understanding how your cluster is performing from a "bird's eye view".
For example, the following graph shows the number of S3 API calls sent to your node per time unit; you can use it to better understand how your users are interacting with your cluster.
![A screenshot of a plot made by Grafana depicting the number of requests per time units grouped by endpoints](/images/blog/api_rate.png)
Thanks to this graph, we know that a large upload started at 14:55.
This upload is made of many small files, as we see many PutObject calls, which are often used for small files.
It also includes some large objects, as we observe some Multipart Upload requests.
Conversely, at that time, no reads are performed, as the corresponding read endpoints (ListBuckets, ListObjectsV2, etc.) receive 0 requests per time unit.
Garage also collects metrics from lower-level parts of the system.
You can use them to better understand how Garage is interacting with your OS and your hardware.
![A screenshot of a plot made by Grafana depicting the write speed (in MB/s) during time.](/images/blog/writes.png)
This plot was captured at the same moment as the previous one.
We do not see a correlation between the writes and the API requests for the full upload, only for its beginning.
More precisely, it maps well to the Multipart Upload requests, and this is expected.
Large files (from the Multipart Uploads) will saturate your disk's writes, while the upload of small files (via the PutObject endpoint) will be throttled by other parts of the system.
This simple example covers only 2 metrics out of the 20+ that we have already defined, but we were still able to precisely describe our cluster's usage and identify where bottlenecks could be.
While metrics are good for a broad, general overview of your system, they are not suited to digging into and pinpointing a specific performance issue on a specific code path.
Thankfully, we also have a solution for this problem: tracing.
The tracing view of your monitoring stack typically starts with a histogram of request durations; below this first histogram, you can select the request you want to inspect, and then see its detailed trace on the bottom part.
You can break down this trace into 4 parts: fetching the API key to check authentication (`key get`), fetching the bucket identifier from its name (`bucket_alias get`), fetching the bucket configuration to check authorizations (`bucket_v2 get`), and finally inserting the object in the storage (`object insert`).
With this example, we demonstrated that we can inspect Garage's internals to find slow requests, see which code path a request has taken, and finally identify which part of the code took time.
Keep in mind that this is our first iteration on telemetry for Garage, so things are a bit rough around the edges (step-by-step documentation is missing, our Grafana dashboard is a work in progress, etc.).
In any case, your feedback is welcome on our Matrix channel.
Currently, our goal is to reach v1.0, for which we want to work on these three desirable properties: *Feature completeness*, *Understandability and manageability*, and *Correctness*.
**Feature completeness**. We have already implemented a selected subset of S3 endpoints that works quite well, but we want to work on the corner cases that are not yet solved (e.g. [#263](https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/263), [#248](https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/248), [#204](https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/204)). Based on community feedback, we might consider implementing additional endpoints (e.g. [#166](https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/166)), but we can't make any promise (sorry!). Finally, we made a series of observations: 1) the S3 API has limited semantics; for example, it is not suited to append-only log data structures, 2) many projects require a database in addition to the object store, and 3) we have already implemented a key-value store internally to handle S3 metadata. Following these observations, we want to study the feasibility of providing a simple and totally optional key-value interface that we refer to as K2V. We are currently writing [an API draft](https://p.adnab.me/code/#/2/code/view/eUNPbfoUrMbCY+CoMXaqed4jmWlmvWALHNDcfuM-O5o/embed/present/) and will try to implement it in the following months. We would like it to stay as close as possible to the original [Amazon Dynamo paper](https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf); or, if you like approximate comparisons, K2V could be to Cassandra what SQLite is to PostgreSQL.
**Understandability and manageability**. We want a system that can be understood and managed by the largest possible number of operators. We identified the following points that we would like to improve: 1) explaining Garage's consistency model, both for the S3 API and the admin API, 2) explaining how Garage fits into the existing ecosystem, both among the other distributed storage systems (e.g. Ceph, Minio, SeaweedFS, IPFS Cluster) and in terms of use cases and deployments (how it performs at scale, with which hardware, for which applications, etc.), 3) making it possible to manage Garage through a REST API, and possibly writing a web GUI to make administration easier, 4) helping people understand the reliability and storage density they will get from a specific Garage deployment, if possible through a simulator, and 5) possibly adding a quota system to protect a cluster from a misbehaving user.
**Correctness**. We know in theory that Garage's design works and scales.
But we still need to make sure that, in practice, our implementation is correct and thus provides the properties we designed it for.
To convince ourselves, we are considering verifying our consistency model implementation with [Jepsen](https://jepsen.io/).
We also plan to deploy Garage on multiple clusters and run a large series of benchmarks.
Please note that this roadmap is purely indicative; we are not committing to delivering these features.
We also don't know when v1.0 will be released, except "when it will be ready", but we would be happy if it could be by the end of 2022.
Finally, if you have some knowledge of one or more of these points and would like to help, feel free to ping us on Matrix.