+++ title="Results of the community survey" date=2024-03-12 +++
We ran a community survey to gather feedback from Garage users and potential users during a two-month period. One of the main objectives of this survey was to determine expectations from the community for Garage's upcoming v1.0 release and for future work. Read this article for a discussion of the results.
The survey collected 127 responses over a period of almost 2 months, from the 15th of January to the 12th of March. The first question we asked users was how they had heard of Garage: the majority answered that they had heard of Garage through a link aggregator or social network such as Reddit or HN. A portion of users heard of it by word of mouth, and a significant portion also answered "Other". Unfortunately we didn't ask respondents for details if they selected "Other", so I'm quite curious as to what this could be. The other choices received an almost negligible number of responses.
Half of the respondents indicated that they are currently running a Garage cluster for production data, of which a small fraction indicated running it in a commercial setting. Another third of respondents indicated that they are currently testing Garage or have tested it previously.
About currently running Garage installations
We first asked users what kind of data they were storing in Garage. The top answer, selected by about half of the participants, is storing back-ups, followed closely by personal files. The other answers follow in a roughly linearly decreasing pattern.
The majority of users are not running Garage in geodistributed mode, but many users are also running in 2, 3 or even 4 locations.
A large majority of users are only using Garage through the S3 API. The remaining users are mostly using a mix of S3 API and web API, with a small number of users (5) using Garage primarily as a web server.
Regarding the size of clusters, the majority of installed clusters are less than 1TB in size. The others are almost all between 1TB and 10TB. 8 users indicated that they are running clusters of more than 10TB. Two users reported running clusters of more than 100TB, but they also indicated that they are not currently using Garage, so I think that's the size of the data they would like/need to store on Garage, not the actual size of an installed cluster. The number of objects stored in clusters is quite evenly split between less than 10k, 10k to 100k, and more than 100k.
Dividing cluster size by object count, this means that about half of respondents are storing mostly objects of around 100MB in size. For the others, it's mostly objects of around 10MB. This estimate is very inexact since the proposed answers for cluster size and object count had such large ranges.
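To give an idea of how rough this estimate is, here is a minimal sketch of the calculation. It is only an illustration: the function names and the example ranges are assumptions made for this sketch, not the exact answer choices or the method used to analyze the survey.

```python
# Rough average object size derived from bucketed survey answers.
# All names and example ranges here are hypothetical, for illustration only.

TB = 10**12
MB = 10**6


def estimate_object_size(cluster_bytes: float, object_count: float) -> float:
    """Return the average object size in bytes."""
    if object_count <= 0:
        raise ValueError("object_count must be positive")
    return cluster_bytes / object_count


def size_bounds(cluster_range, count_range):
    """Return the (lowest, highest) average object size compatible with
    a cluster size range and an object count range."""
    (size_lo, size_hi), (count_lo, count_hi) = cluster_range, count_range
    # Smallest objects: smallest cluster spread over the most objects.
    # Largest objects: biggest cluster spread over the fewest objects.
    return size_lo / count_hi, size_hi / count_lo


# Example: a cluster of 1TB to 10TB holding 10k to 100k objects.
low, high = size_bounds((1 * TB, 10 * TB), (10_000, 100_000))
print(f"between {low / MB:.0f} MB and {high / MB:.0f} MB")
```

With these example ranges, the plausible average object size spans two orders of magnitude, which is why the figures above should be read as orders of magnitude only.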
Satisfaction regarding Garage
A majority of users reported a high degree of satisfaction with Garage. About a quarter said that Garage has some significant flaws. A small portion of respondents indicated that they cannot use Garage due to missing important features or critical bugs, but still took the time to answer the survey (thanks to them!).
The top 3 strong points of Garage reported by its users are: good S3 compatibility (first place, with 2/3 of respondents agreeing), good performance on small / low-power machines, and easy setup. I'd say we are pretty much on target, as these are some of the main objectives of Garage.
As for most wanted features in Garage, there is a clear winner with a web interface for cluster administration, with over 40% of users mentioning it. The second most wanted feature is support for S3 versioning, with almost 30% of answers.
The vast majority of users reported never losing data that they stored in Garage. Only one indicated that they lost data and that it was Garage's fault: this was because they tried to move an LMDB database between machines with different architectures, but the LMDB on-disk format is architecture-specific. We should probably be clearer about this in the documentation.
Users in a "homelab/self-hosted setting"
52 respondents indicated that they are using Garage for storing production data in a homelab or self-hosted setting. I'd say this is the most representative portion of Garage users, as it is its primary target. Let's look at the answers from these users only.
About the clusters
Personal files now take first place among the kinds of data stored on these clusters, still closely followed by back-ups.
These users are mostly not using Garage in a geodistributed setting. The distribution of answers is very similar to the overall distribution.
Most clusters of these users are less than 1TB in size, and the remaining ones are mostly in the 1TB - 10TB range. There are fewer clusters than average storing more than 100k objects in this population, but the distribution of object sizes (not shown) is very similar to the overall distribution.
Satisfaction regarding Garage
Homelab/self-hosting users reported a slightly higher level of satisfaction with Garage, with almost 3/4 very satisfied.
The top 3 reasons for using Garage are the same, but good performance on small / low-power machines now takes first place.
The top 2 wanted features are still the same, now with an equal number of votes.
Users in a "commercial setting"
Fewer users indicated that they are running Garage in a commercial setting, as this concerned only 12 of the respondents to the survey.
About the clusters
Half of users reported using Garage to store back-ups, and almost half reported storing observability data and web app / service data. One third selected static websites.
Users in a commercial setting are more consistent in their use of the geo-distribution features offered by Garage. Only one third of users are not running in geo-distributed mode. Another third is running Garage in 2 locations, and the last third is running in 3 or more locations, thus benefitting from the best resiliency properties that Garage can offer.
The majority of commercial deployments are storing between 1TB and 10TB of data. About a quarter are storing more than 1 million objects.
It seems that the average object size is much smaller in this population: the majority of answers correspond to average object sizes of less than 10MB, and one fourth of answers correspond to objects of around 1MB.
Satisfaction regarding Garage
Three quarters of these users reported a high degree of satisfaction with Garage, about the same as for homelab users.
The most liked qualities of Garage are a bit different. Fewer users reported satisfaction due to the easy setup of Garage, but more users indicated that the possibility of easily adding and removing nodes was important to them. Good tolerance to offline nodes and crashes, and good performance in the face of latency, which are the core properties that make Garage work well in geo-distributed settings, were selected by two thirds of users, most likely the same users who said they are running in geo-distributed mode.
A web interface for cluster administration is still the most wanted feature, with 40% of votes. Then, one third voted for better monitoring and observability, and for per-bucket levels of consistency and numbers of replicas. Only 25% voted for S3 versioning.
Users that have the biggest clusters
7 users reported running clusters storing more than 10TB of data. About half of these users are using Garage for a homelab or self-hosted setup, and one is in a commercial setting.
About the clusters
Almost all of these users are using Garage to store back-ups. Multimedia files are the second most selected option, which would explain why these clusters are so big.
These deployments are quite evenly split between not being geo-replicated and being geo-replicated in 2 or 3 locations.
Satisfaction regarding Garage
A majority of users report a high degree of satisfaction with Garage, but many users also reported significant flaws.
Unsurprisingly, when clusters start becoming big enough, the most requested improvement is better performance across the board. Per-bucket levels of consistency and numbers of replicas were also selected by almost half of users.
Users that reported that Garage had some significant flaws
Focusing on users that reported that Garage is usable for them but has "significant flaws", the two most requested features were a web administration interface and S3 versioning. Bucket-level ACLs (that would allow anonymous access directly from the S3 endpoint) and performance improvements came next.
Concerning users that said that Garage has critical issues that are preventing them from using it, the "Other" option was the most selected answer for the requested features. Licensing issues allegedly preventing commercial use were cited by a few users (hint: it's actually a non-issue, and we will write about this at some point), but I think most of these users have a specific use case in mind which is not targeted by Garage. For instance, several have indicated that they would need POSIX filesystem compatibility and/or the possibility to use Garage as a CSI driver in Kubernetes (unfortunately, this is mostly impossible to achieve with good performance in a geo-distributed environment, and the principles on which Garage is based explicitly prevent it from fulfilling this role).