garagehq.deuxfleurs.fr/content/blog/2022-introducing-garage.md
Maximilien 046bef611c
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing
Proofreading after-the-fact
2022-07-07 19:20:26 +02:00

170 lines
9.3 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

+++
title="Introducing Garage, our self-hosted distributed object storage solution"
date=2022-02-01
+++
*Deuxfleurs is a non-profit based in France that aims to defend and promote
individual freedom and rights on the Internet. In their quest to build a
decentralized, resilient self-hosting infrastructure, they have found that
currently, existing software is often ill-suited to such a particular deployment
scenario. In the context of data storage, Garage was built to provide a highly
available data store that exploits redundancy over different geographical
locations, and does its best to not be too impacted by network latencies.*
<!-- more -->
---
Hello! We are Deuxfleurs, a non-profit based in France working to promote
self-hosting and small-scale hosting.
What does that mean? Well, we figured that big tech monopoly such as Google,
Facebook or Amazon today hold disproportionate power and are becoming quite
dangerous to us, citizens of the Internet. They know everything we are doing,
saying, and even thinking, and they are not making good use of that
information. The interests of these companies are those of the capitalist
elite: they are most interested in making huge profits by exploiting the
Earth's precious resources, producing, advertising, and selling us massive
amounts of stuff we don't need. They don't truly care about the needs of the
people, nor do they care that planetary destruction is underway because of
them.
Big tech monopolies are in a particularly strong position to influence our
behaviors, consciously or not, because we rely on them for selecting the online
content we read, watch, or listen to. Advertising is omnipresent, and because
they know us so well, they can subvert us into thinking that a mindless
consumer society is what we truly want, whereas we most likely would choose
otherwise if we had the chance to think by ourselves.
We don't want that. That's not what the Internet is for. Freedom is freedom
from influence: the ability to do things by oneself, for oneself, on one's own
terms. Self-hosting is both the means by which we reclaim this freedom on the
Internet by not using services of big tech monopolies and thus removing
ourselves from their influence and the result of applying our critical
thinking and our technical abilities to build the Internet that suits us.
Self-hosting means that we don't use cloud services. Instead, we store our
personal data on computers that we own, which we run at home. We build local
communities to share the services that we run with non-technical people. We
communicate with other groups that do the same (or, sometimes, that don't)
thanks to standard protocols such as HTTP, e-mail, or Matrix, that allow a
global community to exist outside of big tech monopolies.
### Self-hosting is a hard problem
As I said, self-hosting means running our own hardware at home, and providing
24/7 Internet services from there. We have many reasons for doing this. One is
because this is the only way we can truly control who has access to our data.
Another one is that it helps us be aware of the physical substrate of which the
Internet is made: making the Internet run has an environmental cost that we
want to evaluate and keep under control. The physical hardware also gives us a
sense of community, calling to mind all of the people that could currently be
connected and making use of our services, and reminding us of the purpose for
which we are doing this.
If you have a home, you know that bad things can happen there too. The power
grid is not infallible, and neither is your Internet connection. Fires and floods
happen. And the computers we are running can themselves crash at any moment,
for any number of reasons. Self-hosted solutions today are often not equipped
to face such challenges and might suffer from unavailability or data loss
as a consequence.
If we want to grow our communities, and attract more people that might be
sympathetic to our vision of the world, we need a baseline of quality for the
services we provide. Users can tolerate some flaws or imperfections, in the
name of defending and promoting their ideals, but if the services are
catastrophic, being unavailable at critical times, or losing users' precious
data, the compromise is much harder to make and people will be tempted to go
back to a comfortable lifestyle bestowed by big tech companies.
Fixing availability, making services reliable even when hosted at unreliable
locations or on unreliable hardware is one of the main objectives of
Deuxfleurs, and in particular of the project Garage which we are building.
### Distributed systems to the rescue
Distributed systems, or distributed computing, is a set of techniques that can
be applied to make computer services more reliable, by making them run on
several computers at once. It so happens that a few of us have studied
distributed systems, which helps a lot (some of us even have PhDs!)
The following concepts of distributed computing are particularly relevant to
us:
- **Crash tolerance** is when a service that runs on several computers at once
can continue operating normally even when one (or a small number) of the
computers stop working.
- **Geo-distribution** is when the computers that make up a distributed system
are not all located in the same facility. Ideally, they would even be spread
over different cities, so that outages affecting one region do not prevent
the rest of the system from working.
We set out to apply these concepts at Deuxfleurs to build our infrastructure,
in order to provide services that are replicated over several machines in several
geographical locations, so that we are able to provide good availability guarantees
to our users. We try to use as most as possible software packages that already
existed and are freely available, for example the Linux operating system
and the HashiCorp suite (Nomad and Consul).
Unfortunately, in the domain of distributed data storage, the available options
weren't entirely satisfactory in our case, which is why we launched the
development of our own solution: Garage. We will talk more in other blog
posts about why Garage is better suited to us than alternative options. In this
post, I will simply try to give a high-level overview of what Garage is.
### What is Garage, exactly?
Garage is a distributed storage solution, that automatically replicates your
data on several servers. Garage takes into account the geographical location
of servers, and ensures that copies of your data are located at different
locations when possible for maximal redundancy, a unique feature in the
landscape of distributed storage systems.
Garage implements the Amazon S3 protocol, a de-facto standard that makes it
compatible with a large variety of existing software. For instance it can be
used as a storage backend for many self-hosted web applications such as
NextCloud, Matrix, Mastodon, Peertube, and many others, replacing the local
file system of a server with a distributed storage layer. Garage can also be
used to synchronize your files or store your backups with utilities such as
Rclone or Restic. Last but not least, Garage can be used to host static
websites, such as the one you are currently reading, which is served directly
by the Garage cluster we host at Deuxfleurs.
Garage leverages the theory of distributed systems, and in particular
*Conflict-free Replicated Data Types* (CRDTs in short), a set of mathematical
tools that help us write distributed software that runs faster, by avoiding
some kinds of unnecessary chit-chat between servers. In a future blog post,
we will show how this allows us to significantly outperform Minio, our closest
competitor (another self-hostable implementation of the S3 protocol).
On the side of software engineering, we are committed to making Garage
a tool that is reliable, lightweight, and easy to administrate.
Garage is written in the Rust programming language, which helps us ensure
the stability and safety of the software, and allows us to build software
that is fast and uses little memory.
### Conclusion
The current version of Garage is version 0.6, which is a *beta* release.
This means that it hasn't yet been tested by many people, and we might have
ignored some edge cases in which it would not perform as expected.
However, we are already actively using Garage at Deuxfleurs for many uses, and
it is working exceptionally well for us. We are currently using it to store
backups of personal files, to store the media files that we send and receive
over the Matrix network, as well as to host a small but increasing number of
static websites. Our current deployment hosts about 200 000 files spread in 50
buckets, for a total size of slightly above 500 GB. These numbers can seem small
when compared to the datasets you could expect your typical cloud provider to
be handling, however these sizes are fairly typical of the small-scale
self-hosted deployments we are targeting, and our Garage cluster is in no way
nearing its capacity limit.
Today, we are proudly releasing Garage's new website, with updated
documentation pages. Poke around to try to understand how the software works,
and try installing your own instance! Your feedback is precious to us, and we
would be glad to hear back from you on our
[issue tracker](https://git.deuxfleurs.fr/Deuxfleurs/garage/issues), by
[e-mail](mailto:garagehq@deuxfleurs.fr), or on our
[Matrix channel](https://matrix.to/#/%23garage:deuxfleurs.fr) (`#garage:deuxfleurs.fr`).