+++
title="Introducing Garage, our self-hosted distributed object storage solution"
date=2022-02-01
+++

*Deuxfleurs is a non-profit based in France that aims to defend and promote
individual freedom and rights on the Internet. In their quest to build a
decentralized, resilient self-hosting infrastructure, they have found that
existing software is often ill-suited to this particular deployment
scenario. In the context of data storage, Garage was built to provide a highly
available data store that exploits redundancy over different geographical
locations, and does its best to limit the impact of network latency.*

<!-- more -->

---

Hello! We are Deuxfleurs, a non-profit based in France working to promote
self-hosting and small-scale hosting.

What does that mean? Well, we figured that big tech monopolies such as Google,
Facebook or Amazon today hold disproportionate power and are becoming quite
dangerous to us, citizens of the Internet. They know everything we are doing,
saying, and even thinking, and they are not making good use of that
information. The interests of these companies are those of the capitalist
elite: they are most interested in making huge profits by exploiting the
Earth's precious resources, producing, advertising, and selling us massive
amounts of stuff we don't need. They don't truly care about the needs of the
people, nor do they care that planetary destruction is under way because of
them.

Big tech monopolies are in a particularly strong position to influence our
behaviors, consciously or not, because we rely on them for selecting the online
content we read, watch, or listen to. Advertising is omnipresent, and because
they know us so well, they can subvert us into thinking that a mindless
consumer society is what we truly want, whereas we most likely would choose
otherwise if we had the chance to think for ourselves.

We don't want that. That's not what the Internet is for. Freedom is freedom
from influence: the ability to do things by oneself, for oneself, on one's own
terms. Self-hosting is both the means by which we reclaim this freedom on the
Internet – by not using services of big tech monopolies and thus removing
ourselves from their influence – and the result of applying our critical
thinking and our technical abilities to build the Internet that suits us.

Self-hosting means that we don't use cloud services. Instead, we store our
personal data on computers that we own, which we run at home. We build local
communities to share the services that we run with non-technical people. We
communicate with other groups that do the same (or, sometimes, that don't)
thanks to standard protocols such as HTTP, e-mail, or Matrix, which allow a
global community to exist outside of big tech monopolies.

### Self-hosting is a hard problem

As I said, self-hosting means running our own hardware at home, and providing
24/7 Internet services from there. We have many reasons for doing this. One is
that this is the only way we can truly control who has access to our data.
Another one is that it helps us be aware of the physical substrate of which the
Internet is made: making the Internet run has an environmental cost that we
want to evaluate and keep under control. The physical hardware also gives us a
sense of community, calling to mind all of the people that could currently be
connected and making use of our services, and reminding us of the purpose for
which we are doing this.

If you have a home, you know that bad things can happen there too. The power
grid is not infallible, and neither is your Internet connection. Fires and floods
happen. And the computers we are running can themselves crash at any moment,
for any number of reasons. Self-hosted solutions today are often not equipped
to face such challenges and might suffer from unavailability or data loss
as a consequence.

If we want to grow our communities, and attract more people who might be
sympathetic to our vision of the world, we need a baseline of quality for the
services we provide. Users can tolerate some flaws or imperfections, in the
name of defending and promoting their ideals, but if the services are
catastrophic, unavailable at critical times or losing users' precious
data, the compromise is much harder to make and people will be tempted to go
back to the comfortable lifestyle bestowed by big tech companies.

Fixing availability, that is, making services reliable even when hosted at
unreliable locations or on unreliable hardware, is one of the main objectives of
Deuxfleurs, and in particular of Garage, the project we are building.

### Distributed systems to the rescue

Distributed systems, or distributed computing, is a set of techniques that can
be applied to make computer services more reliable, by making them run on
several computers at once. It so happens that a few of us have studied
distributed systems, which helps a lot (some of us even have PhDs!).

The following concepts of distributed computing are particularly relevant to
us:

- **Crash tolerance** is when a service that runs on several computers at once
  can continue operating normally even when one (or a small number) of the
  computers stops working.

- **Geo-distribution** is when the computers that make up a distributed system
  are not all located in the same facility. Ideally, they would even be spread
  over different cities, so that outages affecting one region do not prevent
  the rest of the system from working.

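To make these two notions a bit more concrete, here is a minimal Python sketch. It assumes a majority rule (the service keeps working as long as more than half of its computers are up), which is one common way to define crash tolerance and is an assumption of this example rather than a description of any particular system. The function names, node names, and site names are all hypothetical.

```python
from collections import Counter


def majority(n: int) -> int:
    """Smallest number of nodes that forms a strict majority of n nodes."""
    return n // 2 + 1


def tolerated_crashes(n: int) -> int:
    """How many nodes may crash while a majority of the n nodes stays up."""
    return n - majority(n)


def survives_site_outage(node_sites: dict[str, str]) -> bool:
    """True if a majority of nodes stays up whichever single site goes down.

    `node_sites` maps a (hypothetical) node name to the site hosting it.
    """
    total = len(node_sites)
    nodes_per_site = Counter(node_sites.values())
    return all(total - lost >= majority(total) for lost in nodes_per_site.values())


# Three nodes in one facility: crash tolerant, but not geo-distributed.
same_room = {"node1": "paris", "node2": "paris", "node3": "paris"}
# Three nodes in three cities: any single site can go down.
three_cities = {"node1": "paris", "node2": "lyon", "node3": "rennes"}

print(tolerated_crashes(3))                # 1
print(survives_site_outage(same_room))     # False
print(survives_site_outage(three_cities))  # True
```

With three computers in the same room, one of them can crash, but a single power cut takes everything down. Spreading the same three computers over three cities removes that single point of failure.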

We set out to apply these concepts at Deuxfleurs to build our infrastructure,
in order to provide services that are replicated over several machines in several
geographical locations, so that we are able to provide good availability guarantees
to our users. As much as possible, we try to use software packages that already
exist and are freely available, for example the Linux operating system
and the HashiCorp suite (Nomad and Consul).

Unfortunately, in the domain of distributed data storage, the available options
weren't entirely satisfactory in our case, which is why we launched the
development of our own solution: Garage. We will talk more in other blog
posts about why Garage is better suited to us than alternative options. In this
post, I will simply try to give a high-level overview of what Garage is.

### What is Garage, exactly?

Garage is a distributed storage solution that automatically replicates your
data on several servers. Garage takes into account the geographical location
of servers, and ensures that copies of your data are located at different
locations when possible for maximal redundancy, a unique feature in the
landscape of distributed storage systems.

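The idea of placing copies "at different locations when possible" can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not Garage's actual placement algorithm: the function `place_replicas`, the default of three copies, the hash-based ordering, and the node and zone names are all hypothetical.

```python
import hashlib


def place_replicas(key: str, nodes: dict[str, str], copies: int = 3) -> list[str]:
    """Pick `copies` nodes for `key`, using distinct zones when possible.

    `nodes` maps a node name to the zone (geographical location) it sits in.
    """
    # Deterministic, key-dependent ordering of the nodes, so that every
    # server computes the same answer without asking the others.
    ranked = sorted(
        nodes,
        key=lambda node: hashlib.sha256(f"{key}/{node}".encode()).hexdigest(),
    )

    chosen: list[str] = []
    used_zones: set[str] = set()

    # First pass: take at most one node per zone.
    for node in ranked:
        if len(chosen) < copies and nodes[node] not in used_zones:
            chosen.append(node)
            used_zones.add(nodes[node])

    # Second pass: if there are fewer zones than copies, reuse zones.
    for node in ranked:
        if len(chosen) < copies and node not in chosen:
            chosen.append(node)

    return chosen


cluster = {"n1": "paris", "n2": "paris", "n3": "lyon", "n4": "rennes"}
replicas = place_replicas("holiday-photo.jpg", cluster)
print(replicas, {cluster[n] for n in replicas})
```

With the four-node cluster above, the three copies always land in three different cities. If only two zones were available, the second pass would still reach three copies by putting two of them in the same zone.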

Garage implements the Amazon S3 protocol, a de-facto standard that makes it
compatible with a large variety of existing software. For instance, it can be
used as a storage backend for many self-hosted web applications such as
Nextcloud, Matrix, Mastodon, PeerTube, and many others, replacing the local
file system of a server with a distributed storage layer. Garage can also be
used to synchronize your files or store your backups with utilities such as
Rclone or Restic. Last but not least, Garage can be used to host static
websites, such as the one you are currently reading, which is served directly
by the Garage cluster we host at Deuxfleurs.

Garage leverages the theory of distributed systems, and in particular
*Conflict-free Replicated Data Types* (CRDTs for short), a set of mathematical
tools that help us write distributed software that runs faster, by avoiding
some kinds of unnecessary chit-chat between servers. In a future blog post,
we will show how this allows us to significantly outperform MinIO, our closest
competitor (another self-hostable implementation of the S3 protocol).

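For readers who have never met a CRDT, here is a minimal Python sketch of a textbook one, the grow-only counter. It is meant only to illustrate why CRDTs reduce chit-chat between servers, and it is not taken from Garage's code: the functions `increment`, `merge`, and `value` and the replica names are hypothetical.

```python
def increment(state: dict[str, int], node: str, amount: int = 1) -> dict[str, int]:
    """Return a new state where `node` has counted `amount` more events."""
    new_state = dict(state)
    new_state[node] = new_state.get(node, 0) + amount
    return new_state


def merge(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Combine two replica states by keeping the per-node maximum."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


def value(state: dict[str, int]) -> int:
    """Total count seen across all nodes."""
    return sum(state.values())


# Two replicas accept updates independently, without talking to each other.
replica_a = increment(increment({}, "A"), "A")  # A counted two events
replica_b = increment({}, "B")                  # B counted one event

# Whenever they exchange states, in any order and any number of times,
# they end up with the same result.
assert merge(replica_a, replica_b) == merge(replica_b, replica_a)
assert merge(replica_a, merge(replica_a, replica_b)) == merge(replica_a, replica_b)
print(value(merge(replica_a, replica_b)))  # 3
```

Because `merge` gives the same answer whatever the order or repetition of exchanges, the replicas do not need to agree on an order of operations before accepting a write. That agreement step is the kind of unnecessary communication a CRDT lets us skip.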

On the software engineering side, we are committed to making Garage
a tool that is reliable, lightweight, and easy to administer.
Garage is written in the Rust programming language, which helps us ensure
the stability and safety of the software, and allows us to build software
that is fast and uses little memory.

### Conclusion

The current version of Garage is version 0.6, which is a *beta* release.
This means that it hasn't yet been tested by many people, and we might have
overlooked some edge cases in which it would not perform as expected.

However, we are already actively using Garage at Deuxfleurs for many purposes, and
it is working exceptionally well for us. We are currently using it to store
backups of personal files, to store the media files that we send and receive
over the Matrix network, as well as to host a small but increasing number of
static websites. Our current deployment hosts about 200,000 files spread across 50
buckets, for a total size of slightly above 500 GB. These numbers can seem small
when compared to the datasets you could expect your typical cloud provider to
be handling. However, these sizes are fairly typical of the small-scale
self-hosted deployments we are targeting, and our Garage cluster is in no way
nearing its capacity limit.

Today, we are proudly releasing Garage's new website, with updated
documentation pages. Poke around to try to understand how the software works,
and try installing your own instance! Your feedback is precious to us, and we
would be glad to hear back from you on our
[issue tracker](https://git.deuxfleurs.fr/Deuxfleurs/garage/issues), by
[e-mail](mailto:garagehq@deuxfleurs.fr), or on our
[Matrix channel](https://matrix.to/#/%23garage:deuxfleurs.fr) (`#garage:deuxfleurs.fr`).