Reorganize and improve documentation

This commit is contained in:
Alex 2021-12-06 16:10:32 +01:00
parent 7c2037ba87
commit 224c89ad6e
11 changed files with 128 additions and 102 deletions

View file

@ -8,7 +8,7 @@
- [Multi-node deployment](./cookbook/real_world.md)
- [Building from source](./cookbook/from_source.md)
- [Integration with systemd](./cookbook/systemd.md)
- [Gateways](./cookbook/gateways.md)
- [Configuring a gateway node](./cookbook/gateways.md)
- [Exposing buckets as websites](./cookbook/exposing_websites.md)
- [Configuring a reverse proxy](./cookbook/reverse_proxy.md)
- [Recovering from failures](./cookbook/recovering.md)
@ -30,9 +30,9 @@
- [S3 compatibility status](./reference_manual/s3_compatibility.md)
- [Design](./design/index.md)
- [Related Work](./design/related_work.md)
- [Goals and use Cases](./design/goals.md)
- [Related work](./design/related_work.md)
- [Internals](./design/internals.md)
- [Design draft](./design/design_draft.md)
- [Development](./development/index.md)
- [Setup your environment](./development/devenv.md)
@ -41,5 +41,6 @@
- [Miscellaneous notes](./development/miscellaneous_notes.md)
- [Working Documents](./working_documents/index.md)
- [Load Balancing Data](./working_documents/load_balancing.md)
- [Load balancing data](./working_documents/load_balancing.md)
- [Migrating from 0.3 to 0.4](./working_documents/migration_04.md)
- [Design draft](./working_documents/design_draft.md)

View file

@ -1,7 +1,19 @@
# Connect it to
# Connect it to...
To configure an S3 client to interact with Garage, you will need the following
parameters:
Garage implements the Amazon S3 protocol, which makes it compatible with many existing software programs.
In particular, you will find here instructions to connect it with:
- [web applications](./apps.md)
- [website hosting](./websites.md)
- [software repositories](./repositories.md)
- [CLI tools](./cli.md)
- [your own code](./code.md)
### Generic instructions
To configure S3-compatible software to interact with Garage,
you will need the following parameters:
- An **API endpoint**: this corresponds to the HTTP or HTTPS address
used to contact the Garage server. When running Garage locally this will usually
@ -27,12 +39,3 @@ provided that you follow the following guidelines:
If this is not configured explicitly, clients usually try to talk to region `us-east-1`.
Garage should normally redirect your client to the correct region,
but in case your client does not support this you might have to configure it manually.
We will now provide example configurations for the most common clients per category:
- [Apps](./apps.md)
- [Websites](./websites.md)
- [Repositories](./repositories.md)
- [CLI tools](./cli.md)
- [Your code](./code.md)
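As a rough illustration of the generic parameters described above, the following Python sketch collects the endpoint and region and sanity-checks them before they are handed to an S3 client. The class `S3ClientSettings`, its `validate` method, the endpoint `http://localhost:3900` and the region name `garage` are assumptions made for this example; they are not part of Garage, and your deployment may use different values.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class S3ClientSettings:
    """Hypothetical container for the parameters an S3 client needs."""

    endpoint: str               # HTTP or HTTPS address of the Garage server
    region: str = "us-east-1"   # what clients usually assume when nothing is set

    def validate(self) -> None:
        """Raise ValueError if the settings are obviously unusable."""
        parsed = urlparse(self.endpoint)
        if parsed.scheme not in ("http", "https"):
            raise ValueError(f"endpoint must be an HTTP or HTTPS URL: {self.endpoint!r}")
        if not parsed.netloc:
            raise ValueError(f"endpoint has no host: {self.endpoint!r}")
        if not self.region:
            raise ValueError("region must not be empty")


# Assumed values for a local setup; replace them with your own.
settings = S3ClientSettings(endpoint="http://localhost:3900", region="garage")
settings.validate()
```

Setting the region explicitly, as in the last lines, is what you would do for a client that cannot follow a redirect to the correct region.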

View file

@ -6,11 +6,11 @@ Gateways allow you to expose Garage endpoints (S3 API and websites) without stor
You can configure Garage as a gateway on all nodes that will consume your S3 API. This provides the following benefits:
- **It removes 1 or 2 network RTT** Instead of (querying your reverse proxy then) querying a random node of the cluster that will forward your request to the nodes effectively storing the data, your local gateway will directly knows which node to query.
- **It removes 1 or 2 network RTT.** Instead of querying (your reverse proxy, then) a random node of the cluster, which forwards your request to the nodes actually storing the data, your local gateway knows directly which node to query.
- **It ease server management** Instead of tracking in your reverse proxy and DNS what are the current Garage nodes, your gateway being part of the cluster keeps this information for you. In your software, you will always specify `http://localhost:3900`.
- **It eases server management.** Instead of tracking in your reverse proxy and DNS which nodes currently make up the Garage cluster, your gateway, being part of the cluster, keeps this information for you. In your software, you will always specify `http://localhost:3900`.
- **It simplifies security** Instead of having to maintain and renew a TLS certificate, you leverage the Secret Handshake protocol we use for our cluster. The S3 API protocol will be in plain text but limited to your local machine.
- **It simplifies security.** Instead of having to maintain and renew a TLS certificate, you leverage the Secret Handshake protocol we use for our cluster. The S3 API protocol will be in plain text but limited to your local machine.
## Limitations

View file

@ -4,22 +4,23 @@ A cookbook, when you cook, is a collection of recipes.
Similarly, Garage's cookbook contains a collection of recipes that are known to work well!
This chapter could also be referred to as "Tutorials" or "Best practices".
- **[Deploying Garage](real_world.md):** This page will walk you through all of the necessary
- **[Multi-node deployment](real_world.md):** This page will walk you through all of the necessary
steps to deploy Garage in a real-world setting.
- **[Configuring S3 clients](clients.md):** This page will explain how to configure
popular S3 clients to interact with a Garage server.
- **[Hosting a website](website.md):** This page explains how to use Garage
to host a static website.
- **[Recovering from failures](recovering.md):** Garage's first selling point is resilience
to hardware failures. This section explains how to recover from such a failure in the
best possible way.
- **[Building from source](from_source.md):** This page explains how to build Garage from
source in case a binary is not provided for your architecture, or if you want to
hack with us!
- **[Starting with Systemd](from_source.md):** This page explains how to run Garage
- **[Integration with Systemd](systemd.md):** This page explains how to run Garage
as a Systemd service (instead of as a Docker container).
- **[Configuring a gateway node](gateways.md):** This page explains how to run a gateway node in a Garage cluster, i.e. a Garage node that doesn't store data but accelerates access to data present on the other nodes.
- **[Hosting a website](exposing_websites.md):** This page explains how to use Garage
to host a static website.
- **[Configuring a reverse-proxy](reverse_proxy.md):** This page explains how to configure a reverse-proxy to add TLS support to your S3 API endpoint.
- **[Recovering from failures](recovering.md):** Garage's first selling point is resilience
to hardware failures. This section explains how to recover from such a failure in the
best possible way.

View file

@ -286,5 +286,5 @@ and is covered in the [quick start guide](../quick_start/index.md).
Remember also that the CLI is self-documented thanks to the `--help` flag and
the `help` subcommand (e.g. `garage help`, `garage key --help`).
Configuring an S3 client to interact with Garage is covered
[in the next section](clients.md).
Configuring S3-compatible applications to interact with Garage
is covered in the [Integrations](/connect/index.html) section.

View file

@ -0,0 +1,53 @@
# Goals and use cases
## Goals and non-goals
Garage is a lightweight geo-distributed data store that implements the
[Amazon S3](https://docs.aws.amazon.com/AmazonS3/latest/API/Welcome.html)
object storage protocol. It enables applications to store large blobs such
as pictures, videos, documents, etc., in a redundant multi-node
setting. S3 is versatile enough to also be used to publish a static
website.
Garage is an opinionated object storage solution; we focus on the following **desirable properties**:
- **Self-contained & lightweight**: works everywhere and integrates well in existing environments to target [hyperconverged infrastructures](https://en.wikipedia.org/wiki/Hyper-converged_infrastructure).
- **Highly resilient**: highly resilient to network failures, network latency, disk failures, sysadmin failures.
- **Simple**: simple to understand, simple to operate, simple to debug.
- **Internet enabled**: made for multi-sites (e.g. datacenters, offices, households, etc.) interconnected through regular Internet connections.
We also noted that the pursuit of some other goals is detrimental to our initial goals.
The following have been identified as **non-goals** (if these points matter to you, you should not use Garage):
- **Extreme performance**: high performance heavily constrains the design and the infrastructure; we seek performance through minimalism only.
- **Feature extensiveness**: we do not plan to add additional features compared to the ones provided by the S3 API.
- **Storage optimizations**: erasure coding or any other coding technique increases the difficulty of both placing and synchronizing data; we limit ourselves to duplication.
- **POSIX/Filesystem compatibility**: we do not aim to be POSIX compatible or to emulate any kind of filesystem. Indeed, in a distributed environment, such synchronizations translate into network messages that impose severe constraints on the deployment.
## Use-cases
*Are you also using Garage in your organization? [Open a PR](https://git.deuxfleurs.fr/Deuxfleurs/garage) to add your use case here!*
### Deuxfleurs
[Deuxfleurs](https://deuxfleurs.fr) is an experimental non-profit hosting
organization that develops Garage. Deuxfleurs is focused on building highly
available infrastructure through redundancy in multiple geographical
locations. They use Garage themselves for the following tasks:
- Hosting of their [main website](https://deuxfleurs.fr), [this website](https://garagehq.deuxfleurs.fr), as well as the personal websites of many members of the organization
- As a [Matrix media backend](https://github.com/matrix-org/synapse-s3-storage-provider)
- To store personal data and shared documents through [Bagage](https://git.deuxfleurs.fr/Deuxfleurs/bagage), a homegrown WebDav-to-S3 proxy
- In the Drone continuous integration platform to store task logs
- As a Nix binary cache
- As a backup target using `rclone`
The Deuxfleurs Garage cluster is a multi-site cluster currently composed of
4 nodes in 2 physical locations. In the future it will be expanded to at
least 3 physical locations to fully exploit Garage's potential for high
availability.

View file

@ -1,30 +1,22 @@
# Design
The design section helps you to see Garage from a "big picture" perspective.
It will allow you to understand if Garage is a good fit for you,
how to better use it, how to contribute to it, what can Garage could and could not do, etc.
The design section helps you to see Garage from a "big picture"
perspective. It will allow you to understand if Garage is a good fit for
you, how to better use it, how to contribute to it, what Garage can and
cannot do, etc.
## Goals and non-goals
- **[Goals and use cases](goals.md):** This page explains why Garage was conceived and what practical use cases it targets.
Garage is an opinionated object storage solutoin, we focus on the following **desirable properties**:
- **[Related work](related_work.md):** This page presents the theoretical background on which Garage is built, and describes other software storage solutions and why they didn't work for us.
- **Self-contained & lightweight**: works everywhere and integrates well in existing environments to target [hyperconverged infrastructures](https://en.wikipedia.org/wiki/Hyper-converged_infrastructure).
- **Highly resilient**: highly resilient to network failures, network latency, disk failures, sysadmin failures.
- **Simple**: simple to understand, simple to operate, simple to debug.
- **Internet enabled**: made for multi-sites (eg. datacenters, offices, households, etc.) interconnected through regular Internet connections.
We also noted that the pursuit of some other goals are detrimental to our initial goals.
The following has been identified as **non-goals** (if these points matter to you, you should not use Garage):
- **Extreme performances**: high performances constrain a lot the design and the infrastructure; we seek performances through minimalism only.
- **Feature extensiveness**: we do not plan to add additional features compared to the ones provided by the S3 API.
- **Storage optimizations**: erasure coding or any other coding technique both increase the difficulty of placing data and synchronizing; we limit ourselves to duplication.
- **POSIX/Filesystem compatibility**: we do not aim at being POSIX compatible or to emulate any kind of filesystem. Indeed, in a distributed environment, such synchronizations are translated in network messages that impose severe constraints on the deployment.
- **[Internals](internals.md):** This page goes into more detail on how Garage manages data internally.
## Talks
We love to talk and hear about Garage, that's why we keep a log here:
- [(fr, 2021-11-13, video) Garage : Mille et une façons de stocker vos données](https://video.tedomum.net/w/moYKcv198dyMrT8hCS5jz9) and [slides (html)](https://rfid.deuxfleurs.fr/presentations/2021-11-13/garage/) - during [RFID#1](https://rfid.deuxfleurs.fr/programme/2021-11-13/) event
- [(en, 2021-04-28) Distributed object storage is centralised](https://git.deuxfleurs.fr/Deuxfleurs/garage/raw/commit/b1f60579a13d3c5eba7f74b1775c84639ea9b51a/doc/talks/2021-04-28_spirals-team/talk.pdf)
- [(fr, 2020-12-02) Garage : jouer dans la cour des grands quand on est un hébergeur associatif](https://git.deuxfleurs.fr/Deuxfleurs/garage/raw/commit/b1f60579a13d3c5eba7f74b1775c84639ea9b51a/doc/talks/2020-12-02_wide-team/talk.pdf)

View file

@ -4,14 +4,17 @@
TODO: write this section
- The Dynamo ring
- The Dynamo ring (see [this paper](https://dl.acm.org/doi/abs/10.1145/1323293.1294281) and [that paper](https://www.usenix.org/conference/nsdi16/technical-sessions/presentation/eisenbud))
- CRDTs
- CRDTs (see [this paper](https://link.springer.com/chapter/10.1007/978-3-642-24550-3_29))
- Consistency model of Garage tables
See this presentation (in French) for some first information:
<https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/main/doc/talks/2020-12-02_wide-team/talk.pdf>
In the meantime, you can find some information at the following links:
- [this presentation (in French)](https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/main/doc/talks/2020-12-02_wide-team/talk.pdf)
- [an old design draft](/working_documents/design_draft.md)
## Garbage collection

View file

@ -1,4 +1,4 @@
# Related Work
# Related work
## Context
@ -55,21 +55,21 @@ We also do not classify Swift as *Simple*.
**[Ceph](https://ceph.io/ceph-storage/object-storage/):**
This review holds for the whole Ceph stack, including the RADOS paper, Ceph Object Storage module, the RADOS Gateway, etc.
At its core, Ceph has been designed to provide *POSIX/Filesystem compatibility* which requires strong consistency, which in turn
makes Ceph latency-sensitive and fails our *Internet enabled* goal.
makes Ceph latency-sensitive and fails our *Internet enabled* goal.
Due to its industry-oriented design, Ceph is also far from being *Simple* to operate and from being *Self-contained & lightweight*, which makes it hard to integrate into a hyperconverged infrastructure.
In a certain way, Ceph and MinIO are closer to each other than they are to Garage or OpenStack Swift.
**[Pithos](https://github.com/exoscale/pithos)**
**[Pithos](https://github.com/exoscale/pithos):**
Pithos has been abandoned and should probably not be used anymore; in the following we explain why we did not pick its design.
Pithos acted as an S3 proxy in front of Cassandra (and also worked with Scylla DB).
According to its designers, storing data in Cassandra has shown its limitations, which justified abandoning the project.
They built a closed-source version 2 that does not store blobs in the database (only metadata) but did not communicate further on it.
We considered their v2's design but concluded that it does not fit our *Self-contained & lightweight* and *Simple* properties. It makes the development, the deployment and the operations more complicated while reducing the flexibility.
**[Riak CS](https://docs.riak.com/riak/cs/2.1.1/index.html)**
**[Riak CS](https://docs.riak.com/riak/cs/2.1.1/index.html):**
*Not written yet*
**[IPFS](https://ipfs.io/) :**
**[IPFS](https://ipfs.io/):**
*Not written yet*
## Specific research papers

View file

@ -15,60 +15,37 @@
# Data resiliency for everyone
OLD
Garage is a lightweight geo-distributed data store that implements the
[Amazon S3](https://docs.aws.amazon.com/AmazonS3/latest/API/Welcome.html)
object storage protocole. It enables applications to store large blobs such
as pictures, video, images, documents, etc., in a redundant multi-node
setting. S3 is versatile enough to also be used to publish a static
website.
Garage comes from the observation that despite the numerous existing
implementation of object stores, many people have broken data management
policies (backup/replication on a single site or none at all). To promote
better data management policies, we focused on the following **desirable
properties**:
Non-goals:
- **Extreme performances**: high performances constrain a lot the design and the infrastructure; we seek performances through minimalism only.
- **Feature extensiveness**: complete implementation of the S3 API or any other API to make Garage a drop-in replacement is not targeted as it could lead to decisions impacting our desirable properties.
- **Storage optimizations**: erasure coding or any other coding technique both increase the difficulty of placing data and synchronizing; we limit ourselves to duplication.
- **POSIX/Filesystem compatibility**: we do not aim at being POSIX compatible or to emulate any kind of filesystem. Indeed, in a distributed environment, such synchronizations are translated in network messages that impose severe constraints on the deployment.
Use-cases:
- **[Deuxfleurs](https://deuxfleurs.fr):** Garage is used by Deuxfleurs which
is a non-profit hosting organization. Especially, it is used to host their
main website, this documentation and some of its members' blogs.
Deuxfleurs also uses Garage as their [Matrix's media
backend](https://github.com/matrix-org/synapse-s3-storage-provider).
Deuxfleurs also uses it in its continuous integration platform to store
Drone's job logs and a Nix binary cache.
ENDOLD
Garage is an **open-source** distributed **storage service** you can **self-host** to fullfill many needs.
Garage is an **open-source** distributed **storage service** you can **self-host** to fulfill many needs:
<p align="center" style="text-align:center; margin-bottom: 5rem;">
<img alt="Summary of the possible usages with a related icon: host a website, store media and backup target" src="img/usage.svg" />
</p>
Garage implements the **[Amazon S3 API](https://docs.aws.amazon.com/AmazonS3/latest/API/Welcome.html)** and thus is already **compatible** with many applications.
<p align="center" style="text-align:center; margin-bottom: 5rem;">
<a href="/design/use_cases.html">⮞ learn more about use cases ⮜</a>
</p>
Garage implements the **[Amazon S3 API](https://docs.aws.amazon.com/AmazonS3/latest/API/Welcome.html)** and thus is already **compatible** with many applications:
<p align="center" style="text-align:center; margin-bottom: 8rem;">
<img alt="Garage is already compatible with Nextcloud, Mastodon, Matrix Synapse, Cyberduck, RClone and Peertube" src="img/software.svg" />
</p>
<p align="center" style="text-align:center; margin-bottom: 5rem;">
<a href="/connect/index.html">⮞ learn more about integrations ⮜</a>
</p>
Garage provides **data resiliency** by **replicating** data 3x over **distant** servers.
Garage provides **data resiliency** by **replicating** data 3x over **distant** servers:
<p align="center" style="text-align:center; margin-bottom: 5rem;">
<img alt="An example deployment on a map with servers in 5 zones: UK, France, Belgium, Germany and Switzerland. Each chunk of data is replicated in 3 of these 5 zones." src="img/map.svg" />
</p>
<p align="center" style="text-align:center; margin-bottom: 5rem;">
<a href="/design/index.html">⮞ learn more about our design ⮜</a>
</p>
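To make the "3 copies across distant zones" idea concrete, here is a small Python sketch that assigns each chunk of data to 3 of the 5 zones from the example map. It uses rendezvous (highest-random-weight) hashing, chosen here purely for illustration: this is an assumption and not a description of how Garage actually places data. The names `ZONES`, `REPLICATION_FACTOR` and `zones_for_chunk` are hypothetical.

```python
import hashlib

# Zones taken from the example map; the names are illustrative only.
ZONES = ["UK", "France", "Belgium", "Germany", "Switzerland"]
REPLICATION_FACTOR = 3


def zones_for_chunk(chunk_id: bytes, zones=ZONES, copies=REPLICATION_FACTOR):
    """Pick `copies` distinct zones for a chunk with rendezvous hashing.

    Every (zone, chunk) pair gets a pseudo-random score; the chunk is stored
    in the zones with the highest scores. This is NOT Garage's algorithm.
    """
    if copies > len(zones):
        raise ValueError("not enough zones for the requested number of copies")

    def score(zone: str) -> bytes:
        return hashlib.sha256(zone.encode("utf-8") + b"/" + chunk_id).digest()

    return sorted(zones, key=score, reverse=True)[:copies]


placement = zones_for_chunk(b"example-chunk")
```

Because the placement depends only on the chunk identifier and the zone names, any node can recompute it without coordination, and each chunk survives the loss of any two of its three zones.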
Did you notice that *this website* is hosted and served by Garage?
## Keeping requirements low
@ -79,6 +56,7 @@ We worked hard to keep requirements as low as possible as we target the largest
* **RAM:** 1GB
* **Disk Space:** at least 16GB
* **Network:** 200ms or less, 50 Mbps or more
* **Heterogeneous hardware:** build a cluster with whatever second-hand machines are available
*For the network, as we do not use consensus algorithms like Paxos or Raft, Garage is not as latency sensitive.*
*Thanks to Rust and its zero-cost abstractions, we keep CPU and memory low.*
@ -88,20 +66,15 @@ We worked hard to keep requirements as low as possible as we target the largest
- [Dynamo: Amazon's Highly Available Key-value Store](https://dl.acm.org/doi/abs/10.1145/1323293.1294281) by DeCandia et al.
- [Conflict-Free Replicated Data Types](https://link.springer.com/chapter/10.1007/978-3-642-24550-3_29) by Shapiro et al.
- [Maglev: A Fast and Reliable Software Network Load Balancer](https://www.usenix.org/conference/nsdi16/technical-sessions/presentation/eisenbud) by Eisenbud et al.
- [Merkle Search Trees: Efficient State-Based CRDTs in Open Networks](https://ieeexplore.ieee.org/document/9049566) by Auvolat and Taïani
## Talks
We love to talk and hear about Garage, which is why we keep a log here:
- [(fr, 2021-11-13, video) Garage : Mille et une façons de stocker vos données](https://video.tedomum.net/w/moYKcv198dyMrT8hCS5jz9) and [slides (html)](https://rfid.deuxfleurs.fr/presentations/2021-11-13/garage/) - during [RFID#1](https://rfid.deuxfleurs.fr/programme/2021-11-13/) event
- [(en, 2021-04-28, pdf) Distributed object storage is centralised](https://git.deuxfleurs.fr/Deuxfleurs/garage/raw/commit/b1f60579a13d3c5eba7f74b1775c84639ea9b51a/doc/talks/2021-04-28_spirals-team/talk.pdf)
- [(fr, 2020-12-02, pdf) Garage : jouer dans la cour des grands quand on est un hébergeur associatif](https://git.deuxfleurs.fr/Deuxfleurs/garage/raw/commit/b1f60579a13d3c5eba7f74b1775c84639ea9b51a/doc/talks/2020-12-02_wide-team/talk.pdf)
*Did you write or talk about Garage? [Open a pull request](https://git.deuxfleurs.fr/Deuxfleurs/garage/) to add a link here!*
## Community
If you want to discuss with us, you can join our Matrix channel at [#garage:deuxfleurs.fr](https://matrix.to/#/#garage:deuxfleurs.fr).