diff --git a/doc/book/src/SUMMARY.md b/doc/book/src/SUMMARY.md index cbf6bb705..878c20b34 100644 --- a/doc/book/src/SUMMARY.md +++ b/doc/book/src/SUMMARY.md @@ -8,7 +8,7 @@ - [Multi-node deployment](./cookbook/real_world.md) - [Building from source](./cookbook/from_source.md) - [Integration with systemd](./cookbook/systemd.md) - - [Gateways](./cookbook/gateways.md) + - [Configuring a gateway node](./cookbook/gateways.md) - [Exposing buckets as websites](./cookbook/exposing_websites.md) - [Configuring a reverse proxy](./cookbook/reverse_proxy.md) - [Recovering from failures](./cookbook/recovering.md) @@ -30,9 +30,9 @@ - [S3 compatibility status](./reference_manual/s3_compatibility.md) - [Design](./design/index.md) - - [Related Work](./design/related_work.md) + - [Goals and use cases](./design/goals.md) + - [Related work](./design/related_work.md) - [Internals](./design/internals.md) - - [Design draft](./design/design_draft.md) - [Development](./development/index.md) - [Setup your environment](./development/devenv.md) @@ -41,5 +41,6 @@ - [Miscellaneous notes](./development/miscellaneous_notes.md) - [Working Documents](./working_documents/index.md) - - [Load Balancing Data](./working_documents/load_balancing.md) + - [Load balancing data](./working_documents/load_balancing.md) - [Migrating from 0.3 to 0.4](./working_documents/migration_04.md) + - [Design draft](./working_documents/design_draft.md) diff --git a/doc/book/src/connect/index.md b/doc/book/src/connect/index.md index 56c41255d..703b19d40 100644 --- a/doc/book/src/connect/index.md +++ b/doc/book/src/connect/index.md @@ -1,7 +1,19 @@ -# Connect it to +# Connect it to... -To configure an S3 client to interact with Garage, you will need the following -parameters: +Garage implements the Amazon S3 protocol, which makes it compatible with many existing software programs. 
+ +In particular, you will find here instructions to connect it with: + + - [web applications](./apps.md) + - [website hosting](./websites.md) + - [software repositories](./repositories.md) + - [CLI tools](./cli.md) + - [your own code](./code.md) + +### Generic instructions + +To configure S3-compatible software to interact with Garage, +you will need the following parameters: - An **API endpoint**: this corresponds to the HTTP or HTTPS address used to contact the Garage server. When runing Garage locally this will usually @@ -27,12 +39,3 @@ provided that you follow the following guidelines: If this is not configured explicitly, clients usually try to talk to region `us-east-1`. Garage should normally redirect your client to the correct region, but in case your client does not support this you might have to configure it manually. - -We will now provide example configurations for the most common clients per category: - - - [Apps](./apps.md) - - [Websites](./websites.md) - - [Repositories](./repositories.md) - - [CLI tools](./cli.md) - - [Your code](./code.md) - diff --git a/doc/book/src/cookbook/gateways.md b/doc/book/src/cookbook/gateways.md index 7b286b65c..f03671a40 100644 --- a/doc/book/src/cookbook/gateways.md +++ b/doc/book/src/cookbook/gateways.md @@ -6,11 +6,11 @@ Gateways allow you to expose Garage endpoints (S3 API and websites) without stor You can configure Garage as a gateway on all nodes that will consume your S3 API, it will provide you the following benefits: - - **It removes 1 or 2 network RTT** Instead of (querying your reverse proxy then) querying a random node of the cluster that will forward your request to the nodes effectively storing the data, your local gateway will directly knows which node to query. 
+ - **It removes 1 or 2 network RTT.** Instead of (querying your reverse proxy then) querying a random node of the cluster that will forward your request to the nodes effectively storing the data, your local gateway will directly know which node to query. - - **It ease server management** Instead of tracking in your reverse proxy and DNS what are the current Garage nodes, your gateway being part of the cluster keeps this information for you. In your software, you will always specify `http://localhost:3900`. + - **It eases server management.** Instead of tracking in your reverse proxy and DNS what are the current Garage nodes, your gateway being part of the cluster keeps this information for you. In your software, you will always specify `http://localhost:3900`. - - **It simplifies security** Instead of having to maintain and renew a TLS certificate, you leverage the Secret Handshake protocol we use for our cluster. The S3 API protocol will be in plain text but limited to your local machine. + - **It simplifies security.** Instead of having to maintain and renew a TLS certificate, you leverage the Secret Handshake protocol we use for our cluster. The S3 API protocol will be in plain text but limited to your local machine. ## Limitations diff --git a/doc/book/src/cookbook/index.md b/doc/book/src/cookbook/index.md index da915f853..792a5e6ec 100644 --- a/doc/book/src/cookbook/index.md +++ b/doc/book/src/cookbook/index.md @@ -4,22 +4,23 @@ A cookbook, when you cook, is a collection of recipes. Similarly, Garage's cookbook contains a collection of recipes that are known to works well! This chapter could also be referred as "Tutorials" or "Best practices". -- **[Deploying Garage](real_world.md):** This page will walk you through all of the necessary +- **[Multi-node deployment](real_world.md):** This page will walk you through all of the necessary steps to deploy Garage in a real-world setting. 
-- **[Configuring S3 clients](clients.md):** This page will explain how to configure - popular S3 clients to interact with a Garage server. - -- **[Hosting a website](website.md):** This page explains how to use Garage - to host a static website. - -- **[Recovering from failures](recovering.md):** Garage's first selling point is resilience - to hardware failures. This section explains how to recover from such a failure in the - best possible way. - - **[Building from source](from_source.md):** This page explains how to build Garage from source in case a binary is not provided for your architecture, or if you want to hack with us! -- **[Starting with Systemd](from_source.md):** This page explains how to run Garage +- **[Integration with Systemd](systemd.md):** This page explains how to run Garage as a Systemd service (instead of as a Docker container). + +- **[Configuring a gateway node](gateways.md):** This page explains how to run a gateway node in a Garage cluster, i.e. a Garage node that doesn't store data but accelerates access to data present on the other nodes. + +- **[Hosting a website](exposing_websites.md):** This page explains how to use Garage + to host a static website. + +- **[Configuring a reverse-proxy](reverse_proxy.md):** This page explains how to configure a reverse-proxy to add TLS support to your S3 API endpoint. + +- **[Recovering from failures](recovering.md):** Garage's first selling point is resilience + to hardware failures. This section explains how to recover from such a failure in the + best possible way. diff --git a/doc/book/src/cookbook/real_world.md b/doc/book/src/cookbook/real_world.md index 4b3fec2b7..d1303d479 100644 --- a/doc/book/src/cookbook/real_world.md +++ b/doc/book/src/cookbook/real_world.md @@ -286,5 +286,5 @@ and is covered in the [quick start guide](../quick_start/index.md). Remember also that the CLI is self-documented thanks to the `--help` flag and the `help` subcommand (e.g. `garage help`, `garage key --help`). 
-Configuring an S3 client to interact with Garage is covered -[in the next section](clients.md). +Configuring S3-compatible applications to interact with Garage +is covered in the [Integrations](/connect/index.html) section. diff --git a/doc/book/src/design/goals.md b/doc/book/src/design/goals.md new file mode 100644 index 000000000..10ef6a8f4 --- /dev/null +++ b/doc/book/src/design/goals.md @@ -0,0 +1,53 @@ +# Goals and use cases + +## Goals and non-goals + +Garage is a lightweight geo-distributed data store that implements the +[Amazon S3](https://docs.aws.amazon.com/AmazonS3/latest/API/Welcome.html) +object storage protocol. It enables applications to store large blobs such +as pictures, video, images, documents, etc., in a redundant multi-node +setting. S3 is versatile enough to also be used to publish a static +website. + +Garage is an opinionated object storage solution, we focus on the following **desirable properties**: + + - **Self-contained & lightweight**: works everywhere and integrates well in existing environments to target [hyperconverged infrastructures](https://en.wikipedia.org/wiki/Hyper-converged_infrastructure). + - **Highly resilient**: highly resilient to network failures, network latency, disk failures, sysadmin failures. + - **Simple**: simple to understand, simple to operate, simple to debug. + - **Internet enabled**: made for multi-sites (eg. datacenters, offices, households, etc.) interconnected through regular Internet connections. + +We also noted that the pursuit of some other goals is detrimental to our initial goals. +The following have been identified as **non-goals** (if these points matter to you, you should not use Garage): + + - **Extreme performances**: high performances constrain a lot the design and the infrastructure; we seek performances through minimalism only. + - **Feature extensiveness**: we do not plan to add additional features compared to the ones provided by the S3 API. 
+ - **Storage optimizations**: erasure coding or any other coding technique both increase the difficulty of placing data and synchronizing; we limit ourselves to duplication. + - **POSIX/Filesystem compatibility**: we do not aim at being POSIX compatible or to emulate any kind of filesystem. Indeed, in a distributed environment, such synchronizations are translated in network messages that impose severe constraints on the deployment. + +## Use-cases + +*Are you also using Garage in your organization? [Open a PR](https://git.deuxfleurs.fr/Deuxfleurs/garage) to add your use case here!* + +### Deuxfleurs + +[Deuxfleurs](https://deuxfleurs.fr) is an experimental non-profit hosting +organization that develops Garage. Deuxfleurs is focused on building highly +available infrastructure through redundancy in multiple geographical +locations. They use Garage themselves for the following tasks: + +- Hosting of [main website](https://deuxfleurs.fr), [this website](https://garagehq.deuxfleurs.fr), as well as the personal website of many of the members of the organization + +- As a [Matrix media backend](https://github.com/matrix-org/synapse-s3-storage-provider) + +- To store personal data and shared documents through [Bagage](https://git.deuxfleurs.fr/Deuxfleurs/bagage), a homegrown WebDav-to-S3 proxy + +- In the Drone continuous integration platform to store task logs + +- As a Nix binary cache + +- As a backup target using `rclone` + +The Deuxfleurs Garage cluster is a multi-site cluster currently composed of +4 nodes in 2 physical locations. In the future it will be expanded to at +least 3 physical locations to fully exploit Garage's potential for high +availability. diff --git a/doc/book/src/design/index.md b/doc/book/src/design/index.md index 305f05019..2e3b5fd93 100644 --- a/doc/book/src/design/index.md +++ b/doc/book/src/design/index.md @@ -1,30 +1,22 @@ # Design -The design section helps you to see Garage from a "big picture" perspective. 
-It will allow you to understand if Garage is a good fit for you, -how to better use it, how to contribute to it, what can Garage could and could not do, etc. +The design section helps you to see Garage from a "big picture" +perspective. It will allow you to understand if Garage is a good fit for +you, how to better use it, how to contribute to it, what Garage can +and cannot do, etc. -## Goals and non-goals +- **[Goals and use cases](goals.md):** This page explains why Garage was conceived and what practical use cases it targets. -Garage is an opinionated object storage solutoin, we focus on the following **desirable properties**: +- **[Related work](related_work.md):** This page presents the theoretical background on which Garage is built, and describes other software storage solutions and why they didn't work for us. - - **Self-contained & lightweight**: works everywhere and integrates well in existing environments to target [hyperconverged infrastructures](https://en.wikipedia.org/wiki/Hyper-converged_infrastructure). - - **Highly resilient**: highly resilient to network failures, network latency, disk failures, sysadmin failures. - - **Simple**: simple to understand, simple to operate, simple to debug. - - **Internet enabled**: made for multi-sites (eg. datacenters, offices, households, etc.) interconnected through regular Internet connections. - -We also noted that the pursuit of some other goals are detrimental to our initial goals. -The following has been identified as **non-goals** (if these points matter to you, you should not use Garage): - - - **Extreme performances**: high performances constrain a lot the design and the infrastructure; we seek performances through minimalism only. - - **Feature extensiveness**: we do not plan to add additional features compared to the ones provided by the S3 API. 
- - **Storage optimizations**: erasure coding or any other coding technique both increase the difficulty of placing data and synchronizing; we limit ourselves to duplication. - - **POSIX/Filesystem compatibility**: we do not aim at being POSIX compatible or to emulate any kind of filesystem. Indeed, in a distributed environment, such synchronizations are translated in network messages that impose severe constraints on the deployment. +- **[Internals](internals.md):** This page enters into more details on how Garage manages data internally. ## Talks We love to talk and hear about Garage, that's why we keep a log here: + - [(fr, 2021-11-13, video) Garage : Mille et une façons de stocker vos données](https://video.tedomum.net/w/moYKcv198dyMrT8hCS5jz9) and [slides (html)](https://rfid.deuxfleurs.fr/presentations/2021-11-13/garage/) - during [RFID#1](https://rfid.deuxfleurs.fr/programme/2021-11-13/) event + - [(en, 2021-04-28) Distributed object storage is centralised](https://git.deuxfleurs.fr/Deuxfleurs/garage/raw/commit/b1f60579a13d3c5eba7f74b1775c84639ea9b51a/doc/talks/2021-04-28_spirals-team/talk.pdf) - [(fr, 2020-12-02) Garage : jouer dans la cour des grands quand on est un hébergeur associatif](https://git.deuxfleurs.fr/Deuxfleurs/garage/raw/commit/b1f60579a13d3c5eba7f74b1775c84639ea9b51a/doc/talks/2020-12-02_wide-team/talk.pdf) diff --git a/doc/book/src/design/internals.md b/doc/book/src/design/internals.md index 255335fac..0b31584c1 100644 --- a/doc/book/src/design/internals.md +++ b/doc/book/src/design/internals.md @@ -4,14 +4,17 @@ TODO: write this section -- The Dynamo ring +- The Dynamo ring (see [this paper](https://dl.acm.org/doi/abs/10.1145/1323293.1294281) and [that paper](https://www.usenix.org/conference/nsdi16/technical-sessions/presentation/eisenbud)) -- CRDTs +- CRDTs (see [this paper](https://link.springer.com/chapter/10.1007/978-3-642-24550-3_29)) - Consistency model of Garage tables -See this presentation (in French) for some first information: 
- +In the meantime, you can find some information at the following links: + +- [this presentation (in French)](https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/main/doc/talks/2020-12-02_wide-team/talk.pdf) + +- [an old design draft](/working_documents/design_draft.md) ## Garbage collection diff --git a/doc/book/src/design/related_work.md b/doc/book/src/design/related_work.md index aaf10d7b5..da3f807ee 100644 --- a/doc/book/src/design/related_work.md +++ b/doc/book/src/design/related_work.md @@ -1,4 +1,4 @@ -# Related Work +# Related work ## Context @@ -55,21 +55,21 @@ We also do not classify Swift as *Simple*. **[Ceph](https://ceph.io/ceph-storage/object-storage/):** This review holds for the whole Ceph stack, including the RADOS paper, Ceph Object Storage module, the RADOS Gateway, etc. At its core, Ceph has been designed to provide *POSIX/Filesystem compatibility* which requires strong consistency, which in turn -makes Ceph latency-sensitive and fails our *Internet enabled* goal. +makes Ceph latency-sensitive and fails our *Internet enabled* goal. Due to its industry oriented design, Ceph is also far from being *Simple* to operate and from being *Self-contained & lightweight* which makes it hard to integrate it in an hyperconverged infrastructure. In a certain way, Ceph and MinIO are closer together than they are from Garage or OpenStack Swift. -**[Pithos](https://github.com/exoscale/pithos)** +**[Pithos](https://github.com/exoscale/pithos):** Pithos has been abandonned and should probably not used yet, in the following we explain why we did not pick their design. Pithos was relying as a S3 proxy in front of Cassandra (and was working with Scylla DB too). From its designers' mouth, storing data in Cassandra has shown its limitations justifying the project abandonment. They built a closed-source version 2 that does not store blobs in the database (only metadata) but did not communicate further on it. 
We considered there v2's design but concluded that it does not fit both our *Self-contained & lightweight* and *Simple* properties. It makes the development, the deployment and the operations more complicated while reducing the flexibility. -**[Riak CS](https://docs.riak.com/riak/cs/2.1.1/index.html)** +**[Riak CS](https://docs.riak.com/riak/cs/2.1.1/index.html):** *Not written yet* -**[IPFS](https://ipfs.io/) :** +**[IPFS](https://ipfs.io/):** *Not written yet* ## Specific research papers diff --git a/doc/book/src/intro.md b/doc/book/src/intro.md index 746f4d6a8..10f9c0a2a 100644 --- a/doc/book/src/intro.md +++ b/doc/book/src/intro.md @@ -15,60 +15,37 @@ # Data resiliency for everyone -OLD - -Garage is a lightweight geo-distributed data store that implements the -[Amazon S3](https://docs.aws.amazon.com/AmazonS3/latest/API/Welcome.html) -object storage protocole. It enables applications to store large blobs such -as pictures, video, images, documents, etc., in a redundant multi-node -setting. S3 is versatile enough to also be used to publish a static -website. - -Garage comes from the observation that despite the numerous existing -implementation of object stores, many people have broken data management -policies (backup/replication on a single site or none at all). To promote -better data management policies, we focused on the following **desirable -properties**: - -Non-goals: - - - **Extreme performances**: high performances constrain a lot the design and the infrastructure; we seek performances through minimalism only. - - **Feature extensiveness**: complete implementation of the S3 API or any other API to make Garage a drop-in replacement is not targeted as it could lead to decisions impacting our desirable properties. - - **Storage optimizations**: erasure coding or any other coding technique both increase the difficulty of placing data and synchronizing; we limit ourselves to duplication. 
- - **POSIX/Filesystem compatibility**: we do not aim at being POSIX compatible or to emulate any kind of filesystem. Indeed, in a distributed environment, such synchronizations are translated in network messages that impose severe constraints on the deployment. - -Use-cases: - -- **[Deuxfleurs](https://deuxfleurs.fr):** Garage is used by Deuxfleurs which - is a non-profit hosting organization. Especially, it is used to host their - main website, this documentation and some of its members' blogs. - Deuxfleurs also uses Garage as their [Matrix's media - backend](https://github.com/matrix-org/synapse-s3-storage-provider). - Deuxfleurs also uses it in its continuous integration platform to store - Drone's job logs and a Nix binary cache. - -ENDOLD - - -Garage is an **open-source** distributed **storage service** you can **self-host** to fullfill many needs. +Garage is an **open-source** distributed **storage service** you can **self-host** to fulfill many needs:

Summary of the possible usages with a related icon: host a website, store media and backup target

-Garage implements the **[Amazon S3 API](https://docs.aws.amazon.com/AmazonS3/latest/API/Welcome.html)** and thus is already **compatible** with many applications. +

+⮞ learn more about use cases ⮜ +

+ +Garage implements the **[Amazon S3 API](https://docs.aws.amazon.com/AmazonS3/latest/API/Welcome.html)** and thus is already **compatible** with many applications:

Garage is already compatible with Nextcloud, Mastodon, Matrix Synapse, Cyberduck, RClone and Peertube

+

+⮞ learn more about integrations ⮜ +

-Garage provides **data resiliency** by **replicating** data 3x over **distant** servers. + +Garage provides **data resiliency** by **replicating** data 3x over **distant** servers:

An example deployment on a map with servers in 5 zones: UK, France, Belgium, Germany and Switzerland. Each chunk of data is replicated in 3 of these 5 zones.

+

+⮞ learn more about our design ⮜ +

+ Did you notice that *this website* is hosted and served by Garage? ## Keeping requirements low @@ -79,6 +56,7 @@ We worked hard to keep requirements as low as possible as we target the largest * **RAM:** 1GB * **Disk Space:** at least 16GB * **Network:** 200ms or less, 50 Mbps or more + * **Heterogeneous hardware:** build a cluster with whatever second-hand machines are available *For the network, as we do not use consensus algorithms like Paxos or Raft, Garage is not as latency sensitive.* *Thanks to Rust and its zero-cost abstractions, we keep CPU and memory low.* @@ -88,20 +66,15 @@ We worked hard to keep requirements as low as possible as we target the largest - [Dynamo: Amazon’s Highly Available Key-value Store ](https://dl.acm.org/doi/abs/10.1145/1323293.1294281) by DeCandia et al. - [Conflict-Free Replicated Data Types](https://link.springer.com/chapter/10.1007/978-3-642-24550-3_29) by Shapiro et al. - [Maglev: A Fast and Reliable Software Network Load Balancer](https://www.usenix.org/conference/nsdi16/technical-sessions/presentation/eisenbud) by Eisenbud et al. 
- - [Merkle Search Trees: Efficient State-Based CRDTs in Open Networks](https://ieeexplore.ieee.org/document/9049566) by Auvolat and Taïani ## Talks -We love to talk and hear about Garage, that's why we keep a log here: - - [(fr, 2021-11-13, video) Garage : Mille et une façons de stocker vos données](https://video.tedomum.net/w/moYKcv198dyMrT8hCS5jz9) and [slides (html)](https://rfid.deuxfleurs.fr/presentations/2021-11-13/garage/) - during [RFID#1](https://rfid.deuxfleurs.fr/programme/2021-11-13/) event - [(en, 2021-04-28, pdf) Distributed object storage is centralised](https://git.deuxfleurs.fr/Deuxfleurs/garage/raw/commit/b1f60579a13d3c5eba7f74b1775c84639ea9b51a/doc/talks/2021-04-28_spirals-team/talk.pdf) - [(fr, 2020-12-02, pdf) Garage : jouer dans la cour des grands quand on est un hébergeur associatif](https://git.deuxfleurs.fr/Deuxfleurs/garage/raw/commit/b1f60579a13d3c5eba7f74b1775c84639ea9b51a/doc/talks/2020-12-02_wide-team/talk.pdf) -*Did you write or talk about Garage? [Open a pull request](https://git.deuxfleurs.fr/Deuxfleurs/garage/) to add a link here!* - ## Community If you want to discuss with us, you can join our Matrix channel at [#garage:deuxfleurs.fr](https://matrix.to/#/#garage:deuxfleurs.fr). diff --git a/doc/book/src/design/design_draft.md b/doc/book/src/working_documents/design_draft.md similarity index 100% rename from doc/book/src/design/design_draft.md rename to doc/book/src/working_documents/design_draft.md