From f6aebefcc9747bf5afad3767e9ae6f9f3aba30ae Mon Sep 17 00:00:00 2001 From: Alex Auvolat Date: Wed, 14 Sep 2022 19:31:13 +0200 Subject: [PATCH] Some work on documentation towards v0.8 --- doc/book/design/benchmarks/index.md | 2 +- doc/book/design/goals.md | 4 +- doc/book/design/internals.md | 43 ++++++++++ doc/book/design/related-work.md | 2 +- doc/book/quick-start/_index.md | 9 ++ doc/book/reference-manual/admin-api.md | 2 +- doc/book/reference-manual/cli.md | 2 +- doc/book/reference-manual/configuration.md | 14 +-- doc/book/reference-manual/features.md | 85 +++++++++++++++++++ doc/book/reference-manual/k2v.md | 2 +- doc/book/reference-manual/layout.md | 2 +- doc/book/reference-manual/routing.md | 45 ---------- doc/book/reference-manual/s3-compatibility.md | 2 +- doc/book/working-documents/design-draft.md | 4 +- doc/book/working-documents/load-balancing.md | 4 +- 15 files changed, 151 insertions(+), 71 deletions(-) create mode 100644 doc/book/reference-manual/features.md delete mode 100644 doc/book/reference-manual/routing.md diff --git a/doc/book/design/benchmarks/index.md b/doc/book/design/benchmarks/index.md index c2215a4a..79cc5d62 100644 --- a/doc/book/design/benchmarks/index.md +++ b/doc/book/design/benchmarks/index.md @@ -1,6 +1,6 @@ +++ title = "Benchmarks" -weight = 10 +weight = 40 +++ With Garage, we wanted to build a software defined storage service that follow the [KISS principle](https://en.wikipedia.org/wiki/KISS_principle), diff --git a/doc/book/design/goals.md b/doc/book/design/goals.md index dea1d2c8..b97d73a9 100644 --- a/doc/book/design/goals.md +++ b/doc/book/design/goals.md @@ -1,13 +1,13 @@ +++ title = "Goals and use cases" -weight = 5 +weight = 10 +++ ## Goals and non-goals Garage is a lightweight geo-distributed data store that implements the [Amazon S3](https://docs.aws.amazon.com/AmazonS3/latest/API/Welcome.html) -object storage protocole. It enables applications to store large blobs such +object storage protocol. 
It enables applications to store large blobs such as pictures, video, images, documents, etc., in a redundant multi-node setting. S3 is versatile enough to also be used to publish a static website. diff --git a/doc/book/design/internals.md b/doc/book/design/internals.md index 05d852e2..777e017d 100644 --- a/doc/book/design/internals.md +++ b/doc/book/design/internals.md @@ -20,6 +20,49 @@ In the meantime, you can find some information at the following links: - [an old design draft](@/documentation/working-documents/design-draft.md) +## Request routing logic + +Data retrieval requests to Garage endpoints (S3 API and websites) are resolved +to an individual object in a bucket. Since objects are replicated to multiple nodes, +Garage must ensure consistency before answering the request. + +### Using quorum to ensure consistency + +Garage ensures consistency by attempting to establish a quorum with the +data nodes responsible for the object. When a majority of the data nodes +have provided metadata on an object, Garage can then answer the request. + +When a request arrives Garage will, assuming the recommended 3 replicas, perform the following actions: + +- Make a request to the two preferred nodes for object metadata +- Try the third node if one of the two initial requests fails +- Check that the metadata from at least 2 nodes match +- Check that the object hasn't been marked deleted +- Answer the request with inline data from metadata if the object is small enough +- Or get data blocks from the preferred nodes and answer using the assembled object + +Garage dynamically determines which nodes to query based on health, preference, and +which nodes actually host a given piece of data. Garage has no concept of "primary" so any +healthy node with the data can be used as long as a quorum is reached for the metadata. + +### Node health + +Garage keeps a TCP session open to each node in the cluster and periodically pings them. 
If a connection +cannot be established, or a node fails to answer a number of pings, the target node is marked as failed. +Failed nodes are not used for quorum or other internal requests. + +### Node preference + +Garage prioritizes which nodes to query according to a few criteria: + +- A node always prefers itself if it can answer the request +- Then the node prioritizes nodes in the same zone +- Finally the nodes with the lowest latency are prioritized + + +For further reading on the cluster structure look at the [gateway](@/documentation/cookbook/gateways.md) +and [cluster layout management](@/documentation/reference-manual/layout.md) pages. + ## Garbage collection A faulty garbage collection procedure has been the cause of diff --git a/doc/book/design/related-work.md b/doc/book/design/related-work.md index ade298ec..f96c6618 100644 --- a/doc/book/design/related-work.md +++ b/doc/book/design/related-work.md @@ -1,6 +1,6 @@ +++ title = "Related work" -weight = 15 +weight = 50 +++ ## Context diff --git a/doc/book/quick-start/_index.md b/doc/book/quick-start/_index.md index 5d7df48e..21331dcb 100644 --- a/doc/book/quick-start/_index.md +++ b/doc/book/quick-start/_index.md @@ -9,6 +9,15 @@ Let's start your Garage journey! In this chapter, we explain how to deploy Garage as a single-node server and how to interact with it. +## What is Garage? + +Before jumping in, you might be interested in reading the following pages: + +- [Goals and use cases](@/documentation/design/goals.md) +- [List of features](@/documentation/reference-manual/features.md) + +## Scope of this tutorial + Our goal is to introduce you to Garage's workflows. Following this guide is recommended before moving on to [configuring a multi-node cluster](@/documentation/cookbook/real-world.md). 
diff --git a/doc/book/reference-manual/admin-api.md b/doc/book/reference-manual/admin-api.md index c7316cdf..3a4a7aab 100644 --- a/doc/book/reference-manual/admin-api.md +++ b/doc/book/reference-manual/admin-api.md @@ -1,6 +1,6 @@ +++ title = "Administration API" -weight = 16 +weight = 60 +++ The Garage administration API is accessible through a dedicated server whose diff --git a/doc/book/reference-manual/cli.md b/doc/book/reference-manual/cli.md index 43a0c823..82492c3e 100644 --- a/doc/book/reference-manual/cli.md +++ b/doc/book/reference-manual/cli.md @@ -1,6 +1,6 @@ +++ title = "Garage CLI" -weight = 15 +weight = 30 +++ The Garage CLI is mostly self-documented. Make use of the `help` subcommand diff --git a/doc/book/reference-manual/configuration.md b/doc/book/reference-manual/configuration.md index 65381f46..6db12568 100644 --- a/doc/book/reference-manual/configuration.md +++ b/doc/book/reference-manual/configuration.md @@ -1,6 +1,6 @@ +++ title = "Configuration file format" -weight = 5 +weight = 20 +++ Here is an example `garage.toml` configuration file that illustrates all of the possible options: @@ -10,7 +10,6 @@ metadata_dir = "/var/lib/garage/meta" data_dir = "/var/lib/garage/data" block_size = 1048576 -block_manager_background_tranquility = 2 replication_mode = "3" @@ -87,17 +86,6 @@ files will remain available. This however means that chunks from existing files will not be deduplicated with chunks from newly uploaded files, meaning you might use more storage space that is optimally possible. -### `block_manager_background_tranquility` - -This parameter tunes the activity of the background worker responsible for -resyncing data blocks between nodes. The higher the tranquility value is set, -the more the background worker will wait between iterations, meaning the load -on the system (including network usage between nodes) will be reduced. 
The -minimal value for this parameter is `0`, where the background worker will -allways work at maximal throughput to resynchronize blocks. The default value -is `2`, where the background worker will try to spend at most 1/3 of its time -working, and 2/3 sleeping in order to reduce system load. - ### `replication_mode` Garage supports the following replication modes: diff --git a/doc/book/reference-manual/features.md b/doc/book/reference-manual/features.md new file mode 100644 index 00000000..23750800 --- /dev/null +++ b/doc/book/reference-manual/features.md @@ -0,0 +1,85 @@ ++++ +title = "List of Garage features" +weight = 10 ++++ + + +### S3 API + +The main goal of Garage is to provide an object storage service that is compatible with the +[S3 API](https://docs.aws.amazon.com/AmazonS3/latest/API/Welcome.html) from Amazon Web Services. +We try to adhere as strictly as possible to the semantics of the API as implemented by Amazon +and other vendors such as Minio or CEPH. + +Of course Garage does not implement the full span of API endpoints that AWS S3 does; +the exact list of S3 features implemented by Garage can be found [on our S3 compatibility page](@/documentation/reference-manual/s3-compatibility.md). + +### Geo-distribution + +Garage allows you to store copies of your data in multiple geographical locations in order to maximize resilience +to adverse events, such as network/power outages or hardware failures. +This allows Garage to run very well even at home, using consumer-grade Internet connectivity +(such as FTTH) and power, as long as cluster nodes can be spawned at several physical locations. +Garage exploits knowledge of the capacity and physical location of each storage node to design +a storage plan that best exploits the available storage capacity while satisfying the geo-distributed replication constraint. 
+ +To learn more about geo-distributed Garage clusters, +read our documentation on [setting up a real-world deployment](@/documentation/cookbook/real-world.md). + +### Flexible topology + +A Garage cluster can very easily evolve over time, as storage nodes are added or removed. +Garage will automatically rebalance data between nodes as needed to ensure the desired number of copies. +Read about cluster layout management [here](@/documentation/reference-manual/layout.md). + +### No RAFT slowing you down + +It might seem strange to tout the absence of something as a desirable feature, +but this is in fact a very important point! Garage does not use RAFT or another +consensus algorithm internally to order incoming requests: this means that all requests +directed to a Garage cluster can be handled independently of one another instead +of going through a central bottleneck (the leader node). +As a consequence, requests can be handled much faster, even in cases where latency +between cluster nodes is significant (see our [benchmarks](@/documentation/design/benchmarks/index.md) for data on this). +This is particularly useful when nodes are far from one another and talk to one another through standard Internet connections. + +### Several replication modes + +Garage supports a variety of replication modes, with 1 copy, 2 copies or 3 copies of your data, +and with various levels of consistency. +Read our reference page on [supported replication modes](@/documentation/reference-manual/configuration.md#replication-mode) +to select the replication mode best suited to your use case (hint: in most cases, `replication_mode = "3"` is what you want). + +### Web server for static websites + +A storage bucket can easily be configured to be served directly by Garage as a static web site. +Domain names for multiple websites directly map to bucket names, making it easy to build +a platform for your users to autonomously build and host their websites over Garage. 
+Surprisingly, none of the other alternative S3 implementations we surveyed (such as Minio +or CEPH) support publishing static websites from S3 buckets, a feature that is however +directly inherited from S3 on AWS. + +### Bucket names as aliases + + - the same bucket may have multiple names (useful when exposing websites for example) + + - bucket renaming is possible + + - Scoped buckets: 2 users can have a different bucket with the same name -> avoid collision. Helpful if you want to write an application that creates per-user bucket always with the same name. + +### Standalone/self contained + + +### Integration with Kubernetes and Nomad + +Many node discovery methods: Kubernetes integration, Nomad integration through Consul + +### Support for changing IP addresses + +(as long as all nodes don't change their IP at the same time) + +### Cluster administration API + +### Metrics and traces + +### (experimental) K2V API diff --git a/doc/book/reference-manual/k2v.md b/doc/book/reference-manual/k2v.md index 742e4309..207d056a 100644 --- a/doc/book/reference-manual/k2v.md +++ b/doc/book/reference-manual/k2v.md @@ -1,6 +1,6 @@ +++ title = "K2V" -weight = 30 +weight = 70 +++ Starting with version 0.7.2, Garage introduces an optionnal feature, K2V, diff --git a/doc/book/reference-manual/layout.md b/doc/book/reference-manual/layout.md index 7debbf33..a7d6f51f 100644 --- a/doc/book/reference-manual/layout.md +++ b/doc/book/reference-manual/layout.md @@ -1,6 +1,6 @@ +++ title = "Cluster layout management" -weight = 10 +weight = 50 +++ The cluster layout in Garage is a table that assigns to each node a role in diff --git a/doc/book/reference-manual/routing.md b/doc/book/reference-manual/routing.md deleted file mode 100644 index aec637cc..00000000 --- a/doc/book/reference-manual/routing.md +++ /dev/null @@ -1,45 +0,0 @@ -+++ -title = "Request routing logic" -weight = 10 -+++ - -Data retrieval requests to Garage endpoints (S3 API and websites) are resolved -to an individual 
object in a bucket. Since objects are replicated to multiple nodes -Garage must ensure consistency before answering the request. - -## Using quorum to ensure consistency - -Garage ensures consistency by attempting to establish a quorum with the -data nodes responsible for the object. When a majority of the data nodes -have provided metadata on a object Garage can then answer the request. - -When a request arrives Garage will, assuming the recommended 3 replicas, perform the following actions: - -- Make a request to the two preferred nodes for object metadata -- Try the third node if one of the two initial requests fail -- Check that the metadata from at least 2 nodes match -- Check that the object hasn't been marked deleted -- Answer the request with inline data from metadata if object is small enough -- Or get data blocks from the preferred nodes and answer using the assembled object - -Garage dynamically determines which nodes to query based on health, preference, and -which nodes actually host a given data. Garage has no concept of "primary" so any -healthy node with the data can be used as long as a quorum is reached for the metadata. - -## Node health - -Garage keeps a TCP session open to each node in the cluster and periodically pings them. If a connection -cannot be established, or a node fails to answer a number of pings, the target node is marked as failed. -Failed nodes are not used for quorum or other internal requests. - -## Node preference - -Garage prioritizes which nodes to query according to a few criteria: - -- A node always prefers itself if it can answer the request -- Then the node prioritizes nodes in the same zone -- Finally the nodes with the lowest latency are prioritized - - -For further reading on the cluster structure look at the [gateway](@/documentation/cookbook/gateways.md) -and [cluster layout management](@/documentation/reference-manual/layout.md) pages. 
\ No newline at end of file diff --git a/doc/book/reference-manual/s3-compatibility.md b/doc/book/reference-manual/s3-compatibility.md index 3d571264..dd3492a0 100644 --- a/doc/book/reference-manual/s3-compatibility.md +++ b/doc/book/reference-manual/s3-compatibility.md @@ -1,6 +1,6 @@ +++ title = "S3 Compatibility status" -weight = 20 +weight = 40 +++ ## DISCLAIMER diff --git a/doc/book/working-documents/design-draft.md b/doc/book/working-documents/design-draft.md index 44849a41..3c8298b0 100644 --- a/doc/book/working-documents/design-draft.md +++ b/doc/book/working-documents/design-draft.md @@ -1,6 +1,6 @@ +++ -title = "Design draft" -weight = 25 +title = "Design draft (obsolete)" +weight = 50 +++ **WARNING: this documentation is a design draft which was written before Garage's actual implementation. diff --git a/doc/book/working-documents/load-balancing.md b/doc/book/working-documents/load-balancing.md index 87298ae6..bf6bdd95 100644 --- a/doc/book/working-documents/load-balancing.md +++ b/doc/book/working-documents/load-balancing.md @@ -1,6 +1,6 @@ +++ -title = "Load balancing data" -weight = 10 +title = "Load balancing data (obsolete)" +weight = 60 +++ **This is being yet improved in release 0.5. The working document has not been updated yet, it still only applies to Garage 0.2 through 0.4.**