diff --git a/README.md b/README.md index 6f462493..5a291bbf 100644 --- a/README.md +++ b/README.md @@ -18,119 +18,4 @@ Non-goals include: Our main use case is to provide a distributed storage layer for small-scale self hosted services such as [Deuxfleurs](https://deuxfleurs.fr). -Check our [compatibility page](doc/Compatibility.md) to view details of the S3 API compatibility. - -## Development - -We propose the following quickstart to setup a full dev. environment as quickly as possible: - - 1. Setup a rust/cargo environment. eg. `dnf install rust cargo` - 2. Install awscli v2 by following the guide [here](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html). - 3. Run `cargo build` to build the project - 4. Run `./script/dev-cluster.sh` to launch a test cluster (feel free to read the script) - 5. Run `./script/dev-configure.sh` to configure your test cluster with default values (same datacenter, 100 tokens) - 6. Run `./script/dev-bucket.sh` to create a bucket named `eprouvette` and an API key that will be stored in `/tmp/garage.s3` - 7. Run `source ./script/dev-env-aws.sh` to configure your CLI environment - 8. You can use `garage` to manage the cluster. Try `garage --help`. - 9. You can use the `awsgrg` alias to add, remove, and delete files. Try `awsgrg help`, `awsgrg cp /proc/cpuinfo s3://eprouvette/cpuinfo.txt`, or `awsgrg ls s3://eprouvette`. `awsgrg` is a wrapper on the `aws s3` command pre-configured with the previously generated API key (the one in `/tmp/garage.s3`) and localhost as the endpoint. - -Now you should be ready to start hacking on garage! - -## S3 compatibility - -Only a subset of S3 is supported: adding, listing, getting and deleting files in a bucket. -Bucket management, ACL and other advanced features are not (yet?) handled through the S3 API but through the `garage` CLI. -We primarily test `garage` against the `awscli` tool and `nextcloud`. 
- -## Setting up Garage - -Use the `genkeys.sh` script to generate TLS keys for encrypting communications between Garage nodes. -The script takes no arguments and will generate keys in `pki/`. -This script creates a certificate authority `garage-ca` which signs certificates for individual Garage nodes. -Garage nodes from a same cluster authenticate themselves by verifying that they have certificates signed by the same certificate authority. - -Garage requires two locations to store its data: a metadata directory, and a data directory. -The metadata directory is used to store metadata such as object lists, and should ideally be located on an SSD drive. -The data directory is used to store the chunks of data of the objects stored in Garage. -In a typical deployment the data directory is stored on a standard HDD. - -Garage does not handle TLS for its S3 API endpoint. This should be handled by adding a reverse proxy. - -Create a configuration file with the following structure: - -``` -block_size = 1048576 # objects are split in blocks of maximum this number of bytes - -metadata_dir = "/path/to/ssd/metadata/directory" -data_dir = "/path/to/hdd/data/directory" - -rpc_bind_addr = "[::]:3901" # the port other Garage nodes will use to talk to this node - -bootstrap_peers = [ - # Ideally this list should contain the IP addresses of all other Garage nodes of the cluster. - # Use Ansible or any kind of configuration templating to generate this automatically. - "10.0.0.1:3901", - "10.0.0.2:3901", - "10.0.0.3:3901", -] - -# optionnal: garage can find cluster nodes automatically using a Consul server -# garage only does lookup but does not register itself, registration should be handled externally by e.g. 
Nomad -consul_host = "localhost:8500" # optionnal: host name of a Consul server for automatic peer discovery -consul_service_name = "garage" # optionnal: service name to look up on Consul - -max_concurrent_rpc_requests = 12 -data_replication_factor = 3 -meta_replication_factor = 3 -meta_epidemic_fanout = 3 - -[rpc_tls] -# NOT RECOMMENDED: you can skip this section if you don't want to encrypt intra-cluster traffic -# Thanks to genkeys.sh, generating the keys and certificates is easy, so there is NO REASON NOT TO DO IT. -ca_cert = "/path/to/garage/pki/garage-ca.crt" -node_cert = "/path/to/garage/pki/garage.crt" -node_key = "/path/to/garage/pki/garage.key" - -[s3_api] -api_bind_addr = "[::1]:3900" # the S3 API port, HTTP without TLS. Add a reverse proxy for the TLS part. -s3_region = "garage" # set this to anything. S3 API calls will fail if they are not made against the region set here. - -[s3_web] -bind_addr = "[::1]:3902" -root_domain = ".garage.tld" -index = "index.html" -``` - -Build Garage using `cargo build --release`. -Then, run it using either `./target/release/garage server -c path/to/config_file.toml` or `cargo run --release -- server -c path/to/config_file.toml`. - -Set the `RUST_LOG` environment to `garage=debug` to dump some debug information. -Set it to `garage=trace` to dump even more debug information. -Set it to `garage=warn` to show nothing except warnings and errors. - -## Setting up cluster nodes - -Once all your `garage` nodes are running, you will need to: - -1. check that they are correctly talking to one another; -2. configure them with their physical location (in the case of a multi-dc deployment) and a number of "ring tokens" proportionnal to the storage space available on each node; -3. create some S3 API keys and buckets; -4. ???; -5. profit! - -To run these administrative tasks, you will need to use the `garage` command line tool and it to connect to any of the cluster's nodes on the RPC port. 
-The `garage` CLI also needs TLS keys and certificates of its own to authenticate and be authenticated in the cluster. -A typicall invocation will be as follows: - -``` -./target/release/garage --ca-cert=pki/garage-ca.crt --client-cert=pki/garage-client.crt --client-key=pki/garage-client.key <...> -``` - - -## Notes to self - -### What to repair - -- `tables`: to do a full sync of metadata, should not be necessary because it is done every hour by the system -- `versions` and `block_refs`: very time consuming, usefull if deletions have not been propagated, improves garbage collection -- `blocks`: very usefull to resync/rebalance blocks betweeen nodes +**[Go to the documentation](https://garagehq.deuxfleurs.fr)** diff --git a/doc/Quickstart.md b/doc/Quickstart.md deleted file mode 100644 index 6d0993a4..00000000 --- a/doc/Quickstart.md +++ /dev/null @@ -1,140 +0,0 @@ -# Quickstart on an existing deployment - -First, chances are that your garage deployment is secured by TLS. -All your commands must be prefixed with their certificates. -I will define an alias once and for all to ease future commands. -Please adapt the path of the binary and certificates to your installation! - -``` -alias grg="/garage/garage --ca-cert /secrets/garage-ca.crt --client-cert /secrets/garage.crt --client-key /secrets/garage.key" -``` - -Now we can check that everything is going well by checking our cluster status: - -``` -grg status -``` - -Don't forget that `help` command and `--help` subcommands can help you anywhere, the CLI tool is self-documented! Two examples: - -``` -grg help -grg bucket allow --help -``` - -Fine, now let's create a bucket (we imagine that you want to deploy nextcloud): - -``` -grg bucket create nextcloud-bucket -``` - -Check that everything went well: - -``` -grg bucket list -grg bucket info nextcloud-bucket -``` - -Now we will generate an API key to access this bucket. 
-Note that API keys are independent of buckets: one key can access multiple buckets, multiple keys can access one bucket. - -Now, let's start by creating a key only for our PHP application: - -``` -grg key new --name nextcloud-app-key -``` - -You will have the following output (this one is fake, `key_id` and `secret_key` were generated with the openssl CLI tool): - -``` -Key { key_id: "GK3515373e4c851ebaad366558", secret_key: "7d37d093435a41f2aab8f13c19ba067d9776c90215f56614adad6ece597dbb34", name: "nextcloud-app-key", name_timestamp: 1603280506694, deleted: false, authorized_buckets: [] } -``` - -Check that everything works as intended (be careful, info works only with your key identifier and not with its friendly name!): - -``` -grg key list -grg key info GK3515373e4c851ebaad366558 -``` - -Now that we have a bucket and a key, we need to give permissions to the key on the bucket! - -``` -grg bucket allow --read --write nextcloud-bucket --key GK3515373e4c851ebaad366558 -``` - -You can check at any times allowed keys on your bucket with: - -``` -grg bucket info nextcloud-bucket -``` - -Now, let's move to the S3 API! -We will use the `s3cmd` CLI tool. -You can install it via your favorite package manager. -Otherwise, check [their website](https://s3tools.org/s3cmd) - -We will configure `s3cmd` with its interactive configuration tool, be careful not all endpoints are implemented! -Especially, the test run at the end does not work (yet). - -``` -$ s3cmd --configure - -Enter new values or accept defaults in brackets with Enter. -Refer to user manual for detailed description of all options. - -Access key and Secret key are your identifiers for Amazon S3. Leave them empty for using the env variables. -Access Key: GK3515373e4c851ebaad366558 -Secret Key: 7d37d093435a41f2aab8f13c19ba067d9776c90215f56614adad6ece597dbb34 -Default Region [US]: garage - -Use "s3.amazonaws.com" for S3 Endpoint and not modify it to the target Amazon S3. 
-S3 Endpoint [s3.amazonaws.com]: garage.deuxfleurs.fr - -Use "%(bucket)s.s3.amazonaws.com" to the target Amazon S3. "%(bucket)s" and "%(location)s" vars can be used -if the target S3 system supports dns based buckets. -DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]: garage.deuxfleurs.fr - -Encryption password is used to protect your files from reading -by unauthorized persons while in transfer to S3 -Encryption password: -Path to GPG program [/usr/bin/gpg]: - -When using secure HTTPS protocol all communication with Amazon S3 -servers is protected from 3rd party eavesdropping. This method is -slower than plain HTTP, and can only be proxied with Python 2.7 or newer -Use HTTPS protocol [Yes]: - -On some networks all internet access must go through a HTTP proxy. -Try setting it here if you can't connect to S3 directly -HTTP Proxy server name: - -New settings: - Access Key: GK3515373e4c851ebaad366558 - Secret Key: 7d37d093435a41f2aab8f13c19ba067d9776c90215f56614adad6ece597dbb34 - Default Region: garage - S3 Endpoint: garage.deuxfleurs.fr - DNS-style bucket+hostname:port template for accessing a bucket: garage.deuxfleurs.fr - Encryption password: - Path to GPG program: /usr/bin/gpg - Use HTTPS protocol: True - HTTP Proxy server name: - HTTP Proxy server port: 0 - -Test access with supplied credentials? [Y/n] n - -Save settings? [y/N] y -Configuration saved to '/home/quentin/.s3cfg' -``` - -Now, if everything works, the following commands should work: - -``` -echo hello world > hello.txt -s3cmd put hello.txt s3://nextcloud-bucket -s3cmd ls s3://nextcloud-bucket -s3cmd rm s3://nextcloud-bucket/hello.txt -``` - -That's all for now! 
- diff --git a/doc/book/.gitignore b/doc/book/.gitignore new file mode 100644 index 00000000..7585238e --- /dev/null +++ b/doc/book/.gitignore @@ -0,0 +1 @@ +book diff --git a/doc/book/book.toml b/doc/book/book.toml new file mode 100644 index 00000000..3e163990 --- /dev/null +++ b/doc/book/book.toml @@ -0,0 +1,6 @@ +[book] +authors = ["Quentin Dufour"] +language = "en" +multilingual = false +src = "src" +title = "Garage Documentation" diff --git a/doc/book/src/SUMMARY.md b/doc/book/src/SUMMARY.md new file mode 100644 index 00000000..7c435f23 --- /dev/null +++ b/doc/book/src/SUMMARY.md @@ -0,0 +1,31 @@ +# Summary + +[The Garage Data Store](./intro.md) + +- [Getting Started](./getting_started/index.md) + - [Get a binary](./getting_started/binary.md) + - [Configure the daemon](./getting_started/daemon.md) + - [Control the daemon](./getting_started/control.md) + - [Configure a cluster](./getting_started/cluster.md) + - [Create buckets and keys](./getting_started/bucket.md) + - [Handle files](./getting_started/files.md) + +- [Cookbook](./cookbook/index.md) + - [Host a website](./cookbook/website.md) + - [Integrate as a media backend]() + - [Operate a cluster]() + +- [Reference Manual](./reference_manual/index.md) + - [Garage CLI]() + - [S3 API](./reference_manual/s3_compatibility.md) + +- [Design](./design/index.md) + - [Related Work](./design/related_work.md) + - [Internals](./design/internals.md) + +- [Development](./development/index.md) + - [Setup your environment](./development/devenv.md) + - [Your first contribution]() + +- [Working Documents](./working_documents/index.md) + - [Load Balancing Data](./working_documents/load_balancing.md) diff --git a/doc/book/src/cookbook/index.md b/doc/book/src/cookbook/index.md new file mode 100644 index 00000000..d7a51065 --- /dev/null +++ b/doc/book/src/cookbook/index.md @@ -0,0 +1,5 @@ +# Cookbook + +A cookbook, when you cook, is a collection of recipes. 
+Similarly, Garage's cookbook contains a collection of recipes that are known to work well! +This chapter could also be referred to as "Tutorials" or "Best practices". diff --git a/doc/book/src/cookbook/website.md b/doc/book/src/cookbook/website.md new file mode 100644 index 00000000..2ea82a9a --- /dev/null +++ b/doc/book/src/cookbook/website.md @@ -0,0 +1 @@ +# Host a website diff --git a/doc/book/src/design/index.md b/doc/book/src/design/index.md new file mode 100644 index 00000000..d09a6008 --- /dev/null +++ b/doc/book/src/design/index.md @@ -0,0 +1,5 @@ +# Design + +The design section helps you to see Garage from a "big picture" perspective. +It will allow you to understand if Garage is a good fit for you, +how to better use it, how to contribute to it, what Garage can and cannot do, etc. diff --git a/doc/Internals.md b/doc/book/src/design/internals.md similarity index 100% rename from doc/Internals.md rename to doc/book/src/design/internals.md diff --git a/doc/Related Work.md b/doc/book/src/design/related_work.md similarity index 78% rename from doc/Related Work.md rename to doc/book/src/design/related_work.md index c1a4eed4..bae4691c 100644 --- a/doc/Related Work.md +++ b/doc/book/src/design/related_work.md @@ -1,3 +1,5 @@ +# Related Work + ## Context Data storage is critical: it can lead to data loss if done badly and/or on hardware failure. @@ -8,7 +10,7 @@ But here we consider non specialized off the shelf machines that can be as low p Distributed storage may help to solve both availability and scalability problems on these machines. Many solutions were proposed, they can be categorized as block storage, file storage and object storage depending on the abstraction they provide. -## Related work +## Overview Block storage is the most low level one, it's like exposing your raw hard drive over the network. It requires very low latencies and stable network, that are often dedicated. @@ -36,3 +38,19 @@ However Pithos is not maintained anymore.
More precisely the company that publis Some tests conducted by the [ACIDES project](https://acides.org/) have shown that Openstack Swift consumes way more resources (CPU+RAM) that we can afford. Furthermore, people developing Swift have not designed their software for geo-distribution. There were many attempts in research too. I am only thinking to [LBFS](https://pdos.csail.mit.edu/papers/lbfs:sosp01/lbfs.pdf) that was used as a basis for Seafile. But none of them have been effectively implemented yet. + +## Existing software + +**[Pithos](https://github.com/exoscale/pithos) :** +Pithos has been abandoned and should probably not be used; in the following we explain why we did not pick its design. +Pithos worked as an S3 proxy in front of Cassandra (and was working with Scylla DB too). +According to its designers, storing data in Cassandra has shown its limitations, which justified abandoning the project. +They built a closed-source version 2 that does not store blobs in the database (only metadata) but did not communicate further on it. +We considered their v2's design but concluded that it does not fit both our *Self-contained & lightweight* and *Simple* properties. It makes the development, the deployment and the operations more complicated while reducing the flexibility. + +**[IPFS](https://ipfs.io/) :** +*Not written yet* + +## Specific research papers + +*Not yet written* diff --git a/doc/book/src/development/devenv.md b/doc/book/src/development/devenv.md new file mode 100644 index 00000000..6cb7c554 --- /dev/null +++ b/doc/book/src/development/devenv.md @@ -0,0 +1,17 @@ +# Set up your development environment + +We propose the following quickstart to set up a full development environment as quickly as possible: + + 1. Set up a Rust/Cargo environment, e.g. `dnf install rust cargo` + 2. Install awscli v2 by following the guide [here](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html). + 3. Run `cargo build` to build the project + 4.
Run `./script/dev-cluster.sh` to launch a test cluster (feel free to read the script) + 5. Run `./script/dev-configure.sh` to configure your test cluster with default values (same datacenter, 100 tokens) + 6. Run `./script/dev-bucket.sh` to create a bucket named `eprouvette` and an API key that will be stored in `/tmp/garage.s3` + 7. Run `source ./script/dev-env-aws.sh` to configure your CLI environment + 8. You can use `garage` to manage the cluster. Try `garage --help`. + 9. You can use the `awsgrg` alias to add, list, and delete files. Try `awsgrg help`, `awsgrg cp /proc/cpuinfo s3://eprouvette/cpuinfo.txt`, or `awsgrg ls s3://eprouvette`. `awsgrg` is a wrapper on the `aws s3` command pre-configured with the previously generated API key (the one in `/tmp/garage.s3`) and localhost as the endpoint. + +Now you should be ready to start hacking on garage! + + diff --git a/doc/book/src/development/index.md b/doc/book/src/development/index.md new file mode 100644 index 00000000..d6b5e38b --- /dev/null +++ b/doc/book/src/development/index.md @@ -0,0 +1,4 @@ +# Development + +Now that you are a Garage expert and want to enhance it, you are in the right place! +We discuss here how to hack on Garage, how we manage its development, etc. diff --git a/doc/book/src/getting_started/binary.md b/doc/book/src/getting_started/binary.md new file mode 100644 index 00000000..9a18babc --- /dev/null +++ b/doc/book/src/getting_started/binary.md @@ -0,0 +1,44 @@ +# Get a binary + +Currently, only two installation procedures are supported for Garage: from Docker (x86\_64 for Linux) and from source. +In the future, we plan to add a third one, by publishing a compiled binary (x86\_64 for Linux). +We have not tested other architectures or operating systems but, as long as yours is supported by Rust, you should be able to run Garage (feel free to report your tests!).
+ +## From Docker + +Our docker image is currently named `lxpz/garage_amd64` and is stored on the [Docker Hub](https://hub.docker.com/r/lxpz/garage_amd64/tags?page=1&ordering=last_updated). +We encourage you to use a fixed tag (e.g. `v0.1.1d`) and not the `latest` tag. +For this example, we will use the latest published version at the time of writing, which is `v0.1.1d`, but it's up to you +to check [the most recent versions on the Docker Hub](https://hub.docker.com/r/lxpz/garage_amd64/tags?page=1&ordering=last_updated). + +For example: + +``` +sudo docker pull lxpz/garage_amd64:v0.1.1d +``` + +## From source + +Garage is a standard Rust project. +First, you need `rust` and `cargo`. +On Debian: + +```bash +sudo apt-get update +sudo apt-get install -y rustc cargo +``` + +Then, you can ask cargo to install the binary for you: + +```bash +cargo install garage +``` + +That's all, `garage` should be in `$HOME/.cargo/bin`. +You can add this folder to your `$PATH` or copy the binary somewhere else on your system. +For the following, we will assume you copied it to `/usr/local/bin/garage`: + +```bash +sudo cp $HOME/.cargo/bin/garage /usr/local/bin/garage +``` + diff --git a/doc/book/src/getting_started/bucket.md b/doc/book/src/getting_started/bucket.md new file mode 100644 index 00000000..b22ce788 --- /dev/null +++ b/doc/book/src/getting_started/bucket.md @@ -0,0 +1,78 @@ +# Create buckets and keys + +*We use a command named `garagectl` which is in fact an alias you must define as explained in the [Control the daemon](./control.md) section.* + +In this section, we will suppose that we want to create a bucket named `nextcloud-bucket` +that will be accessed through a key named `nextcloud-app-key`. + +Don't forget that the `help` command and the `--help` flag on subcommands can help you anywhere: the CLI tool is self-documented!
Two examples: + +``` +garagectl help +garagectl bucket allow --help +``` + +## Create a bucket + +Fine, now let's create a bucket (we imagine that you want to deploy nextcloud): + +``` +garagectl bucket create nextcloud-bucket +``` + +Check that everything went well: + +``` +garagectl bucket list +garagectl bucket info nextcloud-bucket +``` + +## Create an API key + +Now we will generate an API key to access this bucket. +Note that API keys are independent of buckets: one key can access multiple buckets, multiple keys can access one bucket. + +Now, let's start by creating a key only for our PHP application: + +``` +garagectl key new --name nextcloud-app-key +``` + +You will have the following output (this one is fake, `key_id` and `secret_key` were generated with the openssl CLI tool): + +```javascript +Key { + key_id: "GK3515373e4c851ebaad366558", + secret_key: "7d37d093435a41f2aab8f13c19ba067d9776c90215f56614adad6ece597dbb34", + name: "nextcloud-app-key", + name_timestamp: 1603280506694, + deleted: false, + authorized_buckets: [] +} +``` + +Check that everything works as intended (be careful, info works only with your key identifier and not with its friendly name!): + +``` +garagectl key list +garagectl key info GK3515373e4c851ebaad366558 +``` + +## Allow a key to access a bucket + +Now that we have a bucket and a key, we need to give permissions to the key on the bucket! 
+ +``` +garagectl bucket allow \ + --read \ + --write \ + nextcloud-bucket \ + --key GK3515373e4c851ebaad366558 +``` + +You can check at any time which keys are allowed on your bucket with: + +``` +garagectl bucket info nextcloud-bucket +``` + diff --git a/doc/book/src/getting_started/cluster.md b/doc/book/src/getting_started/cluster.md new file mode 100644 index 00000000..af6e8f10 --- /dev/null +++ b/doc/book/src/getting_started/cluster.md @@ -0,0 +1,72 @@ +# Configure a cluster + +*We use a command named `garagectl` which is in fact an alias you must define as explained in the [Control the daemon](./control.md) section.* + +In this section, we will inform garage of the disk space available on each node of the cluster +as well as the site (think datacenter) of each machine. + +## Test cluster + +As this part is not relevant for a test cluster, you can use this one-liner to create a basic topology: + +```bash +garagectl status | grep UNCONFIGURED | grep -Po '^[0-9a-f]+' | while read id; do + garagectl node configure -d dc1 -n 10 "$id" +done +``` + +## Real-world cluster + +For our example, we will suppose we have the following infrastructure (Tokens, Identifier and Datacenter are values specific to garage, described below): + +| Location | Name | Disk Space | `Tokens` | `Identifier` | `Datacenter` | +|----------|---------|------------|----------|--------------|--------------| +| Paris | Mercury | 1 TB | `100` | `8781c5` | `par1` | +| Paris | Venus | 2 TB | `200` | `2a638e` | `par1` | +| London | Earth | 2 TB | `200` | `68143d` | `lon1` | +| Brussels | Mars | 1.5 TB | `150` | `212f75` | `bru1` | + +### Identifier + +After its first launch, garage generates a random and unique identifier for each node, such as: + +``` +8781c50c410a41b363167e9d49cc468b6b9e4449b6577b64f15a249a149bdcbc +``` + +Often a shorter form can be used, containing only the beginning of the identifier, like `8781c5`, +which identifies the server "Mercury" located in "Paris" according to our previous
table. + +The simplest way to match an identifier to a node is to run: + +``` +garagectl status +``` + +It will display the IP address associated with each node; from the IP address you will be able to recognize the node. + +### Tokens + +Garage reasons about disk storage using an arbitrary metric named "tokens". +The number of tokens must be proportional to the disk space dedicated to the node. +Additionally, the number of tokens should ideally be in the order of magnitude of 100 +to provide a good trade-off between data load balancing and performance (*this sentence must be verified, it may be wrong*). + +Here we chose 1 token = 10 GB but you are free to select the value that best fits your needs. + +### Datacenter + +A datacenter is simply a user-chosen identifier that identifies a group of servers located in the same place. +It is up to the system administrator deploying garage to decide what "the same place" means. +Behind the scenes, garage will try to store the same data on different sites to provide high availability despite a datacenter failure.
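
The token rule from the Tokens section above can be sketched as a small shell helper. This is only an illustration under this guide's convention of 1 token per 10 gigabytes: the function name `tokens_for_gb` and its optional second argument are assumptions made for the example, not something provided by Garage.

```shell
# Hypothetical helper (not part of Garage): convert a disk size in gigabytes
# into a token count. The default ratio of 10 GB per token is the convention
# chosen in this guide; pass a second argument to use another ratio.
tokens_for_gb() {
    gb="$1"
    gb_per_token="${2:-10}"
    echo $(( gb / gb_per_token ))
}

tokens_for_gb 1000   # Mercury, 1 TB: prints 100
tokens_for_gb 2000   # Venus and Earth, 2 TB: prints 200
tokens_for_gb 1500   # Mars, 1.5 TB: prints 150
```

The resulting values are the ones listed in the `Tokens` column of the table above and can be passed to the `-n` option of `garagectl node configure`.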
+ +### Inject the topology + +Given the information above, we will configure our cluster as follows: + +``` +garagectl node configure --datacenter par1 -n 100 -t mercury 8781c5 +garagectl node configure --datacenter par1 -n 200 -t venus 2a638e +garagectl node configure --datacenter lon1 -n 200 -t earth 68143d +garagectl node configure --datacenter bru1 -n 150 -t mars 212f75 +``` diff --git a/doc/book/src/getting_started/control.md b/doc/book/src/getting_started/control.md new file mode 100644 index 00000000..9a66a0dc --- /dev/null +++ b/doc/book/src/getting_started/control.md @@ -0,0 +1,77 @@ +# Control the daemon + +The `garage` binary has two purposes: + - it acts as a daemon when launched with `garage server ...` + - it acts as a control tool for the daemon when launched with any other command + +In this section, we will see how to use the `garage` binary as a control tool for the daemon we just started. +You first need to get a shell with access to this binary, which depends on your configuration: + - with `docker-compose`, run `sudo docker-compose exec g1 bash` then `/garage/garage` + - with `docker`, run `sudo docker exec -ti garaged bash` then `/garage/garage` + - with `systemd`, simply run `/usr/local/bin/garage` if you followed previous instructions + +*You can also install the binary on your machine to remotely control the cluster.* + +## Talk to the daemon and create an alias + +`garage` requires 4 options to talk with the daemon: + +``` +--ca-cert +--client-cert +--client-key +-h, --rpc-host +``` + +The first three are the certificates and key needed by TLS; the last one is simply the address of garage's RPC endpoint. +Because we configure garage directly from the server, we do not need to set `--rpc-host`. +To avoid typing the first three options each time we want to run a command, we will create an alias.
+ +### `docker-compose` alias + +```bash +alias garagectl='/garage/garage \ + --ca-cert /pki/garage-ca.crt \ + --client-cert /pki/garage.crt \ + --client-key /pki/garage.key' +``` + +### `docker` alias + +```bash +alias garagectl='/garage/garage \ + --ca-cert /etc/garage/pki/garage-ca.crt \ + --client-cert /etc/garage/pki/garage.crt \ + --client-key /etc/garage/pki/garage.key' +``` + + +### raw binary alias + +```bash +alias garagectl='/usr/local/bin/garage \ + --ca-cert /etc/garage/pki/garage-ca.crt \ + --client-cert /etc/garage/pki/garage.crt \ + --client-key /etc/garage/pki/garage.key' +``` + +Of course, if your deployment does not exactly match one of these aliases, feel free to adapt it to your needs! + +## Test the alias + +You can test your alias by running a simple command such as: + +``` +garagectl status +``` + +You should get something like this as a result: + +``` +Healthy nodes: +2a638ed6c775b69a… 37f0ba978d27 [::ffff:172.20.0.101]:3901 UNCONFIGURED/REMOVED +68143d720f20c89d… 9795a2f7abb5 [::ffff:172.20.0.103]:3901 UNCONFIGURED/REMOVED +8781c50c410a41b3… 758338dde686 [::ffff:172.20.0.102]:3901 UNCONFIGURED/REMOVED +``` + +...which means that you are ready to configure your cluster! diff --git a/doc/book/src/getting_started/daemon.md b/doc/book/src/getting_started/daemon.md new file mode 100644 index 00000000..2f2b71b0 --- /dev/null +++ b/doc/book/src/getting_started/daemon.md @@ -0,0 +1,222 @@ +# Configure the daemon + +Garage is software that can only be run in a cluster and requires at least 3 instances. +In our getting started guide, we document two deployment types: + - [Test deployment](#test-deployment) through `docker-compose` + - [Real-world deployment](#real-world-deployment) through `docker` or `systemd` + +In any case, you first need to generate TLS certificates, as traffic is encrypted between Garage's nodes.
+ +## Generating a TLS Certificate + +To generate your TLS certificates, run on your machine: + +``` +wget https://git.deuxfleurs.fr/Deuxfleurs/garage/raw/branch/master/genkeys.sh +chmod +x genkeys.sh +./genkeys.sh +``` + +It will create a folder named `pki` containing the keys that you will use for the cluster. + +## Test deployment + +Single machine deployment is only described through `docker-compose`. + +Before starting, we recommend you create a folder for our deployment: + +```bash +mkdir garage-single +cd garage-single +``` + +We start by creating a file named `docker-compose.yml` describing our network and our containers: + +```yml +version: '3.4' + +networks: { virtnet: { ipam: { config: [ subnet: 172.20.0.0/24 ]}}} + +services: + g1: + image: lxpz/garage_amd64:v0.1.1d + networks: { virtnet: { ipv4_address: 172.20.0.101 }} + volumes: + - "./pki:/pki" + - "./config.toml:/garage/config.toml" + + g2: + image: lxpz/garage_amd64:v0.1.1d + networks: { virtnet: { ipv4_address: 172.20.0.102 }} + volumes: + - "./pki:/pki" + - "./config.toml:/garage/config.toml" + + g3: + image: lxpz/garage_amd64:v0.1.1d + networks: { virtnet: { ipv4_address: 172.20.0.103 }} + volumes: + - "./pki:/pki" + - "./config.toml:/garage/config.toml" +``` + +*We define a static network here which is not considered a best practice on Docker.
+The rationale is that Garage only supports IP addresses and not domain names in its configuration, so we need to know the IP addresses in advance.* + +Then create the `config.toml` file next to it as follows: + +```toml +metadata_dir = "/garage/meta" +data_dir = "/garage/data" +rpc_bind_addr = "[::]:3901" +bootstrap_peers = [ + "172.20.0.101:3901", + "172.20.0.102:3901", + "172.20.0.103:3901", +] + +[rpc_tls] +ca_cert = "/pki/garage-ca.crt" +node_cert = "/pki/garage.crt" +node_key = "/pki/garage.key" + +[s3_api] +s3_region = "garage" +api_bind_addr = "[::]:3900" + +[s3_web] +bind_addr = "[::]:3902" +root_domain = ".web.garage" +index = "index.html" +``` + +*Please note that we have not mounted `/garage/meta` or `/garage/data` on the host: data will be lost when the container is destroyed.* + +And that's all, you are ready to launch your cluster! + +``` +sudo docker-compose up +``` + +While your daemons are up, your cluster is not configured yet. +However, you can check that your services are listening as expected by querying them from your host: + +```bash +curl http://172.20.0.{101,102,103}:3902 +``` + +which should give you: + +``` +Not found +Not found +Not found +``` + +That's all, you are ready to [configure your cluster](./cluster.md)! + +## Real-world deployment + +Before deploying garage on your infrastructure, you must inventory your machines.
+For our example, we will suppose the following infrastructure: + +| Location | Name | IP Address | Disk Space | +|----------|---------|------------|------------| +| Paris | Mercury | fc00:1::1 | 1 TB | +| Paris | Venus | fc00:1::2 | 2 TB | +| London | Earth | fc00:F::1 | 2 TB | +| Brussels | Mars | fc00:B::1 | 1.5 TB | + +On each machine, we will have a similar setup. In particular, you must consider the following folders/files: + - `/etc/garage/pki`: Garage certificates, must be generated on your computer and copied to the servers + - `/etc/garage/config.toml`: Garage daemon's configuration (defined below) + - `/etc/systemd/system/garage.service`: Service file to start garage at boot automatically (defined below, not required if you use docker) + - `/var/lib/garage/meta`: Contains Garage's metadata, put this folder on an SSD if possible + - `/var/lib/garage/data`: Contains Garage's data, this folder will grow and must be on large storage, possibly big HDDs. + +A valid `/etc/garage/config.toml` for our cluster would be: + +```toml +metadata_dir = "/var/lib/garage/meta" +data_dir = "/var/lib/garage/data" +rpc_bind_addr = "[::]:3901" +bootstrap_peers = [ + "[fc00:1::1]:3901", + "[fc00:1::2]:3901", + "[fc00:B::1]:3901", + "[fc00:F::1]:3901", +] + +[rpc_tls] +ca_cert = "/etc/garage/pki/garage-ca.crt" +node_cert = "/etc/garage/pki/garage.crt" +node_key = "/etc/garage/pki/garage.key" + +[s3_api] +s3_region = "garage" +api_bind_addr = "[::]:3900" + +[s3_web] +bind_addr = "[::]:3902" +root_domain = ".web.garage" +index = "index.html" +``` + +Please make sure to change `bootstrap_peers` to **your** IP addresses!
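
If you maintain the list of node addresses elsewhere, the `bootstrap_peers` entries can be generated rather than typed by hand. The following is a minimal sketch, assuming the default RPC port `3901` used in this guide; the `format_peer` function is a hypothetical helper written for this example and is not part of Garage.

```shell
# Hypothetical helper (not part of Garage): print one bootstrap_peers entry
# for an address. IPv6 addresses contain ':' and must be wrapped in brackets
# before the port is appended; IPv4 addresses are used as-is.
format_peer() {
    addr="$1"
    port="${2:-3901}"   # assumed default: the RPC port used in this guide
    case "$addr" in
        *:*) printf '    "[%s]:%s",\n' "$addr" "$port" ;;
        *)   printf '    "%s:%s",\n' "$addr" "$port" ;;
    esac
}

# Render the whole block for the example cluster above.
echo 'bootstrap_peers = ['
for addr in fc00:1::1 fc00:1::2 fc00:B::1 fc00:F::1; do
    format_peer "$addr"
done
echo ']'
```

The output can be pasted into `/etc/garage/config.toml`, or the same logic can be moved into whatever configuration templating tool you already use.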
+
+### For docker users
+
+On each machine, you can run the daemon with:
+
+```bash
+docker run \
+  -d \
+  --name garaged \
+  --restart always \
+  --network host \
+  -v /etc/garage/pki:/etc/garage/pki \
+  -v /etc/garage/config.toml:/garage/config.toml \
+  -v /var/lib/garage/meta:/var/lib/garage/meta \
+  -v /var/lib/garage/data:/var/lib/garage/data \
+  lxpz/garage_amd64:v0.1.1d
+```
+
+It should restart automatically at each reboot.
+Please note that we use host networking, as otherwise Docker containers cannot communicate over IPv6.
+
+To upgrade, simply stop and remove this container, then run the command again with a new version of Garage.
+
+### For systemd/raw binary users
+
+Create a file named `/etc/systemd/system/garage.service`:
+
+```ini
+[Unit]
+Description=Garage Data Store
+After=network-online.target
+Wants=network-online.target
+
+[Service]
+Environment='RUST_LOG=garage=info' 'RUST_BACKTRACE=1'
+ExecStart=/usr/local/bin/garage server -c /etc/garage/config.toml
+
+[Install]
+WantedBy=multi-user.target
+```
+
+To start the service and then enable it at boot:
+
+```bash
+sudo systemctl start garage
+sudo systemctl enable garage
+```
+
+To see if the service is running and to browse its logs:
+
+```bash
+sudo systemctl status garage
+sudo journalctl -u garage
+```
+
+If you want to modify the service file, do not forget to run `systemctl daemon-reload`
+to inform `systemd` of your modifications.
diff --git a/doc/book/src/getting_started/files.md b/doc/book/src/getting_started/files.md
new file mode 100644
index 00000000..0e3939ce
--- /dev/null
+++ b/doc/book/src/getting_started/files.md
@@ -0,0 +1,42 @@
+# Handle files
+
+We recommend using the MinIO Client (`mc`) to interact with Garage files.
+Instructions to install it and use it are provided on the [MinIO website](https://docs.min.io/docs/minio-client-quickstart-guide.html).
+Before reading further, you need a working `mc` command on your path.
+
+## Configure `mc`
+
+You need your access key and secret key created in the [previous section](bucket.md).
+You also need to set the endpoint: it must match the IP address of one of the nodes of the cluster and the API port (3900 by default).
+You must also choose an alias name for this configuration: we chose `my-garage`, which you will use in all commands.
+
+Adapt the following command accordingly and run it:
+
+```bash
+mc alias set \
+  my-garage \
+  http://172.20.0.101:3900 \
+  <access key> \
+  <secret key> \
+  --api S3v4
+```
+
+You must also add an environment variable to your configuration to inform MinIO of our region (`garage` by default).
+The best way is to add the following snippet to your `$HOME/.bash_profile` or `$HOME/.bashrc` file:
+
+```bash
+export MC_REGION=garage
+```
+
+## Use `mc`
+
+You cannot currently list buckets from `mc`.
+
+However, the following commands, and many more, should work:
+
+```bash
+mc cp image.png my-garage/nextcloud-bucket
+mc cp my-garage/nextcloud-bucket/image.png .
+mc ls my-garage/nextcloud-bucket
+mc mirror localdir/ my-garage/another-bucket
+```
diff --git a/doc/book/src/getting_started/index.md b/doc/book/src/getting_started/index.md
new file mode 100644
index 00000000..282f5034
--- /dev/null
+++ b/doc/book/src/getting_started/index.md
@@ -0,0 +1,5 @@
+# Getting Started
+
+Let's start your Garage journey!
+In this chapter, we explain how to deploy a simple Garage cluster and start interacting with it.
+Our goal is to introduce you to Garage's workflows.
diff --git a/doc/book/src/img/logo.svg b/doc/book/src/img/logo.svg
new file mode 100644
index 00000000..fb02c40b
--- /dev/null
+++ b/doc/book/src/img/logo.svg
@@ -0,0 +1,44 @@
+[SVG markup of Garage's logo (`image/svg+xml`), 44 lines, not reproduced]
diff --git a/doc/book/src/intro.md b/doc/book/src/intro.md
new file mode 100644
index 00000000..02920f83
--- /dev/null
+++ b/doc/book/src/intro.md
@@ -0,0 +1,95 @@
+![Garage's Logo](img/logo.svg)
+
+# The Garage Geo-Distributed Data Store
+
+Garage is a lightweight geo-distributed data store.
+It comes from the observation that, despite the many existing object stores,
+many people have broken data management policies (backup/replication on a single site or none at all).
+To promote better data management policies, we focused on the following desirable properties:
+
+  - **Self-contained & lightweight**: works everywhere and integrates well in existing environments to target hyperconverged infrastructures
+  - **Highly resilient**: highly resilient to network failures, network latency, disk failures, sysadmin failures
+  - **Simple**: simple to understand, simple to operate, simple to debug
+  - **Internet enabled**: made for multi-site deployments (e.g. datacenters, offices, etc.) interconnected through regular Internet connections.
+
+We also noted that the pursuit of some other goals is detrimental to our initial goals.
+The following have been identified as non-goals; if they matter to you, you should not use Garage:
+
+  - **Extreme performance**: high performance heavily constrains the design and the infrastructure; we seek performance through minimalism only.
+  - **Feature extensiveness**: a complete implementation of the S3 API or any other API to make Garage a drop-in replacement is not a goal, as it could lead to decisions that compromise our desirable properties.
+  - **Storage optimizations**: erasure coding and other coding techniques increase the difficulty of both placing and synchronizing data; we limit ourselves to duplication.
+  - **POSIX/Filesystem compatibility**: we do not aim to be POSIX compatible or to emulate any kind of filesystem. Indeed, in a distributed environment, the synchronization this requires translates into network messages that impose severe constraints on the deployment.
+
+## Supported and planned protocols
+
+Garage speaks (or will speak) the following protocols:
+
+  - [S3](https://docs.aws.amazon.com/AmazonS3/latest/API/Welcome.html) - *SUPPORTED* - Enables applications to store large blobs such as pictures, videos, documents, etc. S3 is versatile enough to also be used to publish a static website.
+  - [IMAP](https://github.com/go-pluto/pluto) - *PLANNED* - getting good performance out of email storage is quite complex.
+To keep performance optimal, most IMAP servers only support on-disk storage.
+We plan to add logic to Garage to make it a viable solution for email storage.
+  - *More to come*
+
+## Use Cases
+
+**[Deuxfleurs](https://deuxfleurs.fr) :** Garage is used by Deuxfleurs, a non-profit hosting organization.
+In particular, it is used to host their main website, this documentation and some of their members' blogs. Additionally,
+Garage is used as a [backend for Nextcloud](https://docs.nextcloud.com/server/20/admin_manual/configuration_files/primary_storage.html). Deuxfleurs also plans to use Garage as their [Matrix media backend](https://github.com/matrix-org/synapse-s3-storage-provider) and as the backend of [OCIS](https://github.com/owncloud/ocis).
+
+*Are you using Garage?
[Open a pull request](https://git.deuxfleurs.fr/Deuxfleurs/garage/) to add your organization here!*
+
+## Comparison to existing software
+
+**[MinIO](https://min.io/) :** MinIO shares our *self-contained & lightweight* goal but selected two of our non-goals: *storage optimizations* through erasure coding and *POSIX/Filesystem compatibility* through strong consistency.
+However, by pursuing these two non-goals, MinIO does not achieve our desirable properties.
+First, it fails on the *simple* property: due to erasure coding, MinIO has severe limitations on how drives can be added to or removed from a cluster.
+Second, it fails on the *Internet enabled* property: due to its strong consistency, MinIO is latency sensitive.
+Furthermore, MinIO has no knowledge of "sites" and thus cannot distribute data to minimize the impact of the failure of a given site.
+
+**[OpenStack Swift](https://docs.openstack.org/swift/latest/) :**
+OpenStack Swift at least fails on the *self-contained & lightweight* goal.
+Starting it requires around 8 GB of RAM, which is too much, especially in a hyperconverged infrastructure.
+It also seems to be far from *simple*.
+
+**[Ceph](https://ceph.io/ceph-storage/object-storage/) :**
+This review holds for the whole Ceph stack, including the RADOS paper, Ceph Object Storage module, the RADOS Gateway, etc.
+At its core, Ceph has been designed to provide *POSIX/Filesystem compatibility*, which requires strong consistency, which in turn
+makes Ceph latency sensitive and fails our *Internet enabled* goal.
+Due to its industry-oriented design, Ceph is also far from being *simple* to operate and from being *self-contained & lightweight*, which makes it hard to integrate in a hyperconverged infrastructure.
+In a way, Ceph and MinIO are closer to each other than they are to Garage or OpenStack Swift.
+
+*More comparisons are available in our [Related Work](design/related_work.md) chapter.*
+
+## Other Resources
+
+This website is not the only source of information about Garage!
+We reference here other places on the Internet where you can learn more about Garage.
+
+### Rust API (docs.rs)
+
+If you encounter a specific bug in Garage or plan to patch it, you may jump directly to the source code documentation!
+
+  - [garage\_api](https://docs.rs/garage_api/latest/garage_api/) - contains the S3 standard API endpoint
+  - [garage\_model](https://docs.rs/garage_model/latest/garage_model/) - contains Garage's model built on the table abstraction
+  - [garage\_rpc](https://docs.rs/garage_rpc/latest/garage_rpc/) - contains Garage's federation protocol
+  - [garage\_table](https://docs.rs/garage_table/latest/garage_table/) - contains Garage's core CRDT datatypes
+  - [garage\_util](https://docs.rs/garage_util/latest/garage_util/) - contains Garage entrypoints (daemon, cli)
+  - [garage\_web](https://docs.rs/garage_web/latest/garage_web/) - contains the S3 website endpoint
+
+### Talks
+
+We love to talk and hear about Garage, that's why we keep a log here:
+
+  - [(fr, 2020-12-02) Garage : jouer dans la cour des grands quand on est un hébergeur associatif](https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/master/doc/20201202_talk/talk.pdf)
+
+*Did you write or talk about Garage? [Open a pull request](https://git.deuxfleurs.fr/Deuxfleurs/garage/) to add a link here!*
+
+## Community
+
+If you want to chat with us, you can join our Matrix channel at [#garage:deuxfleurs.fr](https://matrix.to/#/#garage:deuxfleurs.fr).
+Our code and our issue tracker, which is the place where you should report bugs, are managed on [Deuxfleurs' Gitea](https://git.deuxfleurs.fr/Deuxfleurs/garage).
+
+## License
+
+All of Garage's source code is released under the [AGPL v3 License](https://www.gnu.org/licenses/agpl-3.0.en.html).
+Please note that if you patch Garage and then use it to provide any service over a network, you must share your code!
diff --git a/doc/book/src/reference_manual/index.md b/doc/book/src/reference_manual/index.md
new file mode 100644
index 00000000..0d4bd6f3
--- /dev/null
+++ b/doc/book/src/reference_manual/index.md
@@ -0,0 +1,5 @@
+# Reference Manual
+
+A reference manual contains extensive descriptions of the features and behaviour of the software.
+We recommend reading this chapter once you have a good understanding of Garage.
+It will be useful if you want to tune it or use it in some exotic conditions.
diff --git a/doc/Compatibility.md b/doc/book/src/reference_manual/s3_compatibility.md
similarity index 100%
rename from doc/Compatibility.md
rename to doc/book/src/reference_manual/s3_compatibility.md
diff --git a/doc/book/src/working_documents/index.md b/doc/book/src/working_documents/index.md
new file mode 100644
index 00000000..a9e7f899
--- /dev/null
+++ b/doc/book/src/working_documents/index.md
@@ -0,0 +1,8 @@
+# Working Documents
+
+Working documents reflect the fact that Garage is software that evolves quickly.
+They are a way to communicate our ideas, our changes, and so on before or while we are implementing them in Garage.
+If you like to live on the edge, they can also serve as documentation of our upcoming features.
+
+Ideally, once the feature/patch has been merged, the working document should serve as a source to
+update the rest of the documentation and then be removed.
diff --git a/doc/Load_Balancing.md b/doc/book/src/working_documents/load_balancing.md
similarity index 99%
rename from doc/Load_Balancing.md
rename to doc/book/src/working_documents/load_balancing.md
index a348ebc4..583b6086 100644
--- a/doc/Load_Balancing.md
+++ b/doc/book/src/working_documents/load_balancing.md
@@ -1,3 +1,5 @@
+## Load Balancing Data (planned for version 0.2)
+
 I have conducted a quick study of different methods to load-balance data over different Garage nodes using consistent hashing.
 
 ### Requirements