ci/woodpecker/push/debug Pipeline was successfulDetails
ci/woodpecker/pr/debug Pipeline was successfulDetails
This page of the AWS docs indicate that Content-Type should be part of
the CanonicalHeaders (and therefore SignedHeaders) strings in signature
calculation:
https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-header-based-auth.html
However, testing with Minio Client revealed that it did not sign the
Content-Type header, and therefore we broke CI by expecting it to be
signed. With this commit, we don't mandate Content-Type to be signed
anymore, for better compatibility with the ecosystem. Testing against
the official behavior of S3 on AWS has not been done.
ci/woodpecker/pr/debug Pipeline was successfulDetails
For some users, this might be their first time being interacting with
the `env_logger` crate.
As such, they might not be aware that less verbose log levels exist.
Some might not want to log every incoming request, for example.
This commit also adds syntax hints to the code-fence for bash for better
syntax highlighting of that section, and repeats itself multiple times,
that `info` is, in fact, the default.
No changes to the recommendation of log levels were made.
continuous-integration/drone/pr Build is passingDetails
Having all Engine enum variants conditional causes compilation errors
when *none* of the DB engine features is enabled. This is not an issue
for full garage build, but affects crates that use garage_db as
dependency.
Change all variants to be present at all times. It solves compilation
errors and also allows us to better differentiate between invalid DB
engine name and engine with support not compiled in current binary.
continuous-integration/drone/pr Build is passingDetails
Use optional DB open overrides for both input and output database.
Duplicating the same override flag for input/output would result in too
many, too long flags. It would be too costly for very rare edge-case
where converting between same DB engine, just with different flags.
Because overrides flags for different engines are disjoint and we are
preventing conversion between same input/ouput DB engine, we can have
only one set.
The override flag will be passed either to input or output, based on
engine type it belongs to. It will never be passed to both of them and
cause unwelcome surprise to user.
continuous-integration/drone/push Build is passingDetails
continuous-integration/drone/pr Build is passingDetails
This minor release includes the following improvements and fixes:
New features:
- Configuration: make LMDB's `map_size` configurable and make `block_size` and `sled_cache_capacity` expressable as strings (such as `10M`) (#628, #630)
- Add support for binding to Unix sockets for the S3, K2V, Admin and Web API servers (#640)
- Move the `convert_db` command into the main Garage binary (#645)
- Add support for specifying RPC secret and admin tokens as environment variables (#643)
- Add `allow_world_readable_secrets` option to config file (#663, #685)
Bug fixes:
- Use `statvfs` instead of mount list to determine free space in metadata/data directories (#611, #631)
- Add missing casts to fix 32-bit build (#632)
- Fix error when none of the HTTP servers (S3/K2V/Admin/Web) is started and fix shutdown hang (#613, #633)
- Add missing CORS headers to PostObject response (#609, #656)
- Monitoring: finer histogram boundaries in Prometheus exported metrics (#531, #686)
Other:
- Documentation improvements (#641)
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEwhSWp0+ubv79TiqUDkltFQljdr4FAmWmWvsACgkQDkltFQlj
dr59rRAAiMGQpDUK0QqiCgrp1rcUhvtj3DsQEpT7F14Jo3I7bFDmONZolPbO8YAs
VE4S4CBQogNH0lMQ6EvJYiBCxDWkxdVibKqDWOYJmUw3bZ6Ypn1eZIF0+Uf1TDI+
C6CxYbyDQtqvm330K2Du2uOoGiIgm83b6jktK/0FtbAE2GWhtYmQwoelprAGH20i
baaSfkZbBl8toUscakyhPVVSQ86BcVQ2jqL6Ofu4eQknjMRqCeAIQhMB2ikpiwBz
hbTZ3x0EfJJqiHocfkTE3B3cPnDKuHDzxPRhLMB/olEpzoxaLJ2+tc0ziQdl06/F
1c8nHM57L1IaDGKAkpcANnj3yVf3jfPqq9SEUNi+xSIWbvln91RvXU4RIB8hiZqa
rqAHjDuys++3DoAUr/L4X233MWufVAEYT4B+jaPAv6ys35xhQwPAMJrA0OZEr+hE
HQMPIG9uMDVjZ2QCgFYgC02kEqvxbsRSVnb0wjI7eoNOk0LKo154eJh1cOGd4Ibs
yBTiIi1+Y7RCXNxcIHKlj5vMUHPBr2D8DVFj21kfZKUtMQ/8yScoiRC14ZR4J2xF
IYe3aDm80l3tYgnPRVj4fOGiIPsqnZd4iazYKwj2cifB8tzYfyh5/9fv2aio8K5y
0GAw4AoTtgg1hLMadbc3om7wy64IRaZzXjv59eYPEotZYdreVpM=
=RVm8
-----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEwhSWp0+ubv79TiqUDkltFQljdr4FAmWmZKIACgkQDkltFQlj
dr6wZRAA1XuOBax/7YsIix3ag0kjnwnGAx8wYaA+Jiojw2yv/+ePL6yGHcKA93lI
SL8l5G06fTDgpbpfdVbgyRzGT2tmjrXvkygRWf2WMDZ9I+8WxUA2q8aWaEMiNmvd
0cfzYi14TgX+O0wEbKeeqrXG0473/yThk5U1FNbdJd7rkJ4JzaOTthdk0LJLiEUG
zQ/YIYx3FVFoVI0rdORb3HKzqYHjMAvpzNhEIeqkrpDEzplQ3jKvY+rYWQL3S9zE
bHbkZPoT62OpJGMr04/1FUkB+ctsvUrM0CskruaSKWyD2M1xTo/Ug4jh5muVIcdJ
hJis1/k5rV8JDTIkb6eAxKqfVzI+56yDxofT8rVF4JhvlzvXDLOa0uyDVyA8/6un
ylWRzs2Mlj6/TbscmPjrdH8v2Lb0zjWxvXe2iYnHHfldWUlYuBtI6FZiG3uNjBCs
7ns3xr4VOw13RM5auVkEQksIO6lru0kvH18GB3h6Msx67w2JUzl+PaNv8PdRtnmV
0SfLUl1Nh8yT2h9qG6/3cDE9E1G/mjg8SgljoEe6ahs/BUZmLuTHTyBjf+P22ZbO
DCITM3CwrV+y/aKnRdLvd6LOWFinUqMS8YvVSVqJh9vo9R+dt33LdBMdWjP4IYHF
MbACe4FzeG3AXUcHB/mDCm7a2H2BFwzAovFy0SE639PfWBxNue0=
=gzWq
-----END PGP SIGNATURE-----
Merge tag 'v0.8.5' into sync-08-09
Garage v0.8.5
This minor release includes the following improvements and fixes:
New features:
- Configuration: make LMDB's `map_size` configurable and make `block_size` and `sled_cache_capacity` expressable as strings (such as `10M`) (#628, #630)
- Add support for binding to Unix sockets for the S3, K2V, Admin and Web API servers (#640)
- Move the `convert_db` command into the main Garage binary (#645)
- Add support for specifying RPC secret and admin tokens as environment variables (#643)
- Add `allow_world_readable_secrets` option to config file (#663, #685)
Bug fixes:
- Use `statvfs` instead of mount list to determine free space in metadata/data directories (#611, #631)
- Add missing casts to fix 32-bit build (#632)
- Fix error when none of the HTTP servers (S3/K2V/Admin/Web) is started and fix shutdown hang (#613, #633)
- Add missing CORS headers to PostObject response (#609, #656)
- Monitoring: finer histogram boundaries in Prometheus exported metrics (#531, #686)
Other:
- Documentation improvements (#641)
continuous-integration/drone/pr Build is passingDetails
Sometimes, the secret files permissions checks gets in the way. It's
by no mean complete, it doesn't take the Posix ACLs into account among
other things. Correctly checking the ACLs would be too involving (see
#658 (comment))
and would likely still fail in some weird chmod settings.
We're adding a new configuration file key allowing the user to disable
this permission check altogether.
The (already existing) env variable counterpart always take precedence
to this config file option. That's useful in cases where the
configuration file is static and cannot be easily altered.
Fixes #658
Co-authored-by: Florian Klink <flokli@flokli.de>
continuous-integration/drone/pr Build is passingDetails
This is still a bit confusing, as normally the flake.defaultNix attrset
gets exposed via a top-level default.nix, but at least it brings us
closer to that.
continuous-integration/drone/pr Build is passingDetails
Compiling garage_db v0.8.2 (garage-0.8.2/src/db)
error: cannot find macro `warn` in this scope
--> src/db/lmdb_adapter.rs:352:2
|
352 | warn!("LMDB is not recommended on 32-bit systems, database size will be limited");
| ^^^^
|
= help: consider importing this macro:
tracing::warn
= note: `warn` is in scope, but it is an attribute: `#[warn]`
error: could not compile `garage_db` due to previous error
continuous-integration/drone/pr Build is passingDetails
* Renamed my_garage_service -> garage-s3-service.
* Defined a web service for port 3902.
* Added a garage-s3 router.
* Pointed website definition at web service.
* Use the /health endpoint for loadBalancer health check.
* Renamed gzip_compress to just compression as traefik v3 will also do
brotli compression.
DaemonSet is a k8s resource that schedules one instance per node,
which is useful for some garage deployment use cases, including
managing garage nodes using k8s node labels
continuous-integration/drone/push Build is passingDetails
As discussed in the chat yesterday, I want to propose to disable the ingress per default.
The motivation behind this change is, that per default the ingress is "misconfigured"
meaning it can not work with the default values and requires a user of the chart to
add additional configuration. When installing the chart per default, I would not
expect to already expose garage publicly without my explicit configuration to do so
Commenting the ingressClass resource also allows for relying only on
annotations - otherwise the ingressClass would be always set to nginx
or require a user to override it with ingressClass: null
A small change on top, I've added the ability to specify user defined labels per ingress
The default values forces people to create an ingress resources,
where per default an ingress is not necessary to start garage.
If someone wants to utilize an ingress, he would need to define
the values for the ingress either way, so enabling the ingress
explicitly makes more sense, then requiring it to be disabled per default
continuous-integration/drone/pr Build is passingDetails
By default, Nginx does proxy buffering and it may store big replies to a
temporary file up to 1 GB. It also means that Nginx will read data as
fast as possible from Garage, even if the client downloads slowly. Both
behaviours are often not wanted, so disable this temporary file in the example.
Ref: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_buffering
Also add an example of upstream with a "backup" server, which may be
useful to only use remote servers as fallback.
continuous-integration/drone/push Build is failingDetails
continuous-integration/drone/pr Build is failingDetails
- self.node_id_vec was not properly updated when the previous ring was empty
- ClusterLayout::merge was not considering changes in the layout parameters
- have consistent error return types
- store the zone redundancy in a Lww
- print the error and message in the CLI (TODO: for the server Api, should msg be returned in the body response?)
It takes as paramters the replication factor and the zone redundancy, computes the
largest partition size reachable with these constraints, and among the possible
assignation with this partition size, it computes the one that moves the least number
of partitions compared to the previous assignation.
This computation uses graph algorithms defined in graph_algo.rs
continuous-integration/drone/push Build is passingDetails
Performance improvements included in this PR:
- [x] Use `Bytes` at a few places where appropriate, instead of `Vec<u8>`, to reduce the number of copies
- [x] StreamChunker now accumulates incoming slices in a `Vec<Bytes>` instead of a `VecDeque<u8>`. Replaces calls to `.extend()` and `.drain()` that were quite costly by a simple `concat()` on a vec of slices which is much more optimized
- [x] Hashing (b2, sha256, md5) is now done on a Tokio thread dedicated to cpu-intensive tasks, using `spawn_blocking`
- [x] Block manager now uses 256 independant locks instead of one big lock for writing, reduces contention when writing several/many objects in parallel
- [x] Better LMDB defaults: we now put flags `NoSync` and `NoMetaSync` to avoid `fsync` at each transaction (extremely slow). Also increased number of LMDB readers to accomodate more intensive workloads
Other changes included in this PR:
- [x] Update to hashing and MAC crates: md5 and sha2 from 0.9 to 0.10, hmac from 0.10 to 0.12
- [x] switch to `tracing_subscriber` for logs, which allows to have timing of each event
Reviewed-on: #342
continuous-integration/drone/push Build was killedDetails
This PR includes work from @jirutka :
- [x] Allow linking against system-provided libraries (libsodium, libsqlite, libzstd) #370
- [x] Make OTLP exporter optional and allow building without Prometheus exporter (/metrics) #372
And also:
- [x] Update `.nix` files
- [x] Remove heed default-features
- [x] Bump versions of all Garage crates to 0.8.0
- [x] Make db engines (lmdb, sled, sqlite) optionnal
- [x] Add documentation for available features
- [x] Directly include code of previous versions used for migration in order to reduce dependencies
- [x] Read variable `GIT_VERSION` from garage main instead of in crate garage_util to make builds faster
- [x] Report features used in the build somewhere? (in `garage --version` or something)
- [x] Check we `warn!` correctly if we try to use deactivated feature
- [x] Allow not to launch S3 endpoint if not in config
Reviewed-on: #373
Garage currently uses the legacy resolver "1". The new one is used
by default if the root package specifies 'edition = 2021', which
Garage does not (yet).
The problem with the legacy resolver is, among others, that features
enabled by dev-dependencies are propagated to normal dependencies.
This affects e.g. hyper - one of the dev-dependencies enables "http2"
feature that adds many extra dependencies. If we build garage without
opentelemetry-otlp (this is enabled in the following commit), there's
no normal dependency enabling "http2" feature.
See https://doc.rust-lang.org/cargo/reference/resolver.html#feature-resolver-version-2
continuous-integration/drone/push Build is passingDetails
Included in this PR:
- [x] Small refactor, resync code is moved to a separate `block/resync.rs` file
- [x] Block resync tranquility is no longer in config file, it is set dynamically using `garage worker set resync-tranquility` (this parameter is persisted over Garage restarts)
- [x] Up to 4 block resync workers can be activated to run simultaneously to speed up big resyncs, this parameter is set dynamically using `garage worker set resync-n-workers`
Reviewed-on: #369
continuous-integration/drone/push Build is pendingDetails
continuous-integration/drone/pr Build is pendingDetails
Unfortunately, rusqlite uses the opposite logic for enabling/disabling
bundled libraries to others (libsodium-sys, zstd-sys). Cargo features
are very limited and doesn't allow to enable feature A in a dependency
iff feature B is disabled.
Note, lmdb-rkv-sys doesn't need any special treatment because it
automatically links against system liblmdb if found via pkgconf.
Linux distros should build garage with
`--no-default-features --features system-libs` to disable bundled-libs
and enable system-libs.
If this feature is enabled, libsodium-sys and zstd-sys will link
dynamically against system-provided libraries instead of building
and linking statically the bundled (possibly outdated and vulnerable)
copies of them. This feature is intended mainly for linux package
maintainers.
continuous-integration/drone/pr Build is passingDetails
continuous-integration/drone/tag Build is passingDetails
continuous-integration/drone Build is passingDetails
continuous-integration/drone/push Build is passingDetails
By default, structopt reports the value provided by
the env var CARGO_PKG_VERSION, feeded by Cargo when reading
Cargo.toml. However for Garage we use a versioning based on git,
so we often report a version that is behind the real version.
In this commit, we create garage_util::version::garage() that
reports the right version and configure all structopt subcommands
to call this function instead of using the env var.
continuous-integration/drone/push Build is passingDetails
- [x] New background worker trait
- [x] Adapt all current workers to use new API
- [x] Command to list currently running workers, and whether they are active, idle, or dead
- [x] Error reporting
- Optimizations
- [x] Merkle updater: several items per iteration
- [ ] Use `tokio::task::spawn_blocking` where appropriate so that CPU-intensive tasks don't block other things going on
- scrub:
- [x] have only one worker with a channel to start/pause/cancel
- [x] automatic scrub
- [x] ability to view and change tranquility from CLI
- [x] persistence of a few info
- [ ] Testing
Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: #332
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
continuous-integration/drone/push Build is passingDetails
- [x] Refactoring of internal counting API
- [x] Repair procedure for counters (it's an offline procedure!!!)
- [x] New counter for objects in buckets
- [x] Add quotas to buckets struct
- [x] Add CLI to manage bucket quotas
- [x] Add admin API to manage bucket quotas
- [x] Apply quotas by adding checks on put operations
- [x] Proof-read
Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: #326
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
continuous-integration/drone/push Build is passingDetails
- [x] Design interface
- [x] Implement Sled backend
- [x] Re-implement the SledCountedTree hack ~~on Sled backend~~ on all backends (i.e. over the abstraction)
- [x] Convert Garage code to use generic interface
- [x] Proof-read converted Garage code
- [ ] Test everything well
- [x] Implement sqlite backend
- [x] Implement LMDB backend
- [ ] (Implement Persy backend?)
- [ ] (Implement other backends? (like RocksDB, ...))
- [x] Implement backend choice in config file and garage server module
- [x] Add CLI for converting between DB formats
- Exploit the new interface to put more things in transactions
- [x] `.updated()` trigger on Garage tables
Fix#284
**Bugs**
- [x] When exporting sqlite, trees iterate empty??
- [x] LMDB doesn't work
**Known issues for various back-ends**
- Sled:
- Eats all my RAM and also all my disk space
- `.len()` has to traverse the whole table
- Is actually quite slow on some operations
- And is actually pretty bad code...
- Sqlite:
- Requires a lock to be taken on all operations. The lock is also taken when iterating on a table with `.iter()`, and the lock isn't released until the iterator is dropped. This means that we must be VERY carefull to not do anything else inside a `.iter()` loop or else we will have a deadlock! Most such cases have been eliminated from the Garage codebase, but there might still be some that remain. If your Garage-over-Sqlite seems to hang/freeze, this is the reason.
- (adapter uses a bunch of unsafe code)
- Heed (LMDB):
- Not suited for 32-bit machines as it has to map the whole DB in memory.
- (adpater uses a tiny bit of unsafe code)
**My recommendation:** avoid 32-bit machines and use LMDB as much as possible.
**Converting databases** is actually quite easy. For example from Sled to LMDB:
```bash
cd src/db
cargo run --features cli --bin convert -- -i path/to/garage/meta/db -a sled -o path/to/garage/meta/db.lmdb -b lmdb
```
Then, just add this to your `config.toml`:
```toml
db_engine = "lmdb"
```
Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: #322
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
continuous-integration/drone/push Build is passingDetails
Mention PostObject is implemented, fix english mistakes
Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: #314
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
continuous-integration/drone/push Build is passingDetails
continuous-integration/drone/tag Build is passingDetails
continuous-integration/drone Build is passingDetails
- [x] Better distinguish error types
- [x] Parse error messages received from server
- [x] Remove `src/` folder layer, we don't have that for other crates
Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: #307
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
continuous-integration/drone/push Build is passingDetails
**Spec:**
- [x] Start writing
- [x] Specify all layout endpoints
- [x] Specify all endpoints for operations on keys
- [x] Specify all endpoints for operations on key/bucket permissions
- [x] Specify all endpoints for operations on buckets
- [x] Specify all endpoints for operations on bucket aliases
View rendered spec at <https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/admin-api/doc/drafts/admin-api.md>
**Code:**
- [x] Refactor code for admin api to use common api code that was created for K2V
**General endpoints:**
- [x] Metrics
- [x] GetClusterStatus
- [x] ConnectClusterNodes
- [x] GetClusterLayout
- [x] UpdateClusterLayout
- [x] ApplyClusterLayout
- [x] RevertClusterLayout
**Key-related endpoints:**
- [x] ListKeys
- [x] CreateKey
- [x] ImportKey
- [x] GetKeyInfo
- [x] UpdateKey
- [x] DeleteKey
**Bucket-related endpoints:**
- [x] ListBuckets
- [x] CreateBucket
- [x] GetBucketInfo
- [x] DeleteBucket
- [x] PutBucketWebsite
- [x] DeleteBucketWebsite
**Operations on key/bucket permissions:**
- [x] BucketAllowKey
- [x] BucketDenyKey
**Operations on bucket aliases:**
- [x] GlobalAliasBucket
- [x] GlobalUnaliasBucket
- [x] LocalAliasBucket
- [x] LocalUnaliasBucket
**And also:**
- [x] Separate error type for the admin API (this PR includes a quite big refactoring of error handling)
- [x] Add management of website access
- [ ] Check that nothing is missing wrt what can be done using the CLI
- [ ] Improve formatting of the spec
- [x] Make sure everyone is cool with the API design
Fix#231Fix#295
Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: #298
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
continuous-integration/drone/push Build is passingDetails
lib.rs could use getting split in modules, but I'm not sure how exactly
Co-authored-by: trinity-1686a <trinity@deuxfleurs.fr>
Reviewed-on: #303
Co-authored-by: trinity-1686a <trinity.pointard@gmail.com>
Co-committed-by: trinity-1686a <trinity.pointard@gmail.com>
continuous-integration/drone/push Build is passingDetails
**Specification:**
View spec at [this URL](https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/k2v/doc/drafts/k2v-spec.md)
- [x] Specify the structure of K2V triples
- [x] Specify the DVVS format used for causality detection
- [x] Specify the K2V index (just a counter of number of values per partition key)
- [x] Specify single-item endpoints: ReadItem, InsertItem, DeleteItem
- [x] Specify index endpoint: ReadIndex
- [x] Specify multi-item endpoints: InsertBatch, ReadBatch, DeleteBatch
- [x] Move to JSON objects instead of tuples
- [x] Specify endpoints for polling for updates on single values (PollItem)
**Implementation:**
- [x] Table for K2V items, causal contexts
- [x] Indexing mechanism and table for K2V index
- [x] Make API handlers a bit more generic
- [x] K2V API endpoint
- [x] K2V API router
- [x] ReadItem
- [x] InsertItem
- [x] DeleteItem
- [x] PollItem
- [x] ReadIndex
- [x] InsertBatch
- [x] ReadBatch
- [x] DeleteBatch
**Testing:**
- [x] Just a simple Python script that does some requests to check visually that things are going right (does not contain parsing of results or assertions on returned values)
- [x] Actual tests:
- [x] Adapt testing framework
- [x] Simple test with InsertItem + ReadItem
- [x] Test with several Insert/Read/DeleteItem + ReadIndex
- [x] Test all combinations of return formats for ReadItem
- [x] Test with ReadBatch, InsertBatch, DeleteBatch
- [x] Test with PollItem
- [x] Test error codes
- [ ] Fix most broken stuff
- [x] test PollItem broken randomly
- [x] when invalid causality tokens are given, errors should be 4xx not 5xx
**Improvements:**
- [x] Descending range queries
- [x] Specify
- [x] Implement
- [x] Add test
- [x] Batch updates to index counter
- [x] Put K2V behind `k2v` feature flag
Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: #293
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
continuous-integration/drone/push Build is passingDetails
fixes#295, partially
Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: #297
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
continuous-integration/drone/push Build is failingDetails
The function now computes an optimal assignation (with respect to partition size) that minimizes the distance to the former assignation, using flow algorithms.
This commit was written by Mendes Oulamara <mendes.oulamara@pm.me>
continuous-integration/drone/push Build is passingDetails
This removes the >1mb s3_copy restriction.
This restriction doesn't seem to be documented anywhere (I could be wrong). It also causes some software to fail (such as #248).
Co-authored-by: Rob Landers <landers.robert@gmail.com>
Reviewed-on: #280
Co-authored-by: withinboredom <landers.robert@gmail.com>
Co-committed-by: withinboredom <landers.robert@gmail.com>
This change helps ensure that nodes for each partition are spread
over all datacenters, a property that wasn't ensured previously
when going from a 2 DC deployment to a 3 DC deployment
- Global dependencies updated in Cargo.lock
- New module created in src/admin to host:
- the (future) admin REST API
- the metric collection
- add configuration block
No metrics implemented yet
continuous-integration/drone/pr Build is passingDetails
continuous-integration/drone/push Build is passingDetails
This commit adds support to discover garage instances running in
kubernetes.
Once enabled by setting `kubernetes_namespace` and
`kubernetes_service_name` garage will create a Custom Resources
`garagenodes.deuxfleurs.fr` with nodes public key as the resource name.
and IP and Port information as spec in the namespace configured by
`kubernetes_namespace`.
For discovering nodes the resources are filtered with the optionally set
`kubernetes_service_name` which sets a label
`garage.deuxfleurs.fr/service` on the resources.
This allows to separate multiple garage deployments in a single
namespace.
the `kubernetes_skip_crd` variable allows to disable the creation of the
CRD by garage itself. The user must deploy this manually.
continuous-integration/drone/push Build is passingDetails
Nodes would stabilize on different encoding formats for the values,
some having the pre-migration format and some having the post-migration
format. This would be reflected in the Merkle trees never converging
and thus having an infinite resync loop.
continuous-integration/drone/push Build is passingDetails
Fixes#234, among other things
Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: #237
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
continuous-integration/drone/push Build is passingDetails
This PR should be merged after the new website is deployed.
- [x] Rename files
- [x] Add front matter section to all `.md` files in the book (necessary for Zola)
- [x] Change all internal links to use Zola's linking system that checks broken links
- [x] Some updates to documentation contents and organization
Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: #213
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
continuous-integration/drone/push Build is passingDetails
Implement ListMultipartUploads, also refactor ListObjects and ListObjectsV2.
It took me some times as I wanted to propose the following things:
- Using an iterator instead of the loop+goto pattern. I find it easier to read and it should enable some optimizations. For example, when consuming keys of a common prefix, we do many [redundant checks](https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/main/src/api/s3_list.rs#L125-L156) while the only thing to do is to [check if the following key is still part of the common prefix](https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/feature/s3-multipart-compat/src/api/s3_list.rs#L476).
- Try to name things (see ExtractionResult and RangeBegin enums) and to separate concerns (see ListQuery and Accumulator)
- An IO closure to make unit tests possibles.
- Unit tests, to track regressions and document how to interact with the code
- Integration tests with `s3api`. In the future, I would like to move them in Rust with the aws rust SDK.
Merging of the logic of ListMultipartUploads and ListObjects was not a goal but a consequence of the previous modifications.
Some points that we might want to discuss:
- ListObjectsV1, when using pagination and delimiters, has a weird behavior (it lists multiple times the same prefix) with `aws s3api` due to the fact that it can not use our optimization to skip the whole prefix. It is independant from my refactor and can be tested with the commented `s3api` tests in `test-smoke.sh`. It probably has the same weird behavior on the official AWS S3 implementation.
- Considering ListMultipartUploads, I had to "abuse" upload id marker to support prefix skipping. I send an `upload-id-marker` with the hardcoded value `include` to emulate your "including" token.
- Some ways to test ListMultipartUploads with existing software (my tests are limited to s3api for now).
Co-authored-by: Quentin Dufour <quentin@deuxfleurs.fr>
Reviewed-on: #171
Co-authored-by: Quentin <quentin@dufour.io>
Co-committed-by: Quentin <quentin@dufour.io>
- Fix bucket delete
- fix merge of bucket creation date
- Replace deletable with option in aliases
Rationale: if two aliases point to conflicting bucket, resolving
by making an arbitrary choice risks making data accessible when it
shouldn't be. We'd rather resolve to deleting the alias until
someone puts it back.
- ensure bucket names are correct aws s3 names
- when making aliases, ensure timestamps of links in both ways are the
same
- fix small remarks by trinity
- don't have a separate website_access field
continuous-integration/drone/push Build is passingDetails
fix#77
this does not store anything but a on/off switch for website, and does not implement GetBucketWebsite as it would require storing more. GetBucketWebsite should be pretty easy to implement once data is stored though.
Co-authored-by: Trinity Pointard <trinity.pointard@gmail.com>
Reviewed-on: #174
Co-authored-by: trinity-1686a <trinity.pointard@gmail.com>
Co-committed-by: trinity-1686a <trinity.pointard@gmail.com>
continuous-integration/drone/push Build is passingDetails
fix#161
Current request router was organically grown, and is getting messier and messier with each addition.
This router cover exaustively existing API endpoints (with exceptions listed in [#161(comment)](#161 (comment)) either because new and old api endpoint can't feasabily be differentied, or it's more lambda than s3).
Co-authored-by: Trinity Pointard <trinity.pointard@gmail.com>
Reviewed-on: #163
Reviewed-by: Alex <alex@adnab.me>
Co-authored-by: trinity-1686a <trinity.pointard@gmail.com>
Co-committed-by: trinity-1686a <trinity.pointard@gmail.com>
continuous-integration/drone/pr Build is passingDetails
continuous-integration/drone/tag Build is passingDetails
continuous-integration/drone/push Build is passingDetails
continuous-integration/drone Build is passingDetails
- change the terminology: the network configuration becomes the role
table, the configuration of a nodes becomes a node's role
- the modification of the role table takes place in two steps: first,
changes are staged in a CRDT data structure. Then, once the user is
happy with the changes, they can commit them all at once (or revert
them).
- update documentation
- fix tests
- implement smarter partition assignation algorithm
This patch breaks the format of the network configuration: when
migrating, the cluster will be in a state where no roles are assigned.
All roles must be re-assigned and commited at once. This migration
should not pose an issue.
Garage has many API that you can rely on to build complex applications.
In this section, we reference the existing SDKs and give some code examples.
## ⚠️ DISCLAIMER
**K2V AND ADMIN SDK ARE TECHNICAL PREVIEWS**. The following limitations apply:
- The API is not complete, some actions are possible only through the `garage` binary
- The underlying admin API is not yet stable nor complete, it can breaks at any time
- The generator configuration is currently tweaked, the library might break at any time due to a generator change
- Because the API and the library are not stable, none of them are published in a package manager (npm, pypi, etc.)
- This code has not been extensively tested, some things might not work (please report!)
To have the best experience possible, please consider:
- Make sure that the version of the library you are using is pinned (`go.sum`, `package-lock.json`, `requirements.txt`).
- Before upgrading your Garage cluster, make sure that you can find a version of this SDK that works with your targeted version and that you are able to update your own code to work with this new version of the library.
- Join our Matrix channel at `#garage:deuxfleurs.fr`, say that you are interested by this SDK, and report any friction.
- If stability is critical, mirror this repository on your own infrastructure, regenerate the SDKs and upgrade them at your own pace.
## About the APIs
Code can interact with Garage through 3 different APIs: S3, K2V, and Admin.
Each of them has a specific scope.
### S3
De-facto standard, introduced by Amazon, designed to store blobs of data.
### K2V
A simple database API similar to RiakKV or DynamoDB.
Think a key value store with some additional operations.
Its design is inspired by Distributed Hash Tables (DHT).
More information:
- [In the reference manual](@/documentation/reference-manual/k2v.md)
### Administration
Garage operations can also be automated through a REST API.
We are currently building this SDK for [Python](@/documentation/build/python.md#admin-api), [Javascript](@/documentation/build/javascript.md#administration) and [Golang](@/documentation/build/golang.md#administration).
More information:
- [In the reference manual](@/documentation/reference-manual/admin-api.md)
If you are developping a new application, you may want to use Garage to store your user's media.
The S3 API that Garage uses is a standard REST API, so as long as you can make HTTP requests,
you can query it. You can check the [S3 REST API Reference](https://docs.aws.amazon.com/AmazonS3/latest/API/API_Operations_Amazon_Simple_Storage_Service.html) from Amazon to learn more.
Developping your own wrapper around the REST API is time consuming and complicated.
Instead, there are some libraries already avalaible.
Some of them are maintained by Amazon, some by Minio, others by the community.
**From the GUI.** Activate the "External storage support" app from the "Applications" page (click on your account icon on the top right corner of your screen to display the menu). Go to your parameters page (also located below your account icon). Click on external storage (or the corresponding translation in your language).
[![Screenshot of the External Storage form](cli-nextcloud-gui.png)](cli-nextcloud-gui.png)
*Click on the picture to zoom*
Add a new external storage. Put what you want in "folder name" (eg. "shared"). Select "Amazon S3". Keep "Access Key" for the Authentication field.
In Configuration, put your bucket name (eg. nextcloud), the host (eg. 127.0.0.1), the port (eg. 3900 or 443), the region (garage). Tick the SSL box if you have put an HTTPS proxy in front of garage. You must tick the "Path access" box and you must leave the "Legacy authentication (v2)" box empty. Put your Key ID (eg. GK...) and your Secret Key in the last two input boxes. Finally click on the tick symbol on the right of your screen.
Now go to your "Files" app and a new "linked folder" has appeared with the name you chose earlier (eg. "shared").
We also need to expose these buckets publicly to serve their content to users:
```bash
garage bucket website --allow peertube-playlists
garage bucket website --allow peertube-videos
```
Finally, we must allow Cross-Origin Resource Sharing (CORS).
CORS are required by your browser to allow requests triggered from the peertube website (eg. peertube.tld) to your bucket's domain (eg. peertube-videos.web.garage.tld)
These buckets are now accessible on the web port (by default 3902) with the following URL: `http://<bucket><root_domain>:<web_port>` where the root domain is defined in your configuration file (by default `.web.garage`). So we have currently the following URLs:
* http://peertube-playlists.web.garage:3902
* http://peertube-videos.web.garage:3902
Make sure you (will) have a corresponding DNS entry for them.
### Configure Peertube
You must edit the file named `config/production.yaml`, we are only modifying the root key named `object_storage`:
```yaml
object_storage:
enabled: true
# Put localhost only if you have a garage instance running on that node
endpoint: 'http://localhost:3900' # or "garage.example.com" if you have TLS on port 443
# Garage supports only one region for now, named garage
region: 'garage'
credentials:
access_key_id: 'GKxxxx'
secret_access_key: 'xxxx'
max_upload_part: 2GB
proxy:
# You may enable this feature, yet it will not provide any security benefit, so
# you should rather benefit from Garage public endpoint for all videos
proxify_private_files: false
streaming_playlists:
bucket_name: 'peertube-playlist'
# Keep it empty for our example
prefix: ''
# You must fill this field to make Peertube use our reverse proxy/website logic
# What name gets exposed to users (HTTPS is implicit)
S3_ALIAS_HOST=my-social-media.mydomain.tld
```
For more details, see the [reference Mastodon documentation](https://docs.joinmastodon.org/admin/config/#cdn).
Restart all Mastodon services and everything should now be using Garage!
You can check the URLs of images in the Mastodon web client, they should start
with `https://my-social-media.mydomain.tld`.
### Last migration sync
After Mastodon is successfully using Garage, you can run a last sync from the local filesystem to Garage:
```bash
mc mirror --newer-than "3h" ./public/system/ garage/mastodon-data
```
### References
[cybrespace's guide to migrate to S3](https://github.com/cybrespace/cybrespace-meta/blob/master/s3.md)
(the guide is for Amazon S3, so the configuration is a bit different, but the rest is similar)
## Matrix
Matrix is a chat communication protocol. Its main stable server implementation, [Synapse](https://matrix-org.github.io/synapse/latest/), provides a module to store media on a S3 backend. Additionally, a server independent media store supporting S3 has been developped by the community, it has been made possible thanks to how the matrix API has been designed and will work with implementations like Conduit, Dendrite, etc.
### synapse-s3-storage-provider (synapse only)
Supposing you have a working synapse installation, you can add the module with pip:
*External links:* [Duplicity > man](https://duplicity.gitlab.io/duplicity-web/vers8/duplicity.1.html) (scroll to "URL Format" and "A note on Amazon S3")
Server: Custom server url (s3.garage.localhost:3900)
Bucket name: bucket-name
Bucket create region: Custom region value (garage) # Or as you've specified in garage.toml
AWS Access ID: Key ID from "garage key info key-name"
AWS Access Key: Secret key from "garage key info key-name"
Client Library to use: Minio SDK
```
Click `Test connection` and then no when asked `The bucket name should start with your username, prepend automatically?`. Then it should say `Connection worked!`.
And finally, I recommend appending a small wrapper to your `~/.bashrc` to avoid setting the username on each command (do not forget to replace `GK...` by your access key):
```bash
function duck { command duck --username GK... $@ ; }
*You can find instructions on how to use the GUI in french [in our wiki](https://guide.deuxfleurs.fr/prise_en_main/winscp/).*
How to use `winscp.com`, the CLI interface of WinSCP:
```
open s3://GKxxxxx:yyyyyyy@127.0.0.1:4443 -certificate=* -rawsettings S3DefaultRegion=garage S3UrlStyle=1
ls
ls my-files/
get my-files/an-object.txt Z:\tmp\object.txt
put Z:\tmp\object.txt my-files/another-object.txt
rm my-files/an-object
exit
```
Notes:
- It seems WinSCP supports only TLS connections for S3
- `-certificate=*` allows self-signed certificates, remove it if you have valid certificates
## sftpgo {#sftpgo}
sftpgo needs a database to work, by default it uses sqlite and does not require additional configuration.
You can then directly init it:
```
sftpgo initprovider
```
Then you can directly launch the daemon that will listen by default on `:8080 (http)` and `:2022 (ssh)`:
```
sftpgo serve
```
Go to the admin web interface (http://[::1]:8080/web/admin/), create the required admin account, then create a user account.
Choose a username (eg: `ada`) and a password.
In the filesystem section, choose:
- Storage: AWS S3 (Compatible)
- Bucket: *your bucket name*
- Region: `garage` (or the one you defined in `config.toml`)
- Access key: *your access key*
- Access secret: *your secret key*
- Endpoint: *your endpoint*, eg. `https://garage.example.tld`, note that the protocol (`https` here) must be specified. Non standard ports and `http` have not been tested yet.
- Keep the default values for other fields
- Tick "Use path-style addressing". It should work without ticking it if you have correctly configured your instance to use URL vhost-style.
Now you can access your bucket through SFTP:
```
sftp -P2022 ada@[::1]
ls
```
And through the web interface at http://[::1]:8080/web/client
You can use Garage with Gitea to store your [git LFS](https://git-lfs.github.com/) data, your users' avatar, and their attachements.
You can configure a different target for each data type (check `[lfs]` and `[attachment]` sections of the Gitea documentation) and you can provide a default one through the `[storage]` section.
Let's start by creating a key and a bucket (your key id and secret will be needed later, keep them somewhere):
*We started a plain text registry but docker clients require encrypted registries. You must either [setup TLS](https://docs.docker.com/registry/deploying/#run-an-externally-accessible-registry) on your registry or add `--insecure-registry=localhost:5000` to your docker daemon parameters.*
A cookbook, when you cook, is a collection of recipes.
Similarly, Garage's cookbook contains a collection of recipes that are known to work well!
This chapter could also be referred as "Tutorials" or "Best practices".
- **[Multi-node deployment](@/documentation/cookbook/real-world.md):** This page will walk you through all of the necessary
steps to deploy Garage in a real-world setting.
- **[Building from source](@/documentation/cookbook/from-source.md):** This page explains how to build Garage from
source in case a binary is not provided for your architecture, or if you want to
hack with us!
- **[Binary packages](@/documentation/cookbook/binary-packages.md):** This page
lists the different platforms that provide ready-built software packages for
Garage.
- **[Integration with Systemd](@/documentation/cookbook/systemd.md):** This page explains how to run Garage
as a Systemd service (instead of as a Docker container).
- **[Configuring a gateway node](@/documentation/cookbook/gateways.md):** This page explains how to run a gateway node in a Garage cluster, i.e. a Garage node that doesn't store data but accelerates access to data present on the other nodes.
- **[Hosting a website](@/documentation/cookbook/exposing-websites.md):** This page explains how to use Garage
to host a static website.
- **[Configuring a reverse-proxy](@/documentation/cookbook/reverse-proxy.md):** This page explains how to configure a reverse-proxy to add TLS support to your S3 api endpoint.
- **[Deploying on Kubernetes](@/documentation/cookbook/kubernetes.md):** This page explains how to deploy Garage on Kubernetes using our Helm chart.
- **[Deploying with Ansible](@/documentation/cookbook/ansible.md):** This page lists available Ansible roles developed by the community to deploy Garage.
- **[Monitoring Garage](@/documentation/cookbook/monitoring.md)** This page
explains the Prometheus metrics available for monitoring the Garage
There are three methods to expose buckets as website:
1. using the PutBucketWebsite S3 API call, which is allowed for access keys that have the owner permission bit set
2. from the Garage CLI, by an adminstrator of the cluster
3. using the Garage administration API
The `PutBucketWebsite` API endpoint [is documented](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketWebsite.html) in the official AWS docs.
This endpoint can also be called [using `aws s3api`](https://docs.aws.amazon.com/cli/latest/reference/s3api/put-bucket-website.html) on the command line.
The website configuration supported by Garage is only a subset of the possibilities on Amazon S3: redirections are not supported, only the index document and error document can be specified.
If you want to expose your bucket as a website from the CLI, use this simple command:
```bash
garage bucket website --allow my-website
```
Now it will be **publicly** exposed on the web endpoint (by default listening on port 3902).
## How exposed websites work
Our website serving logic is as follow:
- Supports only static websites (no support for PHP or other languages)
- Does not support directory listing
- The index file is defined per-bucket and can be specified in the `PutBucketWebsite` call
or on the CLI using the `--index-document` parameter (default: `index.html`)
- A custom error document for 404 errors can be specified in the `PutBucketWebsite` call
or on the CLI using the `--error-document` parameter
Now we need to infer the URL of your website through your bucket name.
Let assume:
- we set `root_domain = ".web.example.com"` in `garage.toml` ([ref](@/documentation/reference-manual/configuration.md#web_root_domain))
- our bucket name is `garagehq.deuxfleurs.fr`.
Our bucket will be served if the Host field matches one of these 2 values (the port is ignored):
- `garagehq.deuxfleurs.fr.web.example.com`: you can dedicate a subdomain to your users (here `web.example.com`).
- `garagehq.deuxfleurs.fr`: your users can bring their own domain name, they just need to point them to your Garage cluster.
You can try this logic locally, without configuring any DNS, thanks to `curl`:
```bash
# prepare your test
echo hello world > /tmp/index.html
mc cp /tmp/index.html garage/garagehq.deuxfleurs.fr
# This will build ONLY feature1, feature2 and feature3
cargo build --release --no-default-features \
--features feature1,feature2,feature3
```
The following feature flags are available in v0.8.0:
| Feature flag | Enabled | Description |
| ------------ | ------- | ----------- |
| `bundled-libs` | *by default* | Use bundled version of sqlite3, zstd, lmdb and libsodium |
| `system-libs` | optional | Use system version of sqlite3, zstd, lmdb and libsodium<br>if available (exclusive with `bundled-libs`, build using<br>`cargo build --no-default-features --features system-libs`) |
| `k2v` | optional | Enable the experimental K2V API (if used, all nodes on your<br>Garage cluster must have it enabled as well) |
| `kubernetes-discovery` | optional | Enable automatic registration and discovery<br>of cluster nodes through the Kubernetes API |
| `metrics` | *by default* | Enable collection of metrics in Prometheus format on the admin API |
| `telemetry-otlp` | optional | Enable collection of execution traces using OpenTelemetry |
| `syslog` | optional | Enable logging to Syslog |
| `sled` | *by default* | Enable using Sled to store Garage's metadata |
| `lmdb` | *by default* | Enable using LMDB to store Garage's metadata |
| `sqlite` | *by default* | Enable using Sqlite3 to store Garage's metadata |
Gateways allow you to expose Garage endpoints (S3 API and websites) without storing data on the node.
## Benefits
You can configure Garage as a gateway on all nodes that will consume your S3 API, it will provide you the following benefits:
- **It removes 1 or 2 network RTT.** Instead of (querying your reverse proxy then) querying a random node of the cluster that will forward your request to the nodes effectively storing the data, your local gateway will directly knows which node to query.
- **It eases server management.** Instead of tracking in your reverse proxy and DNS what are the current Garage nodes, your gateway being part of the cluster keeps this information for you. In your software, you will always specify `http://localhost:3900`.
- **It simplifies security.** Instead of having to maintain and renew a TLS certificate, you leverage the Secret Handshake protocol we use for our cluster. The S3 API protocol will be in plain text but limited to your local machine.
## Spawn a Gateway
The instructions are similar to a regular node, the only option that is different is while configuring the node, you must set the `--gateway` parameter:
After deploying, cluster layout must be configured manually as described in [Creating a cluster layout](@/documentation/quick-start/_index.md#creating-a-cluster-layout). Use the following command to access garage CLI:
```bash
kubectl exec --stdin --tty -n garage garage-0 -- ./garage status
```
## Overriding default values
All possible configuration values can be found with:
```bash
helm show values ./garage
```
This is an example `values.overrride.yaml` for deploying in a microk8s cluster with a https s3 api ingress route:
Garage exposes some internal metrics in the Prometheus data format.
This page explains how to exploit these metrics.
## Setting up monitoring
### Enabling the Admin API endpoint
If you have not already enabled the [administration API endpoint](@/documentation/reference-manual/admin-api.md), do so by adding the following lines to your configuration file:
```toml
[admin]
api_bind_addr = "0.0.0.0:3903"
```
This will allow anyone to scrape Prometheus metrics by fetching
`http://localhost:3903/metrics`. If you want to restrict access
to the exported metrics, set the `metrics_token` configuration value
to a bearer token to be used when fetching the metrics endpoint.
### Setting up Prometheus and Grafana
Add a scrape config to your Prometheus daemon to scrape metrics from
all of your nodes:
```yaml
scrape_configs:
- job_name: 'garage'
static_configs:
- targets:
- 'node1.mycluster:3903'
- 'node2.mycluster:3903'
- 'node3.mycluster:3903'
```
If you have set a metrics token in your Garage configuration file,
add the following lines in your Prometheus scrape config:
```yaml
authorization:
type: Bearer
credentials: 'your metrics token'
```
To visualize the scraped data in Grafana,
you can either import our [Grafana dashboard for Garage](https://git.deuxfleurs.fr/Deuxfleurs/garage/raw/branch/main/script/telemetry/grafana-garage-dashboard-prometheus.json)
or make your own.
The list of exported metrics is available on our [dedicated page](@/documentation/reference-manual/monitoring.md) in the Reference manual section.
To run Garage in cluster mode, we recommend having at least 3 nodes.
This will allow you to setup Garage for three-way replication of your data,
the safest and most available mode proposed by Garage.
We recommend first following the [quick start guide](@/documentation/quick-start/_index.md) in order
to get familiar with Garage's command line and usage patterns.
## Preparing your environment
### Prerequisites
To run a real-world deployment, make sure the following conditions are met:
- You have at least three machines with sufficient storage space available.
- Each machine has an IP address which makes it directly reachable by all other machines.
In many cases, nodes will be behind a NAT and will not each have a public
IPv4 addresses. In this case, is recommended that you use IPv6 for this
end-to-end connectivity if it is available. Otherwise, using a mesh VPN such as
[Nebula](https://github.com/slackhq/nebula) or
[Yggdrasil](https://yggdrasil-network.github.io/) are approaches to consider
in addition to building out your own VPN tunneling.
- This guide will assume you are using Docker containers to deploy Garage on each node.
Garage can also be run independently, for instance as a [Systemd service](@/documentation/cookbook/systemd.md).
You can also use an orchestrator such as Nomad or Kubernetes to automatically manage
Docker containers on a fleet of nodes.
Before deploying Garage on your infrastructure, you must inventory your machines.
For our example, we will suppose the following infrastructure with IPv6 connectivity:
| Location | Name | IP Address | Disk Space |
|----------|---------|------------|------------|
| Paris | Mercury | fc00:1::1 | 1 TB |
| Paris | Venus | fc00:1::2 | 2 TB |
| London | Earth | fc00:B::1 | 2 TB |
| Brussels | Mars | fc00:F::1 | 1.5 TB |
Note that Garage will **always** store the three copies of your data on nodes at different
locations. This means that in the case of this small example, the usable capacity
of the cluster is in fact only 1.5 TB, because nodes in Brussels can't store more than that.
This also means that nodes in Paris and London will be under-utilized.
To make better use of the available hardware, you should ensure that the capacity
available in the different locations of your cluster is roughly the same.
For instance, here, the Mercury node could be moved to Brussels; this would allow the cluster
to store 2 TB of data in total.
### Best practices
- If you have reasonably fast networking between all your nodes, and are planing to store
mostly large files, bump the `block_size` configuration parameter to 10 MB
(`block_size = "10M"`).
- Garage stores its files in two locations: it uses a metadata directory to store frequently-accessed
small metadata items, and a data directory to store data blocks of uploaded objects.
Ideally, the metadata directory would be stored on an SSD (smaller but faster),
and the data directory would be stored on an HDD (larger but slower).
- For the data directory, Garage already does checksumming and integrity verification,
so there is no need to use a filesystem such as BTRFS or ZFS that does it.
We recommend using XFS for the data partition, as it has the best performance.
EXT4 is not recommended as it has more strict limitations on the number of inodes,
which might cause issues with Garage when large numbers of objects are stored.
- Servers with multiple HDDs are supported natively by Garage without resorting
to RAID, see [our dedicated documentation page](@/documentation/operations/multi-hdd.md).
- For the metadata storage, Garage does not do checksumming and integrity
verification on its own, so it is better to use a robust filesystem such as
BTRFS or ZFS. Users have reported that when using the LMDB database engine
(the default), database files have a tendency of becoming corrupted after an
unclean shutdown (e.g. a power outage), so you should take regular snapshots
to be able to recover from such a situation. This can be done using Garage's
built-in automatic snapshotting (since v0.9.4), or by using filesystem level
snapshots. If you cannot do so, you might want to switch to Sqlite which is
more robust.
- LMDB is the fastest and most tested database engine, but it has the following
weaknesses: 1/ data files are not architecture-independent, you cannot simply
move a Garage metadata directory between nodes running different architectures,
and 2/ LMDB is not suited for 32-bit platforms. Sqlite is a viable alternative
if any of these are of concern.
- If you only have an HDD and no SSD, it's fine to put your metadata alongside
the data on the same drive, but then consider your filesystem choice wisely
(see above). Having lots of RAM for your kernel to cache the metadata will
help a lot with performance.
## Get a Docker image
Our docker image is currently named `dxflrs/garage` and is stored on the [Docker Hub](https://hub.docker.com/r/dxflrs/garage/tags?page=1&ordering=last_updated).
We encourage you to use a fixed tag (eg. `v0.9.3`) and not the `latest` tag.
For this example, we will use the latest published version at the time of the writing which is `v0.9.3` but it's up to you
to check [the most recent versions on the Docker Hub](https://hub.docker.com/r/dxflrs/garage/tags?page=1&ordering=last_updated).
For example:
```
sudo docker pull dxflrs/garage:v0.9.3
```
## Deploying and configuring Garage
On each machine, we will have a similar setup,
especially you must consider the following folders/files:
- `/etc/garage.toml`: Garage daemon's configuration (see below)
To add a new website, add the following declaration to your Traefik configuration file:
```toml
[http.routers]
[http.routers.garage-s3]
rule = "Host(`s3.example.org`)"
service = "garage-s3-service"
entryPoints = ["websecure"]
[http.routers.my_website]
rule = "Host(`yoururl.example.org`)"
service = "garage-web-service"
entryPoints = ["websecure"]
```
Enable HTTPS access to your website with the following configuration section ([documentation](https://doc.traefik.io/traefik/https/overview/)):
```toml
...
entryPoints = ["websecure"]
[http.routers.my_website.tls]
certResolver = "myresolver"
...
```
### Adding compression
Add the following configuration section [to compress response](https://doc.traefik.io/traefik/middlewares/http/compress/) using [gzip](https://developer.mozilla.org/en-US/docs/Glossary/GZip_compression) before sending them to the client:
```toml
[http.routers]
[http.routers.my_website]
...
middlewares = ["compression"]
...
[http.middlewares]
[http.middlewares.compression.compress]
```
### Add caching response
Traefik's caching middleware is only available on [entreprise version](https://doc.traefik.io/traefik-enterprise/middlewares/http-cache/), however the freely-available [Souin plugin](https://github.com/darkweak/souin#tr%C3%A6fik-container) can also do the job. (section to be completed)
But at the same time, the `reverse_proxy` is very flexible.
For a production deployment, you should [read its documentation](https://caddyserver.com/docs/caddyfile/directives/reverse_proxy) as it supports features like DNS discovery of upstreams, load balancing with checks, streaming parameters, etc.
### Caching
Caddy can compiled with a
[cache plugin](https://github.com/caddyserver/cache-handler) which can be used
to provide a hot-cache at the webserver-level for static websites hosted by
Garage.
This can be configured as follows:
```caddy
# Caddy global configuration section
{
# Bare minimum configuration to enable cache.
order cache before rewrite
cache
#cache
# allowed_http_verbs GET
# default_cache_control public
# ttl 8h
#}
}
# Site specific section
https:// {
cache
#cache {
# timeout {
# backend 30s
# }
#}
reverse_proxy ...
}
```
Caching is a complicated subject, and the reader is encouraged to study the
available options provided by the plugin.
### On-demand TLS
Caddy supports a technique called
[on-demand TLS](https://caddyserver.com/docs/automatic-https#on-demand-tls), by
which one can configure the webserver to provision TLS certificates when a
client first connects to it.
In order to prevent an attack vector whereby domains are simply pointed at your
webserver and certificates are requested for them - Caddy can be configured to
ask Garage if a domain is authorized for web hosting, before it then requests
a TLS certificate.
This 'check' endpoint, which is on the admin port (3903 by default), can be
configured in Caddy's global section as follows:
```caddy
{
...
on_demand_tls {
ask http://localhost:3903/check
interval 2m
burst 5
}
...
}
```
The host section can then be configured with (note that this uses the web
We make some assumptions for this systemd deployment.
@ -30,7 +33,20 @@ NoNewPrivileges=true
WantedBy=multi-user.target
```
*A note on hardening: garage will be run as a non privileged user, its user id is dynamically allocated by systemd. It cannot access (read or write) home folders (/home, /root and /run/user), the rest of the filesystem can only be read but not written, only the path seen as /var/lib/garage is writable as seen by the service (mapped to /var/lib/private/garage on your host). Additionnaly, the process can not gain new privileges over time.*
**A note on hardening:** Garage will be run as a non privileged user, its user
id is dynamically allocated by systemd (set with `DynamicUser=true`). It cannot
access (read or write) home folders (`/home`, `/root` and `/run/user`), the
rest of the filesystem can only be read but not written, only the path seen as
`/var/lib/garage` is writable as seen by the service. Additionnaly, the process
can not gain new privileges over time.
For this to work correctly, your `garage.toml` must be set with
`metadata_dir=/var/lib/garage/meta` and `data_dir=/var/lib/garage/data`. This
is mandatory to use the DynamicUser hardening feature of systemd, which
autocreates these directories as virtual mapping. If the directory
`/var/lib/garage` already exists before starting the server for the first time,
the systemd service might not start correctly. Note that in your host
filesystem, Garage data will be held in `/var/lib/private/garage`.
To start the service then automatically enable it at boot:
The design section helps you to see Garage from a "big picture"
perspective. It will allow you to understand if Garage is a good fit for
you, how to better use it, how to contribute to it, what can Garage could
and could not do, etc.
- **[Goals and use cases](@/documentation/design/goals.md):** This page explains why Garage was concieved and what practical use cases it targets.
- **[Related work](@/documentation/design/related-work.md):** This pages presents the theoretical background on which Garage is built, and describes other software storage solutions and why they didn't work for us.
- **[Internals](@/documentation/design/internals.md):** This page enters into more details on how Garage manages data internally.
## Talks
We love to talk and hear about Garage, that's why we keep a log here:
- [(en, 2023-01-18) Presentation of Garage with some details on CRDTs and data partitioning among nodes](https://git.deuxfleurs.fr/Deuxfleurs/garage/src/commit/4cff37397f626ef063dad29e5b5e97ab1206015d/doc/talks/2023-01-18-tocatta/talk.pdf)
- [(fr, 2022-11-19) De l'auto-hébergement à l'entre-hébergement : Garage, pour conserver ses données ensemble](https://git.deuxfleurs.fr/Deuxfleurs/garage/src/commit/4cff37397f626ef063dad29e5b5e97ab1206015d/doc/talks/2022-11-19-Capitole-du-Libre/pr%C3%A9sentation.pdf)
- [(en, 2022-06-23) General presentation of Garage](https://git.deuxfleurs.fr/Deuxfleurs/garage/src/commit/4cff37397f626ef063dad29e5b5e97ab1206015d/doc/talks/2022-06-23-stack/talk.pdf)
- [(fr, 2021-11-13, video) Garage : Mille et une façons de stocker vos données](https://video.tedomum.net/w/moYKcv198dyMrT8hCS5jz9) and [slides (html)](https://rfid.deuxfleurs.fr/presentations/2021-11-13/garage/) - during [RFID#1](https://rfid.deuxfleurs.fr/programme/2021-11-13/) event
- [(en, 2021-04-28) Distributed object storage is centralised](https://git.deuxfleurs.fr/Deuxfleurs/garage/src/commit/b1f60579a13d3c5eba7f74b1775c84639ea9b51a/doc/talks/2021-04-28_spirals-team/talk.pdf)
- [(fr, 2020-12-02) Garage : jouer dans la cour des grands quand on est un hébergeur associatif](https://git.deuxfleurs.fr/Deuxfleurs/garage/src/commit/b1f60579a13d3c5eba7f74b1775c84639ea9b51a/doc/talks/2020-12-02_wide-team/talk.pdf)
With Garage, we wanted to build a software defined storage service that follow the [KISS principle](https://en.wikipedia.org/wiki/KISS_principle),
that is suitable for geo-distributed deployments and more generally that would work well for community hosting (like a Mastodon instance).
In our benchmarks, we aim to quantify how Garage performs on these goals compared to the other available solutions.
## Geo-distribution
The main challenge in a geo-distributed setup is latency between nodes of the cluster.
The more a user request will require intra-cluster requests to complete, the more its latency will increase.
This is especially true for sequential requests: requests that must wait the result of another request to be sent.
We designed Garage without consensus algorithms (eg. Paxos or Raft) to minimize the number of sequential and parallel requests.
This serie of benchmarks quantifies the impact of this design choice.
### On a simple simulated network
We start with a controlled environment, all the instances are running on the same (powerful enough) machine.
To control the network latency, we simulate the network with [mknet](https://git.deuxfleurs.fr/trinity-1686a/mknet) (a tool we developped, based on `tc` and the linux network stack).
To mesure S3 endpoints latency, we use our own tool [s3lat](https://git.deuxfleurs.fr/quentin/s3lat/) to observe only the intra-cluster latency and not some contention on the nodes (CPU, RAM, disk I/O, network bandwidth, etc.).
Compared to other benchmark tools, S3Lat sends only one (small) request at the same time and measures its latency.
We selected 5 standard endpoints that are often in the critical path: ListBuckets, ListObjects, GetObject, PutObject and RemoveObject.
In this first benchmark, we consider 5 instances that are located in a different place each. To simulate the distance, we configure mknet with a RTT between each node of 100 ms +/- 20 ms of jitter. We get the following graph, where the colored bars represent the mean latency while the error bars the minimum and maximum one:
![Comparison of endpoints latency for minio and garage](./endpoint-latency.png)
Compared to garage, minio latency drastically increases on 3 endpoints: GetObject, PutObject, RemoveObject.
We suppose that these requests on minio make transactions over Raft, involving 4 sequential requests: 1) sending the message to the leader, 2) having the leader dispatch it to the other nodes, 3) waiting for the confirmation of followers and finally 4) commiting it. With our current configuration, one Raft transaction will take around 400 ms. GetObject seems to correlate to 1 transaction while PutObject and RemoveObject seems to correlate to 2 or 3. Reviewing minio code would be required to confirm this hypothesis.
Conversely, garage uses an architecture similar to DynamoDB and never require global cluster coordination to answer a request.
Instead, garage can always contact the right node in charge of the requested data, and can answer in as low as one request in the case of GetObject and PutObject. We also observed that Garage latency, while often lower to minio, is more dispersed: garage is still in beta and has not received any performance optimization yet.
As a conclusion, Garage performs well in such setup while minio will be hard to use, especially for interactive use cases.
### On a complex simulated network
This time we consider a more heterogeneous network with 6 servers spread in 3 datacenter, giving us 2 servers per datacenters.
We consider that intra-DC communications are now very cheap with a latency of 0.5ms and without any jitter.
The inter-DC remains costly with the same value as before (100ms +/- 20ms of jitter).
We plot a similar graph as before:
![Comparison of endpoints latency for minio and garage with 6 nodes in 3 DC](./endpoint-latency-dc.png)
This new graph is very similar to the one before, neither minio or garage seems to benefit from this new topology, but they also do not suffer from it.
Considering garage, this is expected: nodes in the same DC are put in the same zone, and then data are spread on different zones for data resiliency and availaibility.
Then, in the default mode, requesting data requires to query at least 2 zones to be sure that we have the most up to date information.
These requests will involve at least one inter-DC communication.
In other words, we prioritize data availability and synchronization over raw performances.
Minio's case is a bit different as by default a minio cluster is not location aware, so we can't explain its performances through location awareness.
*We know that minio has a multi site mode but it is definitely not a first class citizen: data are asynchronously replicated from one minio cluster to another.*
We suppose that, due to the consensus, for many of its requests minio will wait for a response of the majority of the server, also involving inter-DC communications.
As a conclusion, our new topology did not influence garage or minio performances, confirming that in presence of latency, garage is the best fit.
### On a real world deployment
*TODO*
## Performance stability
A storage cluster will encounter different scenario over its life, many of them will not be predictable.
In this context, we argue that, more than peak performances, we should seek predictable and stable performances to ensure data availability.
object storage protocol. It enables applications to store large blobs such
as pictures, video, images, documents, etc., in a redundant multi-node
setting. S3 is versatile enough to also be used to publish a static
website.
Garage is an opinionated object storage solution, we focus on the following **desirable properties**:
- **Internet enabled**: made for multi-sites (eg. datacenters, offices, households, etc.) interconnected through regular Internet connections.
- **Self-contained & lightweight**: works everywhere and integrates well in existing environments to target [hyperconverged infrastructures](https://en.wikipedia.org/wiki/Hyper-converged_infrastructure).
- **Highly resilient**: highly resilient to network failures, network latency, disk failures, sysadmin failures.
- **Simple**: simple to understand, simple to operate, simple to debug.
We also noted that the pursuit of some other goals are detrimental to our initial goals.
The following has been identified as **non-goals** (if these points matter to you, you should not use Garage):
- **Extreme performances**: high performances constrain a lot the design and the infrastructure; we seek performances through minimalism only.
- **Feature extensiveness**: we do not plan to add additional features compared to the ones provided by the S3 API.
- **Storage optimizations**: erasure coding or any other coding technique both increase the difficulty of placing data and synchronizing; we limit ourselves to duplication.
- **POSIX/Filesystem compatibility**: we do not aim at being POSIX compatible or to emulate any kind of filesystem. Indeed, in a distributed environment, such synchronizations are translated in network messages that impose severe constraints on the deployment.
## Use-cases
*Are you also using Garage in your organization? [Open a PR](https://git.deuxfleurs.fr/Deuxfleurs/garage) to add your use case here!*
### Deuxfleurs
[Deuxfleurs](https://deuxfleurs.fr) is an experimental non-profit hosting
organization that develops Garage. Deuxfleurs is focused on building highly
available infrastructure through redundancy in multiple geographical
locations. They use Garage themselves for the following tasks:
- Hosting of [main website](https://deuxfleurs.fr), [this website](https://garagehq.deuxfleurs.fr), as well as the personal website of many of the members of the organization
- As a [Matrix media backend](https://github.com/matrix-org/synapse-s3-storage-provider)
- As a Nix binary cache
- To store personal data and shared documents through [Bagage](https://git.deuxfleurs.fr/Deuxfleurs/bagage), a homegrown WebDav-to-S3 and SFTP-to-S3 proxy
- As a backup target using `rclone` and `restic`
The Deuxfleurs Garage cluster is a multi-site cluster currently composed of
- The Dynamo ring (see [this paper](https://dl.acm.org/doi/abs/10.1145/1323293.1294281) and [that paper](https://www.usenix.org/conference/nsdi16/technical-sessions/presentation/eisenbud))
- CRDTs
- CRDTs (see [this paper](https://link.springer.com/chapter/10.1007/978-3-642-24550-3_29))
- Consistency model of Garage tables
See this presentation (in French) for some first information:
@ -21,7 +24,7 @@ Openstack Cinder proxy previous solution to provide an uniform API.
File storage provides a higher abstraction, they are one filesystem among others, which means they don't necessarily have all the exotic features of every filesystem.
Often, they relax some POSIX constraints while many applications will still be compatible without any modification.
As an example, we are able to run MariaDB (very slowly) over GlusterFS...
We can also mention CephFS (read [RADOS](https://ceph.com/wp-content/uploads/2016/08/weil-rados-pdsw07.pdf) whitepaper), Lustre, LizardFS, MooseFS, etc.
We can also mention CephFS (read [RADOS](https://doi.org/10.1145/1374596.1374606) whitepaper [[pdf](https://ceph.com/assets/pdfs/weil-rados-pdsw07.pdf)]), Lustre, LizardFS, MooseFS, etc.
OpenStack Manila proxy previous solutions to provide an uniform API.
Finally object storages provide the highest level abstraction.
@ -41,16 +44,36 @@ There were many attempts in research too. I am only thinking to [LBFS](https://p
**[MinIO](https://min.io/):** MinIO shares our *Self-contained & lightweight* goal but selected two of our non-goals: *Storage optimizations* through erasure coding and *POSIX/Filesystem compatibility* through strong consistency.
However, by pursuing these two non-goals, MinIO do not reach our desirable properties.
Firstly, it fails on the *Simple* property: due to the erasure coding, MinIO has severe limitations on how drives can be added or deleted from a cluster.
Secondly, it fails on the *Internet enabled* property: due to its strong consistency, MinIO is latency sensitive.
Furthermore, MinIO has no knowledge of "sites" and thus can not distribute data to minimize the failure of a given site.
This review holds for the whole Ceph stack, including the RADOS paper, Ceph Object Storage module, the RADOS Gateway, etc.
At its core, Ceph has been designed to provide *POSIX/Filesystem compatibility* which requires strong consistency, which in turn
makes Ceph latency-sensitive and fails our *Internet enabled* goal.
Due to its industry oriented design, Ceph is also far from being *Simple* to operate and from being *Self-contained & lightweight* which makes it hard to integrate it in an hyperconverged infrastructure.
In a certain way, Ceph and MinIO are closer together than they are from Garage or OpenStack Swift.
**[Pithos](https://github.com/exoscale/pithos):**
Pithos has been abandonned and should probably not used yet, in the following we explain why we did not pick their design.
Pithos was relying as a S3 proxy in front of Cassandra (and was working with Scylla DB too).
From its designers' mouth, storing data in Cassandra has shown its limitations justifying the project abandonment.
They built a closed-source version 2 that does not store blobs in the database (only metadata) but did not communicate further on it.
We considered there v2's design but concluded that it does not fit both our *Self-contained & lightweight* and *Simple* properties. It makes the development, the deployment and the operations more complicated while reducing the flexibility.
**[IPFS](https://ipfs.io/):** IPFS has design goals radically different from Garage, we have [a blog post](@/blog/2022-ipfs/index.md) talking about it.
We maintain a `script/` folder that contains some useful script to ease testing on Garage.
@ -31,7 +34,7 @@ You can inspect the detailed configuration, including ports, by inspecting `/tmp
This script also spawns a simple HTTPS reverse proxy through `socat` for the S3 endpoint that listens on port `4443`.
Some libraries might require a TLS endpoint to work, refer to our issue [#64](https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/64) for more detailed information on this subject.
This script covers the [Launching the garage server](/quick_start/index.html#launching-the-garage-server) section of our Quick start page.
This script covers the [Launching the garage server](@/documentation/quick-start/_index.md#launching-the-garage-server) section of our Quick start page.
### 2. Make them join the cluster
@ -41,7 +44,7 @@ This script covers the [Launching the garage server](/quick_start/index.html#lau
This script will configure each instance by assigning them a zone (`dc1`) and a weight (`1`).
This script covers the [Configuring your Garage node](/quick_start/index.html#configuring-your-garage-node) section of our Quick start page.
This script covers the [Creating a cluster layout](@/documentation/quick-start/_index.md#creating-a-cluster-layout) section of our Quick start page.
### 3. Create a key and a bucket
@ -52,7 +55,7 @@ This script covers the [Configuring your Garage node](/quick_start/index.html#co
This script will create a bucket named `eprouvette` with a key having read and write rights on this bucket.
The key is stored in a filed named `/tmp/garage.s3` and can be used by the following tools to pre-configure them.
This script covers the [Creating buckets and keys](/quick_start/index.html#creating-buckets-and-keys) section of our Quick start page.
This script covers the [Creating buckets and keys](@/documentation/quick-start/_index.md#creating-buckets-and-keys) section of our Quick start page.
Garage automatically resyncs all entries stored in the metadata tables every hour,
to ensure that all nodes have the most up-to-date version of all the information
they should be holding.
The resync procedure is based on a Merkle tree that allows to efficiently find
differences between nodes.
In some special cases, e.g. before an upgrade, you might want to run a table
resync manually. This can be done using `garage repair tables`.
## Metadata table reference fixes
In some very rare cases where nodes are unavailable, some references between objects
are broken. For instance, if an object is deleted, the underlying versions or data
blocks may still be held by Garage. If you suspect that such corruption has occurred
in your cluster, you can run one of the following repair procedures:
- `garage repair versions`: checks that all versions belong to a non-deleted object, and purges any orphan version
- `garage repair block_refs`: checks that all block references belong to a non-deleted object version, and purges any orphan block reference (this will then allow the blocks to be garbage-collected)
The cluster layout in Garage is a table that assigns to each node a role in
the cluster. The role of a node in Garage can either be a storage node with
a certain capacity, or a gateway node that does not store data and is only
used as an API entry point for faster cluster access.
An introduction to building cluster layouts can be found in the [production deployment](@/documentation/cookbook/real-world.md) page.
In Garage, all of the data that can be stored in a given cluster is divided
into slices which we call *partitions*. Each partition is stored by
one or several nodes in the cluster
(see [`replication_mode`](@/documentation/reference-manual/configuration.md#replication_mode)).
The layout determines the correspondence between these partitions,
which exist on a logical level, and actual storage nodes.
## How cluster layouts work in Garage
A cluster layout is composed of the following components:
- a table of roles assigned to nodes, defined by the user
- an optimal assignation of partitions to nodes, computed by an algorithm that is ran once when calling `garage layout apply` or the ApplyClusterLayout API endpoint
- a version number
Garage nodes will always use the cluster layout with the highest version number.
Garage nodes also maintain and synchronize between them a set of proposed role
changes that haven't yet been applied. These changes will be applied (or
canceled) in the next version of the layout.
All operations on the layout can be realized using the `garage` CLI or using the
[administration API endpoint](@/documentation/reference-manual/admin-api.md).
We give here a description of CLI commands, the admin API semantics are very similar.
The following commands insert modifications to the set of proposed role changes
for the next layout version (but they do not create the new layout immediately):
```bash
garage layout assign [...]
garage layout remove [...]
```
The following command can be used to inspect the layout that is currently set in the cluster
and the changes proposed for the next layout version, if any:
```bash
garage layout show
```
The following commands create a new layout with the specified version number,
that either takes into account the proposed changes or cancels them:
an overview for which Garage versions are currently in use within a cluster.
## Minor upgrades
Minor upgrades do not imply cluster downtime.
Before upgrading, you should still read [the changelog](https://git.deuxfleurs.fr/Deuxfleurs/garage/releases) and ideally test your deployment on a staging cluster before.
When you are ready, start by checking the health of your cluster.
You can force some checks with `garage repair`, we recommend at least running `garage repair --all-nodes --yes tables` which is very quick to run (less than a minute).
You will see that the command correctly terminated in the logs of your daemon, or using `garage worker list` (the repair workers should be in the `Done` state).
Finally, you can simply upgrade nodes one by one.
For each node: stop it, install the new binary, edit the configuration if needed, restart it.
## Major upgrades
Major upgrades can be done with minimal downtime with a bit of preparation, but the simplest way is usually to put the cluster offline for the duration of the migration.
Before upgrading, you must read [the changelog](https://git.deuxfleurs.fr/Deuxfleurs/garage/releases) and you must test your deployment on a staging cluster before.
We write guides for each major upgrade, they are stored under the "Working Documents" section of this documentation.
### Major upgrades with full downtime
From a high level perspective, a major upgrade looks like this:
1. Disable API access (for instance in your reverse proxy, or by commenting the corresponding section in your Garage configuration file and restarting Garage)
2. Check that your cluster is idle
3. Make sure the health of your cluster is good (see `garage repair`)
4. Stop the whole cluster
5. Back up the metadata folder of all your nodes, so that you will be able to restore it if the upgrade fails (data blocks being immutable, they should not be impacted)
6. Install the new binary, update the configuration
7. Start the whole cluster
8. If needed, run the corresponding migration from `garage migrate`
9. Make sure the health of your cluster is good
10. Enable API access (reverse step 1)
11. Monitor your cluster while load comes back, check that all your applications are happy with this new version
### Major upgarades with minimal downtime
There is only one operation that has to be coordinated cluster-wide: the switch of one version of the internal RPC protocol to the next.
This means that an upgrade with very limited downtime can simply be performed from one major version to the next by restarting all nodes
simultaneously in the new version.
The downtime will simply be the time required for all nodes to stop and start again, which should be less than a minute.
If all nodes fail to stop and restart simultaneously, some nodes might be temporarily shut out from the cluster as nodes using different RPC protocol
versions are prevented to talk to one another.
The entire procedure would look something like this:
1. Make sure the health of your cluster is good (see `garage repair`)
2. Take each node offline individually to back up its metadata folder, bring them back online once the backup is done.
You can do all of the nodes in a single zone at once as that won't impact global cluster availability.
Do not try to make a backup of the metadata folder of a running node.
**Since Garage v0.9.4,** you can use the `garage meta snapshot --all` command
to take a simultaneous snapshot of the metadata database files of all your
nodes. This avoids the tedious process of having to take them down one by
one before upgrading. Be careful that if automatic snapshotting is enabled,
Garage only keeps the last two snapshots and deletes older ones, so you might
want to disable automatic snapshotting in your upgraded configuration file
until you have confirmed that the upgrade ran successfully. In addition to
snapshotting the metadata databases of your nodes, you should back-up at
least the `cluster_layout` file of one of your Garage instances (this file
should be the same on all nodes and you can copy it safely while Garage is
running).
3. Prepare your binaries and configuration files for the new Garage version
4. Restart all nodes simultaneously in the new version
5. If any specific migration procedure is required, it is usually in one of the two cases:
- It can be run on online nodes after the new version has started, during regular cluster operation.
- it has to be run offline, in which case you will have to again take all nodes offline one after the other to run the repair
For this last step, please refer to the specific documentation pertaining to the version upgrade you are doing.
# HELP api_admin_request_duration Duration of API calls to the various Admin API endpoints
...
```
### Health `GET /health`
Returns `200 OK` if enough nodes are up to have a quorum (ie. serve requests),
otherwise returns `503 Service Unavailable`.
**Example:**
```
$ curl -i http://localhost:3903/health
HTTP/1.1 200 OK
content-type: text/plain
content-length: 102
date: Tue, 08 Aug 2023 07:22:38 GMT
Garage is fully operational
Consult the full health check API endpoint at /v0/health for more details
```
### On-demand TLS `GET /check`
To prevent abuse for on-demand TLS, Caddy developers have specified an endpoint that can be queried by the reverse proxy
to know if a given domain is allowed to get a certificate. Garage implements these endpoints to tell if a given domain is handled by Garage or is garbage.
Garage responds with the following logic:
- If the domain matches the pattern `<bucket-name>.<s3_api.root_domain>`, returns 200 OK
- If the domain matches the pattern `<bucket-name>.<s3_web.root_domain>` and website is configured for `<bucket>`, returns 200 OK
- If the domain matches the pattern `<bucket-name>` and website is configured for `<bucket>`, returns 200 OK
- Otherwise, returns 404 Not Found, 400 Bad Request or 5xx requests.
*Note 1: because in the path-style URL mode, there is only one domain that is not known by Garage, hence it is not supported by this API endpoint.
You must manually declare the domain in your reverse-proxy. Idem for K2V.*
*Note 2: buckets in a user's namespace are not supported yet by this endpoint. This is a limitation of this endpoint currently.*
**Example:** Suppose a Garage instance is configured with `s3_api.root_domain = .s3.garage.localhost` and `s3_web.root_domain = .web.garage.localhost`.
With a private `media` bucket (name in the global namespace, website is disabled), the endpoint will feature the following behavior:
- [Add option for a backend check to approve use of on-demand TLS](https://github.com/caddyserver/caddy/pull/1939)
- [Serving tens of thousands of domains over HTTPS with Caddy](https://caddy.community/t/serving-tens-of-thousands-of-domains-over-https-with-caddy/11179)
**ListObjects:** Implemented, but there isn't a very good specification of what
`encoding-type=url` covers so there might be some encoding bugs. In our
implementation the url-encoded fields are in the same in ListObjects as they
are in ListObjectsV2.
*Note: Ceph API documentation is incomplete and lacks at least HeadBucket and UploadPartCopy,
but these endpoints are documented in [Red Hat Ceph Storage - Chapter 2. Ceph Object Gateway and the S3 API](https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/developer_guide/ceph-object-gateway-and-the-s3-api)*
**PutBucketWebsite:** Implemented, but only stores the index document suffix and the error document path. Redirects are not supported.
*Note: Ceph radosgw has some support for static websites but it is different from the Amazon one. It also does not implement its configuration endpoints.*
### ACL, Policies endpoints
Amazon has 2 access control mechanisms in S3: ACL (legacy) and policies (new one).
Garage implements none of them, and has its own system instead, built around a per-access-key-per-bucket logic.
See Garage CLI reference manual to learn how to use Garage's permission system.
*Note: Ceph documentation briefly says that Ceph supports
[replication through the S3 API](https://docs.ceph.com/en/latest/radosgw/multisite-sync-policy/#s3-replication-api)
but with some limitations.
Additionaly, replication endpoints are not documented in the S3 compatibility page so I don't know what kind of support we can expect.*
### Locking objects
Amazon defines a concept of [object locking](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lock.html) that can be achieved either through a Retention period or a Legal hold.
**From the GUI.** Activate the "External storage support" app from the "Applications" page (click on your account icon on the top right corner of your screen to display the menu). Go to your parameters page (also located below your account icon). Click on external storage (or the corresponding translation in your language).
[![Screenshot of the External Storage form](./cli-nextcloud-gui.png)](./cli-nextcloud-gui.png)
*Click on the picture to zoom*
Add a new external storage. Put what you want in "folder name" (eg. "shared"). Select "Amazon S3". Keep "Access Key" for the Authentication field.
In Configuration, put your bucket name (eg. nextcloud), the host (eg. 127.0.0.1), the port (eg. 3900 or 443), the region (garage). Tick the SSL box if you have put an HTTPS proxy in front of garage. You must tick the "Path access" box and you must leave the "Legacy authentication (v2)" box empty. Put your Key ID (eg. GK...) and your Secret Key in the last two input boxes. Finally click on the tick symbol on the right of your screen.
Now go to your "Files" app and a new "linked folder" has appeared with the name you chose earlier (eg. "shared").
Finally, we need to expose these buckets publicly to serve their content to users:
```bash
garage bucket website --allow peertube-playlist
garage bucket website --allow peertube-video
```
These buckets are now accessible on the web port (by default 3902) with the following URL: `http://<bucket><root_domain>:<web_port>` where the root domain is defined in your configuration file (by default `.web.garage`). So we have currently the following URLs:
* http://peertube-playlist.web.garage:3902
* http://peertube-video.web.garage:3902
Make sure you (will) have a corresponding DNS entry for them.
### Configure a Reverse Proxy to serve CORS
Now we will configure a reverse proxy in front of Garage.
This is required as we have no other way to serve CORS headers yet.
Check the [Configuring a reverse proxy](/cookbook/reverse_proxy.html) section to know how.
Now make sure that your 2 dns entries are pointing to your reverse proxy.
### Configure Peertube
You must edit the file named `config/production.yaml`, we are only modifying the root key named `object_storage`:
```yaml
object_storage:
enabled: true
# Put localhost only if you have a garage instance running on that node
endpoint: 'http://localhost:3900' # or "garage.example.com" if you have TLS on port 443
# This entry has been added by our patch, must be set to true
force_path_style: true
# Garage supports only one region for now, named garage
region: 'garage'
credentials:
access_key_id: 'GKxxxx'
secret_access_key: 'xxxx'
max_upload_part: 2GB
streaming_playlists:
bucket_name: 'peertube-playlist'
# Keep it empty for our example
prefix: ''
# You must fill this field to make Peertube use our reverse proxy/website logic
# You must fill this field to make Peertube use our reverse proxy/website logic
base_url: 'http://peertube-video.web.garage'
```
### That's all
Everything must be configured now, simply restart Peertube and try to upload a video.
You must see in your browser console that data are fetched directly from our bucket (through the reverse proxy).
### Miscellaneous
*Known bug:* The playback does not start and some 400 Bad Request Errors appear in your browser console and on Garage.
If the description of the error contains HTTP Invalid Range: InvalidRange, the error is due to a buggy ffmpeg version.
You must avoid the 4.4.0 and use either a newer or older version.
*Associated issues:* [#137](https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/137), [#138](https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/138), [#140](https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/140). These issues are non blocking.
Matrix is a chat communication protocol. Its main stable server implementation, [Synapse](https://matrix-org.github.io/synapse/latest/), provides a module to store media on a S3 backend. Additionally, a server independent media store supporting S3 has been developped by the community, it has been made possible thanks to how the matrix API has been designed and will work with implementations like Conduit, Dendrite, etc.
### synapse-s3-storage-provider (synapse only)
Supposing you have a working synapse installation, you can add the module with pip:
Gateways allow you to expose Garage endpoints (S3 API and websites) without storing data on the node.
## Benefits
You can configure Garage as a gateway on all nodes that will consume your S3 API, it will provide you the following benefits:
- **It removes 1 or 2 network RTT** Instead of (querying your reverse proxy then) querying a random node of the cluster that will forward your request to the nodes effectively storing the data, your local gateway will directly knows which node to query.
- **It ease server management** Instead of tracking in your reverse proxy and DNS what are the current Garage nodes, your gateway being part of the cluster keeps this information for you. In your software, you will always specify `http://localhost:3900`.
- **It simplifies security** Instead of having to maintain and renew a TLS certificate, you leverage the Secret Handshake protocol we use for our cluster. The S3 API protocol will be in plain text but limited to your local machine.
## Limitations
Currently it will not work with minio client. Follow issue [#64](https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/64) for more information.
## Spawn a Gateway
The instructions are similar to a regular node, the only option that is different is while configuring the node, you must set the `--gateway` parameter:
```bash
garage node configure --gateway --tag gw1 xxxx
```
Then use `http://localhost:3900` when a S3 endpoint is required:
```bash
aws --endpoint-url http://127.0.0.1:3900 s3 ls
```
Some files were not shown because too many files have changed in this diff
Show More