Commit Graph

91 Commits

Author SHA1 Message Date
Alex ac03fa7937
Uniformize tracing::* imports (hopefully fixes 32-bit build)
continuous-integration/drone/pr Build is passing Details
continuous-integration/drone Build is passing Details
continuous-integration/drone/push Build is passing Details
2022-07-15 18:31:19 +02:00
Alex 4f38cadf6e Background task manager (#332)
continuous-integration/drone/push Build is passing Details
- [x] New background worker trait
- [x] Adapt all current workers to use new API
- [x] Command to list currently running workers, and whether they are active, idle, or dead
- [x] Error reporting
- Optimizations
  - [x] Merkle updater: several items per iteration
  - [ ] Use `tokio::task::spawn_blocking` where appropriate so that CPU-intensive tasks don't block other things going on
- scrub:
  - [x] have only one worker with a channel to start/pause/cancel
  - [x] automatic scrub
  - [x] ability to view and change tranquility from CLI
  - [x] persistence of a few info
- [ ] Testing

Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: #332
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
2022-07-08 13:30:26 +02:00
Alex b44d3fc796 Abstract database behind generic interface and implement alternative drivers (#322)
continuous-integration/drone/push Build is passing Details
- [x] Design interface
- [x] Implement Sled backend
  - [x] Re-implement the SledCountedTree hack ~~on Sled backend~~ on all backends (i.e. over the abstraction)
- [x] Convert Garage code to use generic interface
- [x] Proof-read converted Garage code
- [ ] Test everything well
- [x] Implement sqlite backend
- [x] Implement LMDB backend
- [ ] (Implement Persy backend?)
- [ ] (Implement other backends? (like RocksDB, ...))
- [x] Implement backend choice in config file and garage server module
- [x] Add CLI for converting between DB formats
- Exploit the new interface to put more things in transactions
  - [x] `.updated()` trigger on Garage tables

Fix #284

**Bugs**

- [x] When exporting sqlite, trees iterate empty??
- [x] LMDB doesn't work

**Known issues for various back-ends**

- Sled:
  - Eats all my RAM and also all my disk space
  - `.len()` has to traverse the whole table
  - Is actually quite slow on some operations
  - And is actually pretty bad code...
- Sqlite:
  - Requires a lock to be taken on all operations. The lock is also taken when iterating on a table with `.iter()`, and the lock isn't released until the iterator is dropped. This means that we must be VERY carefull to not do anything else inside a `.iter()` loop or else we will have a deadlock! Most such cases have been eliminated from the Garage codebase, but there might still be some that remain. If your Garage-over-Sqlite seems to hang/freeze, this is the reason.
  - (adapter uses a bunch of unsafe code)
- Heed (LMDB):
  - Not suited for 32-bit machines as it has to map the whole DB in memory.
  - (adpater uses a tiny bit of unsafe code)

**My recommendation:** avoid 32-bit machines and use LMDB as much as possible.

**Converting databases** is actually quite easy. For example from Sled to LMDB:

```bash
cd src/db
cargo run --features cli --bin convert -- -i path/to/garage/meta/db -a sled -o path/to/garage/meta/db.lmdb -b lmdb
```

Then, just add this to your `config.toml`:

```toml
db_engine = "lmdb"
```

Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: #322
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
2022-06-08 10:01:44 +02:00
Alex 382e74c798 First version of admin API (#298)
continuous-integration/drone/push Build is passing Details
**Spec:**

- [x] Start writing
- [x] Specify all layout endpoints
- [x] Specify all endpoints for operations on keys
- [x] Specify all endpoints for operations on key/bucket permissions
- [x] Specify all endpoints for operations on buckets
- [x] Specify all endpoints for operations on bucket aliases

View rendered spec at <https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/admin-api/doc/drafts/admin-api.md>

**Code:**

- [x] Refactor code for admin api to use common api code that was created for K2V

**General endpoints:**

- [x] Metrics
- [x] GetClusterStatus
- [x] ConnectClusterNodes
- [x] GetClusterLayout
- [x] UpdateClusterLayout
- [x] ApplyClusterLayout
- [x] RevertClusterLayout

**Key-related endpoints:**

- [x] ListKeys
- [x] CreateKey
- [x] ImportKey
- [x] GetKeyInfo
- [x] UpdateKey
- [x] DeleteKey

**Bucket-related endpoints:**

- [x] ListBuckets
- [x] CreateBucket
- [x] GetBucketInfo
- [x] DeleteBucket
- [x] PutBucketWebsite
- [x] DeleteBucketWebsite

**Operations on key/bucket permissions:**

- [x] BucketAllowKey
- [x] BucketDenyKey

**Operations on bucket aliases:**

- [x] GlobalAliasBucket
- [x] GlobalUnaliasBucket
- [x] LocalAliasBucket
- [x] LocalUnaliasBucket

**And also:**

- [x] Separate error type for the admin API (this PR includes a quite big refactoring of error handling)
- [x] Add management of website access
- [ ] Check that nothing is missing wrt what can be done using the CLI
- [ ] Improve formatting of the spec
- [x] Make sure everyone is cool with the API design

Fix #231
Fix #295

Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: #298
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
2022-05-24 12:16:39 +02:00
trinity-1686a 64c193e3db Add a K2V client library and CLI (#303)
continuous-integration/drone/push Build is passing Details
lib.rs could use getting split in modules, but I'm not sure how exactly

Co-authored-by: trinity-1686a <trinity@deuxfleurs.fr>
Reviewed-on: #303
Co-authored-by: trinity-1686a <trinity.pointard@gmail.com>
Co-committed-by: trinity-1686a <trinity.pointard@gmail.com>
2022-05-18 22:24:09 +02:00
Alex 7b474855e3
Make background runner terminate correctly 2022-05-17 11:38:31 +02:00
Alex 5768bf3622 First implementation of K2V (#293)
continuous-integration/drone/push Build is passing Details
**Specification:**

View spec at [this URL](https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/k2v/doc/drafts/k2v-spec.md)

- [x] Specify the structure of K2V triples
- [x] Specify the DVVS format used for causality detection
- [x] Specify the K2V index (just a counter of number of values per partition key)
- [x] Specify single-item endpoints: ReadItem, InsertItem, DeleteItem
- [x] Specify index endpoint: ReadIndex
- [x] Specify multi-item endpoints: InsertBatch, ReadBatch, DeleteBatch
- [x] Move to JSON objects instead of tuples
- [x] Specify endpoints for polling for updates on single values (PollItem)

**Implementation:**

- [x] Table for K2V items, causal contexts
- [x] Indexing mechanism and table for K2V index
- [x] Make API handlers a bit more generic
- [x] K2V API endpoint
- [x] K2V API router
- [x] ReadItem
- [x] InsertItem
- [x] DeleteItem
- [x] PollItem
- [x] ReadIndex
- [x] InsertBatch
- [x] ReadBatch
- [x] DeleteBatch

**Testing:**

- [x] Just a simple Python script that does some requests to check visually that things are going right (does not contain parsing of results or assertions on returned values)
- [x] Actual tests:
  - [x] Adapt testing framework
  - [x] Simple test with InsertItem + ReadItem
  - [x] Test with several Insert/Read/DeleteItem + ReadIndex
  - [x] Test all combinations of return formats for ReadItem
  - [x] Test with ReadBatch, InsertBatch, DeleteBatch
  - [x] Test with PollItem
  - [x] Test error codes
- [ ] Fix most broken stuff
  - [x] test PollItem broken randomly
  - [x] when invalid causality tokens are given, errors should be 4xx not 5xx

**Improvements:**

- [x] Descending range queries
  - [x] Specify
  - [x] Implement
  - [x] Add test
- [x] Batch updates to index counter
- [x] Put K2V behind `k2v` feature flag

Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: #293
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
2022-05-10 13:16:57 +02:00
Alex 5d404dcd54
Add missing opentelemetry features 2022-04-08 14:21:04 +02:00
Alex 94f1e48fff Update to netapp 0.4.2 (a tiny fix)
continuous-integration/drone/push Build is passing Details
2022-04-07 11:50:03 +02:00
Alex 9d0ed78887 Add feature flag for Kubernetes discovery 2022-03-24 16:57:43 +01:00
Alex e480aaf338
Make background tranquility a configurable parameter 2022-03-23 10:25:19 +01:00
Alex db46cdef79
Update netapp to v0.4.1
continuous-integration/drone/pr Build is passing Details
continuous-integration/drone/push Build is passing Details
2022-03-15 17:09:57 +01:00
Alex fe62d01b7e
Implement exponential backoff for resync retries
continuous-integration/drone/push Build is passing Details
continuous-integration/drone/pr Build is passing Details
2022-03-14 11:41:20 +01:00
Alex 9b2b531f4d
Make admin server optional
continuous-integration/drone/push Build was killed Details
continuous-integration/drone/pr Build was killed Details
2022-03-14 10:54:25 +01:00
Alex 2377a92f6b
Add wrapper over sled tree to count items (used for big queues) 2022-03-14 10:54:25 +01:00
Alex 203e8d2c34
Bump version to 0.7 because of incompatible Netapp 2022-03-14 10:54:24 +01:00
Alex dc8d0496cc
Refactoring: rename config files, make modifications less invasive 2022-03-14 10:53:51 +01:00
Alex d9a35359bf
Add metrics to web endpoint 2022-03-14 10:53:50 +01:00
Alex 2a5609b292
Add metrics to API endpoint 2022-03-14 10:53:36 +01:00
Alex 818daa5c78
Refactor how durations are measured 2022-03-14 10:53:35 +01:00
Alex 55d4471599
Remove ... at end of hex IDs 2022-03-14 10:52:31 +01:00
Alex bb04d94fa9
Update to Netapp 0.4 which supports distributed tracing 2022-03-14 10:52:30 +01:00
Alex 8c2fb0c066
Add tracing integration with opentelemetry 2022-03-14 10:52:13 +01:00
mricher e349af13a7
Update dependencies and add admin module with metrics
- Global dependencies updated in Cargo.lock
- New module created in src/admin to host:
  - the (future) admin REST API
  - the metric collection
- add configuration block

No metrics implemented yet
2022-03-14 10:51:12 +01:00
Max Audron 9d44127245
add support for kubernetes service discovery
continuous-integration/drone/pr Build is passing Details
continuous-integration/drone/push Build is passing Details
This commit adds support to discover garage instances running in
kubernetes.

Once enabled by setting `kubernetes_namespace` and
`kubernetes_service_name` garage will create a Custom Resources
`garagenodes.deuxfleurs.fr` with nodes public key as the resource name.
and IP and Port information as spec in the namespace configured by
`kubernetes_namespace`.

For discovering nodes the resources are filtered with the optionally set
`kubernetes_service_name` which sets a label
`garage.deuxfleurs.fr/service` on the resources.

This allows to separate multiple garage deployments in a single
namespace.

the `kubernetes_skip_crd` variable allows to disable the creation of the
CRD by garage itself. The user must deploy this manually.
2022-03-12 13:05:52 +01:00
Alex ea7fb901eb
Implement {Put,Get,Delete}BucketCors and CORS in general
continuous-integration/drone/push Build is passing Details
continuous-integration/drone/pr Build is passing Details
- OPTIONS request against API endpoint
- Returning corresponding CORS headers on API calls
- Returning corresponding CORS headers on website GET's
2022-01-24 11:58:00 +01:00
Alex d4dd2e2640
Make use of website config, return error document on error 2022-01-13 14:25:19 +01:00
Alex 1bcd6fabbd
New buckets for 0.6.0: small changes
- Fix bucket delete

- fix merge of bucket creation date

- Replace deletable with option in aliases
    Rationale: if two aliases point to conflicting bucket, resolving
    by making an arbitrary choice risks making data accessible when it
    shouldn't be. We'd rather resolve to deleting the alias until
    someone puts it back.
2022-01-04 12:52:47 +01:00
Alex beeef4758e
Some movement of helper code and refactoring of error handling 2022-01-04 12:52:46 +01:00
Alex d8ab5bdc3e
New buckets for 0.6.0: fix model and migration 2022-01-04 12:47:28 +01:00
Alex c7d5c73244
Add must_use to some CRDT functions 2022-01-04 12:47:28 +01:00
Alex 87121dce9d
New buckets for 0.6.0: documentation and build files 2022-01-04 12:47:06 +01:00
Alex b1cfd16913
New buckets for 0.6.0: small fixes, including:
- ensure bucket names are correct aws s3 names
- when making aliases, ensure timestamps of links in both ways are the
  same
- fix small remarks by trinity
- don't have a separate website_access field
2022-01-04 12:46:41 +01:00
Alex 0bbb6673e7
Model changes 2022-01-04 12:45:52 +01:00
Alex 5b1117e582
New model for buckets 2022-01-04 12:45:46 +01:00
trinity-1686a 1eb972b1ac Add compression using zstd (#173)
continuous-integration/drone/push Build is passing Details
fix #27

Co-authored-by: Trinity Pointard <trinity.pointard@gmail.com>
Reviewed-on: #173
Co-authored-by: trinity-1686a <trinity.pointard@gmail.com>
Co-committed-by: trinity-1686a <trinity.pointard@gmail.com>
2021-12-15 11:26:43 +01:00
Alex c94406f428
Improve how node roles are assigned in Garage
continuous-integration/drone/pr Build is passing Details
continuous-integration/drone/tag Build is passing Details
continuous-integration/drone/push Build is passing Details
continuous-integration/drone Build is passing Details
- change the terminology: the network configuration becomes the role
  table, the configuration of a nodes becomes a node's role
- the modification of the role table takes place in two steps: first,
  changes are staged in a CRDT data structure. Then, once the user is
  happy with the changes, they can commit them all at once (or revert
  them).
- update documentation
- fix tests
- implement smarter partition assignation algorithm

This patch breaks the format of the network configuration: when
migrating, the cluster will be in a state where no roles are assigned.
All roles must be re-assigned and commited at once. This migration
should not pose an issue.
2021-11-16 16:05:53 +01:00
Trinity Pointard 9c58ec28d3 add support for vhost-style s3 bucket 2021-11-16 15:41:41 +01:00
Trinity Pointard 9d7535c3f5 allow missing bootstrap_peers in garage.toml
continuous-integration/drone/pr Build is passing Details
continuous-integration/drone/push Build is passing Details
2021-11-05 16:36:25 +01:00
Alex 2090a6187f
Add tranquilizer mechanism to improve on token bucket mechanism
continuous-integration/drone/pr Build is passing Details
continuous-integration/drone/push Build is passing Details
2021-11-04 13:26:59 +01:00
Alex 6f13d083ab
Add semaphore to limit RAM used by buffered outgoing requests
continuous-integration/drone/pr Build is passing Details
continuous-integration/drone/push Build is passing Details
continuous-integration/drone Build is passing Details
2021-11-03 18:02:57 +01:00
Alex 6b47c294f5
Refactoring on repair commands
continuous-integration/drone/pr Build is passing Details
continuous-integration/drone/push Build is passing Details
2021-10-27 11:14:55 +02:00
Alex 43e13a501d
Use published netapp crate instead of git repo
continuous-integration/drone/push Build is passing Details
2021-10-26 10:36:57 +02:00
Alex de4276202a
Improve CLI, adapt tests, update documentation 2021-10-25 14:21:48 +02:00
Alex 1b450c4b49
Improvements to CLI and various fixes for netapp version
Discovery via consul, persist peer list to file
2021-10-22 16:55:24 +02:00
Alex 4067797d01
First port of Garage to Netapp 2021-10-22 15:55:18 +02:00
Alex b9127dd6f8
Prepare for v0.3.0 and add migration path from v0.2.1.x
continuous-integration/drone/push Build is passing Details
continuous-integration/drone/pr Build was killed Details
2021-05-28 15:29:58 +02:00
Alex b490ebc7f6
Many improvements on ring/replication and its configuration:
- Explicit "replication_mode" configuration parameters that takes
  either "none", "2" or "3" as values, instead of letting user configure
  replication factor themselves. These are presets whose corresponding
  replication/quorum values can be found in replication/mode.rs

- Explicit support for single-node and two-node deployments
  (number of nodes must be at least "replication_mode", with "none"
  we can have only one node)

- Ring is now stored much more compactly with 256*8 + n*32 bytes,
  instead of 256*32 bytes

- Support for gateway-only nodes that do not store data
  (these nodes still need a metadata_directory to store the list
  of bucket and keys since those are stored on all nodes; it also
  technically needs a data_directory to start but it will stay
  empty unless we have bugs)
2021-05-28 14:07:36 +02:00
Alex 6ccffc3162
Improved XML serialization
continuous-integration/drone/pr Build is passing Details
continuous-integration/drone/push Build is passing Details
- Use quick_xml and serde for all XML response returned by the S3 API.
- Include tests for all structs used to generate XML
- Remove old manual XML escaping function which was unsafe
2021-05-06 22:37:15 +02:00
Trinity Pointard e4b9e4e24d
rename types to CamelCase
continuous-integration/drone/pr Build is passing Details
continuous-integration/drone/push Build is passing Details
2021-05-03 22:15:09 +02:00