Commit graph

55 commits

Author SHA1 Message Date
12d1dbfc6b
remove Ring and use ClusterLayout everywhere 2023-11-08 15:41:24 +01:00
c7f5dcd953 fix compilation on macos
fsblkcnt_t is ony 32b there, so we have to do an additional cast
2023-10-15 17:57:27 +02:00
ad82035b98 Merge branch 'main' into next 2023-09-27 13:11:52 +02:00
0088599f52 new layout: fix clippy lints 2023-09-18 12:17:07 +02:00
91e764a2bf fix hang on shutdown 2023-09-12 14:35:48 +02:00
fd7d8fec59 Merge branch 'main' into next 2023-09-11 23:09:20 +02:00
9cfe55ab60 fix 32-bit build 2023-09-11 20:01:29 +02:00
51abbb02d8 Merge branch 'main' into next 2023-09-11 20:00:02 +02:00
d5bb50d738 use statvfs instead of mount list to determine free data/meta space (fix #611) 2023-09-11 19:08:24 +02:00
2f112ac682 correct free data space accounting for multiple data dirs on same fs 2023-09-07 14:42:20 +02:00
71c0188055 block manager: skeleton for multi-hdd support 2023-09-06 16:35:28 +02:00
35c108b85d admin api: switch GetClusterHealth to camelcase (fix #381 again) 2023-06-14 13:53:19 +02:00
fa78d806e3 Merge branch 'main' into next 2023-04-25 12:34:26 +02:00
8e93d69974 More clippy fixes 2023-01-26 17:26:32 +01:00
c7d0ad0aa0 Add local disk usage to exported prometheus metrics 2023-01-26 15:30:36 +01:00
efb6b6e868 Disk space report
Report available disk space on nodes and calculate cluster-wide available space in `garage stats` (fix #479)
2023-01-26 15:04:32 +01:00
Jonathan Davies
df1d9a9873 system.rs: Integrated SystemMetrics into System implementation. 2023-01-10 10:39:50 +00:00
570e5e5bbb
Merge branch 'main' into next 2023-01-04 11:34:43 +01:00
1fc220886a
Fix Consul & Kubernetes discovery with new way of doing background things 2023-01-03 16:55:59 +01:00
cdb2a591e9
Refactor how things are migrated 2023-01-03 14:44:47 +01:00
e6f14ab5cf
better error message handling 2022-12-14 16:11:19 +01:00
510b620108
Get rid of background::spawn 2022-12-14 16:08:05 +01:00
a19bfef508
Improve error message on rpc connection failure 2022-12-14 12:57:33 +01:00
d56c472712
Refactor background runner and get rid of job worker 2022-12-14 12:51:42 +01:00
2c2e65ad8b
Merge commit 'ec12d6c' into next 2022-12-11 18:41:15 +01:00
280d1be7b1
Refactor health check and add ability to return it in json 2022-12-05 15:28:57 +01:00
d75b37b018
Return more info when layout's .check() fails, fix compilation, fix test 2022-11-08 14:58:39 +01:00
2da8786f54
move things around 2022-10-18 19:13:52 +02:00
5d8d393054
Load TLS certificates only once 2022-10-18 19:11:16 +02:00
002b9fc50c
Add TLS support for Consul discovery + refactoring 2022-10-18 18:38:20 +02:00
ad917ffd3f
Fix instant substractions that might have panicked 2022-09-29 15:53:54 +02:00
56592e1853
RPC performance changes
- configurable ping timeout
- single, much higher, configurable RPC timeout
- no more concurrency semaphore
2022-09-19 20:31:00 +02:00
e46dc2a8ef
Allow for hostnames in bootstrap_peers and rpc_public_addr (fix #353) 2022-09-14 16:09:38 +02:00
ab722cb40f
Add checks on replication_factor of layouts we use (fix #363, fix #364) 2022-09-13 16:22:23 +02:00
7f54706b95
Merge branch 'lx-perf-improvements' into netapp-stream-body 2022-09-08 15:50:56 +02:00
db61f41030
Move GIT_VERSION injection later in build chain to reduce build times 2022-09-07 11:59:56 +02:00
df094bd807
Less strict timeouts 2022-09-01 16:30:44 +02:00
1921f4f7e6
Merge branch 'lx-perf-improvements' into netapp-stream-body 2022-08-29 16:45:05 +02:00
2c7bae935a
Configure structopt to report the right version
By default, structopt reports the value provided by
the env var CARGO_PKG_VERSION, feeded by Cargo when reading
Cargo.toml. However for Garage we use a versioning based on git,
so we often report a version that is behind the real version.
In this commit, we create garage_util::version::garage() that
reports the right version and configure all structopt subcommands
to call this function instead of using the env var.
2022-08-11 10:21:45 +02:00
8e7e680afe
First adaptation to WIP netapp with streaming body 2022-07-29 12:25:02 +02:00
4f38cadf6e Background task manager (#332)
- [x] New background worker trait
- [x] Adapt all current workers to use new API
- [x] Command to list currently running workers, and whether they are active, idle, or dead
- [x] Error reporting
- Optimizations
  - [x] Merkle updater: several items per iteration
  - [ ] Use `tokio::task::spawn_blocking` where appropriate so that CPU-intensive tasks don't block other things going on
- scrub:
  - [x] have only one worker with a channel to start/pause/cancel
  - [x] automatic scrub
  - [x] ability to view and change tranquility from CLI
  - [x] persistence of a few info
- [ ] Testing

Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: Deuxfleurs/garage#332
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
2022-07-08 13:30:26 +02:00
382e74c798 First version of admin API (#298)
**Spec:**

- [x] Start writing
- [x] Specify all layout endpoints
- [x] Specify all endpoints for operations on keys
- [x] Specify all endpoints for operations on key/bucket permissions
- [x] Specify all endpoints for operations on buckets
- [x] Specify all endpoints for operations on bucket aliases

View rendered spec at <https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/admin-api/doc/drafts/admin-api.md>

**Code:**

- [x] Refactor code for admin api to use common api code that was created for K2V

**General endpoints:**

- [x] Metrics
- [x] GetClusterStatus
- [x] ConnectClusterNodes
- [x] GetClusterLayout
- [x] UpdateClusterLayout
- [x] ApplyClusterLayout
- [x] RevertClusterLayout

**Key-related endpoints:**

- [x] ListKeys
- [x] CreateKey
- [x] ImportKey
- [x] GetKeyInfo
- [x] UpdateKey
- [x] DeleteKey

**Bucket-related endpoints:**

- [x] ListBuckets
- [x] CreateBucket
- [x] GetBucketInfo
- [x] DeleteBucket
- [x] PutBucketWebsite
- [x] DeleteBucketWebsite

**Operations on key/bucket permissions:**

- [x] BucketAllowKey
- [x] BucketDenyKey

**Operations on bucket aliases:**

- [x] GlobalAliasBucket
- [x] GlobalUnaliasBucket
- [x] LocalAliasBucket
- [x] LocalUnaliasBucket

**And also:**

- [x] Separate error type for the admin API (this PR includes a quite big refactoring of error handling)
- [x] Add management of website access
- [ ] Check that nothing is missing wrt what can be done using the CLI
- [ ] Improve formatting of the spec
- [x] Make sure everyone is cool with the API design

Fix #231
Fix #295

Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: Deuxfleurs/garage#298
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
2022-05-24 12:16:39 +02:00
9d0ed78887 Add feature flag for Kubernetes discovery 2022-03-24 16:57:43 +01:00
203e8d2c34
Bump version to 0.7 because of incompatible Netapp 2022-03-14 10:54:24 +01:00
bb04d94fa9
Update to Netapp 0.4 which supports distributed tracing 2022-03-14 10:52:30 +01:00
9d44127245
add support for kubernetes service discovery
This commit adds support to discover garage instances running in
kubernetes.

Once enabled by setting `kubernetes_namespace` and
`kubernetes_service_name` garage will create a Custom Resources
`garagenodes.deuxfleurs.fr` with nodes public key as the resource name.
and IP and Port information as spec in the namespace configured by
`kubernetes_namespace`.

For discovering nodes the resources are filtered with the optionally set
`kubernetes_service_name` which sets a label
`garage.deuxfleurs.fr/service` on the resources.

This allows to separate multiple garage deployments in a single
namespace.

the `kubernetes_skip_crd` variable allows to disable the creation of the
CRD by garage itself. The user must deploy this manually.
2022-03-12 13:05:52 +01:00
beeef4758e
Some movement of helper code and refactoring of error handling 2022-01-04 12:52:46 +01:00
c94406f428
Improve how node roles are assigned in Garage
- change the terminology: the network configuration becomes the role
  table, the configuration of a nodes becomes a node's role
- the modification of the role table takes place in two steps: first,
  changes are staged in a CRDT data structure. Then, once the user is
  happy with the changes, they can commit them all at once (or revert
  them).
- update documentation
- fix tests
- implement smarter partition assignation algorithm

This patch breaks the format of the network configuration: when
migrating, the cluster will be in a state where no roles are assigned.
All roles must be re-assigned and commited at once. This migration
should not pose an issue.
2021-11-16 16:05:53 +01:00
e8811f7c9d
Request strategy: don't launch all 3 requests if not needed 2021-11-04 16:19:27 +01:00
6f13d083ab
Add semaphore to limit RAM used by buffered outgoing requests 2021-11-03 18:02:57 +01:00