Commit graph

132 commits

Author SHA1 Message Date
Alex 55c514999e block manager: fixes in layout 2023-09-06 16:35:28 +02:00
Alex a44f486931 block manager: refactoring & increase max worker count to 8 2023-09-06 16:35:28 +02:00
Alex 3a74844df0 block manager: fix dir_not_empty 2023-09-06 16:35:28 +02:00
Alex 93114a9747 block manager: refactoring 2023-09-06 16:35:28 +02:00
Alex 1b8c265c14 block manager: get rid of check_block_status 2023-09-06 16:35:28 +02:00
Alex a09f86729c block manager: move blocks in write_block if necessary 2023-09-06 16:35:28 +02:00
Alex 887b3233f4 block manager: use data paths from layout 2023-09-06 16:35:28 +02:00
Alex 6c420c0880 block manager: multi-directory layout computation 2023-09-06 16:35:28 +02:00
Alex 71c0188055 block manager: skeleton for multi-hdd support 2023-09-06 16:35:28 +02:00
Alex 51eac97260 update version to 0.8.4
Some checks failed
continuous-integration/drone/pr Build is passing
continuous-integration/drone/push Build is passing
continuous-integration/drone Build is failing
2023-09-05 23:28:12 +02:00
Alex 2e90e1c124 Merge branch 'main' into next
Some checks reported errors
continuous-integration/drone/pr Build is passing
continuous-integration/drone/push Build is passing
continuous-integration/drone/tag Build was killed
continuous-integration/drone Build is passing
2023-08-29 11:32:42 +02:00
Alex cece1be1bb bump version to 0.8.3
All checks were successful
continuous-integration/drone/pr Build is passing
continuous-integration/drone/tag Build is passing
continuous-integration/drone/push Build is passing
continuous-integration/drone Build is passing
2023-08-28 13:17:26 +02:00
Jonathan Davies aee0d97f22 cargo: Updated async-compression to 0.4.
Some checks failed
continuous-integration/drone/pr Build is failing
2023-06-28 11:17:16 +01:00
Alex 90b2d43eb4 Merge branch 'main' into next
Some checks failed
continuous-integration/drone/pr Build is failing
continuous-integration/drone/push Build is failing
2023-06-13 17:14:11 +02:00
Alex e7e164a280 Make fsync an option for meta and data
Some checks failed
continuous-integration/drone/pr Build is failing
continuous-integration/drone/push Build is passing
2023-06-09 16:23:21 +02:00
Alex ea9b15f669 Merge pull request 'cargo: tokio-1.28 and hyper-0.14.26 update' (#569) from jpds/garage:tokio-1.28 into main
All checks were successful
continuous-integration/drone/push Build is passing
Reviewed-on: #569
2023-05-11 10:16:33 +00:00
Jonathan Davies c783194e8b *: apply clippy recommendations.
All checks were successful
continuous-integration/drone/pr Build is passing
2023-05-09 20:49:34 +01:00
Jonathan Davies 0f0795103d block/Cargo.toml: Bump tokio-util to 0.7. 2023-05-09 14:33:21 +01:00
Alex 2f495575d8 Merge pull request 'block/manager.rs: Prioritize raw blocks when no compression configured' (#566) from jpds/garage:skip-compressed-blocks-scrub-no-compression into main
All checks were successful
continuous-integration/drone/push Build is passing
Reviewed-on: #566
2023-05-09 09:39:48 +00:00
Jonathan Davies 9c788059e2 block/manager.rs: In is_block_compressed - check which compression_level
is configured on a node and check for raw block first if compression is
disabled (to help reduce syscalls during a scrub).
All checks were successful
continuous-integration/drone/pr Build is passing
2023-05-09 10:28:19 +01:00
Jakub Jirutka d2deee0b8b Declare garage crates using workspace.dependencies
This makes it possible to really disable the "sled" feature without declaring
`default-features = false` in every Cargo.toml where garage_db and
garage_model are used.

See https://doc.rust-lang.org/cargo/reference/workspaces.html#the-dependencies-table
2023-05-09 08:46:15 +00:00
Jonathan Davies fb3bd11dce block/repair.rs: Added log entries for scrub start/finish.
All checks were successful
continuous-integration/drone/pr Build is passing
2023-04-23 22:22:26 +01:00
Alex 0a1ddcf630 Prepare for v0.8.2
Some checks failed
continuous-integration/drone/pr Build is failing
continuous-integration/drone/push Build is failing
2023-03-13 18:46:31 +01:00
Jonathan Davies d218f475cb block/manager.rs: Set defaults for scrub_persister.
All checks were successful
continuous-integration/drone/pr Build is passing
2023-03-09 17:08:47 +00:00
Jonathan Davies 7b65dd24e2 block/repair.rs: Added a timestamp argument to
randomize_next_scrub_run_time().
All checks were successful
continuous-integration/drone/pr Build is passing
2023-03-09 16:38:41 +00:00
Jonathan Davies b70cc0a940 block/repair.rs: Added migration for ScrubWorkerPersisted's time_next_run_scrub.
Fixes: #520.
2023-03-09 16:38:36 +00:00
Jonathan Davies 148b66b843 block/manager.rs: Display scrub-next-run.
All checks were successful
continuous-integration/drone/pr Build is passing
2023-03-06 13:43:09 +00:00
Jonathan Davies 53d09eb00f block/repair.rs: Added function and time_next_run_scrub with a random element of
10 days to SCRUB_INTERVAL to help balance scrub load across cluster.
2023-03-06 13:43:04 +00:00
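To illustrate the scheduling idea in 53d09eb00f, here is a minimal Rust sketch. It is not the code from block/repair.rs: the function name `next_scrub_run_time_msec`, the 30-day value given to `SCRUB_INTERVAL`, and passing the random value in as an argument are all assumptions made to keep the example dependency-free and deterministic.

```rust
use std::time::Duration;

/// ASSUMPTION: placeholder base interval; the real SCRUB_INTERVAL value
/// is not stated in the commit message.
const SCRUB_INTERVAL: Duration = Duration::from_secs(30 * 24 * 3600);

/// Upper bound of the random element: 10 days, as stated in the commit message.
const MAX_JITTER: Duration = Duration::from_secs(10 * 24 * 3600);

/// Hypothetical helper: returns the next scrub time, in milliseconds since
/// the epoch, as `now + SCRUB_INTERVAL + jitter` with jitter in [0, 10 days].
/// `random` is any uniformly distributed u64 supplied by the caller.
fn next_scrub_run_time_msec(now_msec: u64, random: u64) -> u64 {
    let max_jitter_msec = MAX_JITTER.as_millis() as u64;
    // Reduce the random value into the inclusive range [0, max_jitter_msec].
    let jitter_msec = random % (max_jitter_msec + 1);
    now_msec + SCRUB_INTERVAL.as_millis() as u64 + jitter_msec
}
```

Because each node draws its own offset, nodes that were started together should drift apart rather than all scrubbing at the same moment.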
Alex 8e93d69974 More clippy fixes
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone/pr Build is failing
2023-01-26 17:26:32 +01:00
Jonathan Davies 20c1cdf662 Cargo.toml: Loosen tracing dependency to just 0.1. 2023-01-26 11:13:11 +00:00
Jonathan Davies 5c3075fe01 Cargo.toml: Updated zstd from 0.9 to 0.12. 2023-01-23 18:08:14 +00:00
Jonathan Davies 4cfb469d2b block/metrics.rs: Added compression_level metric.
All checks were successful
continuous-integration/drone/pr Build is passing
2023-01-10 10:40:03 +00:00
Alex 02e8eb167e Merge pull request 'PutObject: better cleanup when request is interrupted in the middle' (#462) from interrupted-cleanup into main
All checks were successful
continuous-integration/drone/push Build is passing
Reviewed-on: #462
2023-01-04 14:43:45 +00:00
Alex f3f27293df
Uniform framework for bg variable management
All checks were successful
continuous-integration/drone/push Build is passing
2023-01-04 13:07:13 +01:00
Alex 936b6cb563
When saving block, delete .tmp file if we could not complete
All checks were successful
continuous-integration/drone/pr Build is passing
continuous-integration/drone/push Build is passing
continuous-integration/drone Build is passing
2023-01-03 17:34:26 +01:00
Alex 8d5505514f
Make it explicit when using nonversioned encoding
All checks were successful
continuous-integration/drone/pr Build is passing
continuous-integration/drone/push Build is passing
2023-01-03 15:27:36 +01:00
Alex cdb2a591e9
Refactor how things are migrated
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone/pr Build is failing
2023-01-03 14:44:47 +01:00
Alex 939a6d67e8
Merge branch 'main' into internals-rework
All checks were successful
continuous-integration/drone/pr Build is passing
continuous-integration/drone/push Build is passing
2023-01-02 15:07:44 +01:00
Alex 6775569525
Bump everything to v0.8.1
All checks were successful
continuous-integration/drone/pr Build is passing
continuous-integration/drone/push Build is passing
2023-01-02 14:15:33 +01:00
Alex dfc131850a
Simplified and more aggressive worker exit logic
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing
2022-12-14 15:25:29 +01:00
Alex d56c472712
Refactor background runner and get rid of job worker
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone/pr Build is failing
2022-12-14 12:51:42 +01:00
Alex 2183518edc
Spawn all background workers in a separate step 2022-12-14 12:28:07 +01:00
Alex 041b60ed1d
Add block.rc_size, table.size and table.merkle_tree_size metrics
Some checks reported errors
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing
continuous-integration/drone Build was killed
2022-12-13 15:54:03 +01:00
Alex d6040e32a6
cli: prettier table in garage stats
Some checks reported errors
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build was killed
2022-12-13 15:43:22 +01:00
Alex d7f90cabb0
Implement block retry-now and block purge
Some checks reported errors
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing
continuous-integration/drone Build was killed
2022-12-13 15:02:42 +01:00
Alex 687660b27f
Implement block list-errors and block info
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing
2022-12-13 14:23:45 +01:00
Alex 9d82196945
cli: new worker info command
All checks were successful
continuous-integration/drone/pr Build is passing
continuous-integration/drone/push Build is passing
2022-12-13 12:24:30 +01:00
Alex de9d6cddf7
Prettier worker list table; remove useless CLI log messages
All checks were successful
continuous-integration/drone/push Build is passing
2022-12-12 17:17:05 +01:00
Alex 56592e1853
RPC performance changes
Some checks reported errors
continuous-integration/drone/pr Build is passing
continuous-integration/drone/push Build is passing
continuous-integration/drone Build was killed
- configurable ping timeout
- single, much higher, configurable RPC timeout
- no more concurrency semaphore
2022-09-19 20:31:00 +02:00
Alex b823151a0b
improvements in block manager
All checks were successful
continuous-integration/drone/push Build is passing
2022-09-12 16:57:38 +02:00
Alex 7f54706b95
Merge branch 'lx-perf-improvements' into netapp-stream-body
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing
2022-09-08 15:50:56 +02:00
Alex d9d199a6c9
Merge branch 'main' into lx-perf-improvements
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing
2022-09-08 15:49:17 +02:00
Alex 8adc654713
Merge branch 'main' into improve-deps 2022-09-07 18:13:27 +02:00
Alex 6b958979bd
Merge branch 'lx-perf-improvements' into netapp-stream-body
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing
2022-09-06 22:13:01 +02:00
Alex c2cc08852b
Reenable node ordering
Some checks failed
continuous-integration/drone/pr Build is failing
continuous-integration/drone/push Build is passing
2022-09-06 19:31:42 +02:00
Alex 48ffaaadfc
Bump versions to 0.8.0 (compatibility is broken already)
Some checks failed
continuous-integration/drone/pr Build is failing
continuous-integration/drone/push Build is failing
2022-09-06 16:47:56 +02:00
Alex 07e6bcde85
Merge branch 'main' into lx-perf-improvements
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing
2022-09-05 12:40:17 +02:00
Jakub Jirutka a6e40b75ea Add feature "system-libs" to enable linking against system libraries
If this feature is enabled, libsodium-sys and zstd-sys will link
dynamically against system-provided libraries instead of building
and linking statically the bundled (possibly outdated and vulnerable)
copies of them. This feature is intended mainly for linux package
maintainers.
2022-09-03 18:44:34 +02:00
Alex e1751c8a9c
fix clippy
All checks were successful
continuous-integration/drone/pr Build is passing
continuous-integration/drone/push Build is passing
2022-09-02 17:24:26 +02:00
Alex 5d4b937a00
Ability to have up to 4 concurrently working resync workers
Some checks failed
continuous-integration/drone/pr Build is failing
continuous-integration/drone/push Build is failing
2022-09-02 17:18:13 +02:00
Alex 5e8baa433d
Make BlockManagerLocked fully private again
All checks were successful
continuous-integration/drone/push Build is passing
2022-09-02 16:52:22 +02:00
Alex 47be652a1f
block manager: refactor: split resync into separate file
All checks were successful
continuous-integration/drone/push Build is passing
2022-09-02 16:47:15 +02:00
Alex 943d76c583
Ability to dynamically set resync tranquility
All checks were successful
continuous-integration/drone/push Build is passing
2022-09-02 15:34:21 +02:00
Alex 99b532b85b
Apply PRIO_SECONDARY to block data transfers
Some checks failed
continuous-integration/drone/pr Build is failing
continuous-integration/drone/push Build is failing
2022-09-01 16:35:43 +02:00
Alex df094bd807
Less strict timeouts 2022-09-01 16:30:44 +02:00
Alex bc977f9a7a
Update to Netapp with OrderTag support and exploit OrderTags
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone/pr Build is failing
2022-09-01 12:58:20 +02:00
Alex 70231d68b2
Fix bytes_read counter
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone/pr Build is failing
2022-08-31 19:44:27 +02:00
Alex e935861854
Factor out node request order selection logic & use in manager
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone/pr Build is failing
continuous-integration/drone Build is failing
2022-07-29 12:25:03 +02:00
Alex 605a630333
Use streaming in block manager 2022-07-29 12:25:02 +02:00
Alex 8e7e680afe
First adaptation to WIP netapp with streaming body 2022-07-29 12:25:02 +02:00
Alex 2f111e6b3d
Performance improvements:
- reduce contention on mutation_lock by having 256 of them
- better lmdb defaults
2022-07-29 12:24:48 +02:00
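A rough sketch of what "256 mutation locks" can look like in practice. This is an assumption-laden illustration, not the block manager's code: it assumes the lock is selected by the first byte of a 32-byte block hash, it uses `std::sync::Mutex` rather than an async lock, and `ShardedLocks` with its methods is a hypothetical name.

```rust
use std::sync::{Mutex, MutexGuard};

/// Number of independent locks, as stated in the commit message.
const N_LOCKS: usize = 256;

/// Hypothetical type: one mutex per possible value of the first hash byte,
/// so writers touching unrelated blocks rarely wait on each other.
struct ShardedLocks {
    locks: Vec<Mutex<()>>,
}

impl ShardedLocks {
    fn new() -> Self {
        Self {
            locks: (0..N_LOCKS).map(|_| Mutex::new(())).collect(),
        }
    }

    /// ASSUMPTION: the shard is chosen from the first byte of the hash.
    fn shard_index(hash: &[u8; 32]) -> usize {
        hash[0] as usize
    }

    /// Blocks until the shard that covers `hash` is available.
    fn lock(&self, hash: &[u8; 32]) -> MutexGuard<'_, ()> {
        self.locks[Self::shard_index(hash)].lock().unwrap()
    }

    /// Non-blocking variant; returns None if that shard is currently held.
    fn try_lock(&self, hash: &[u8; 32]) -> Option<MutexGuard<'_, ()>> {
        self.locks[Self::shard_index(hash)].try_lock().ok()
    }
}
```

With a single global lock every block mutation is serialized. With sharding, only mutations whose hashes fall in the same shard contend, and mutations of the same block always share a shard, so they stay serialized.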
Alex 1b2e1296eb
Compute hashes on dedicated threads 2022-07-29 12:24:44 +02:00
Alex 4f38cadf6e Background task manager (#332)
All checks were successful
continuous-integration/drone/push Build is passing
- [x] New background worker trait
- [x] Adapt all current workers to use new API
- [x] Command to list currently running workers, and whether they are active, idle, or dead
- [x] Error reporting
- Optimizations
  - [x] Merkle updater: several items per iteration
  - [ ] Use `tokio::task::spawn_blocking` where appropriate so that CPU-intensive tasks don't block other things going on
- scrub:
  - [x] have only one worker with a channel to start/pause/cancel
  - [x] automatic scrub
  - [x] ability to view and change tranquility from CLI
  - [x] persistence of a few pieces of information
- [ ] Testing

Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: #332
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
2022-07-08 13:30:26 +02:00
Alex b44d3fc796 Abstract database behind generic interface and implement alternative drivers (#322)
All checks were successful
continuous-integration/drone/push Build is passing
- [x] Design interface
- [x] Implement Sled backend
  - [x] Re-implement the SledCountedTree hack ~~on Sled backend~~ on all backends (i.e. over the abstraction)
- [x] Convert Garage code to use generic interface
- [x] Proof-read converted Garage code
- [ ] Test everything well
- [x] Implement sqlite backend
- [x] Implement LMDB backend
- [ ] (Implement Persy backend?)
- [ ] (Implement other backends? (like RocksDB, ...))
- [x] Implement backend choice in config file and garage server module
- [x] Add CLI for converting between DB formats
- Exploit the new interface to put more things in transactions
  - [x] `.updated()` trigger on Garage tables

Fix #284

**Bugs**

- [x] When exporting sqlite, trees iterate empty??
- [x] LMDB doesn't work

**Known issues for various back-ends**

- Sled:
  - Eats all my RAM and also all my disk space
  - `.len()` has to traverse the whole table
  - Is actually quite slow on some operations
  - And is actually pretty bad code...
- Sqlite:
  - Requires a lock to be taken on all operations. The lock is also taken when iterating on a table with `.iter()`, and the lock isn't released until the iterator is dropped. This means that we must be VERY careful to not do anything else inside a `.iter()` loop or else we will have a deadlock! Most such cases have been eliminated from the Garage codebase, but there might still be some that remain. If your Garage-over-Sqlite seems to hang/freeze, this is the reason.
  - (adapter uses a bunch of unsafe code)
- Heed (LMDB):
  - Not suited for 32-bit machines as it has to map the whole DB in memory.
  - (adapter uses a tiny bit of unsafe code)
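
The Sqlite locking hazard described above can be reproduced with a small in-memory stand-in. This sketch does not use the real adapter: `Db`, `LockedIter` and `double_all` are hypothetical names, the "table" is just a `Vec<u32>` behind a `std::sync::Mutex`, and writes use `try_lock` so that the conflict shows up as a `false` return value instead of an actual hang.

```rust
use std::sync::{Mutex, MutexGuard};

/// Hypothetical stand-in for a table guarded by one lock.
struct Db {
    rows: Mutex<Vec<u32>>,
}

/// Iterator that keeps the lock for as long as it is alive.
struct LockedIter<'a> {
    guard: MutexGuard<'a, Vec<u32>>,
    pos: usize,
}

impl<'a> Iterator for LockedIter<'a> {
    type Item = u32;
    fn next(&mut self) -> Option<u32> {
        let v = self.guard.get(self.pos).copied();
        self.pos += 1;
        v
    }
}

impl Db {
    /// Takes the lock; it is released only when the iterator is dropped.
    fn iter(&self) -> LockedIter<'_> {
        LockedIter { guard: self.rows.lock().unwrap(), pos: 0 }
    }

    /// Returns false where a blocking `lock()` would wait forever
    /// if called from inside an `.iter()` loop on the same thread.
    fn try_insert(&self, v: u32) -> bool {
        match self.rows.try_lock() {
            Ok(mut rows) => {
                rows.push(v);
                true
            }
            Err(_) => false,
        }
    }
}

/// Safe pattern: collect first (the iterator is dropped at the end of
/// the `let` statement), then write. Returns the number of rows inserted.
fn double_all(db: &Db) -> usize {
    let snapshot: Vec<u32> = db.iter().collect();
    snapshot.into_iter().filter(|v| db.try_insert(v * 2)).count()
}
```

The general remedy is the one `double_all` shows: finish or drop the iterator before issuing any other operation on the same database.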

**My recommendation:** avoid 32-bit machines and use LMDB as much as possible.

**Converting databases** is actually quite easy. For example from Sled to LMDB:

```bash
cd src/db
cargo run --features cli --bin convert -- -i path/to/garage/meta/db -a sled -o path/to/garage/meta/db.lmdb -b lmdb
```

Then, just add this to your `config.toml`:

```toml
db_engine = "lmdb"
```

Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: #322
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
2022-06-08 10:01:44 +02:00
Alex 5768bf3622 First implementation of K2V (#293)
All checks were successful
continuous-integration/drone/push Build is passing
**Specification:**

View spec at [this URL](https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/k2v/doc/drafts/k2v-spec.md)

- [x] Specify the structure of K2V triples
- [x] Specify the DVVS format used for causality detection
- [x] Specify the K2V index (just a counter of number of values per partition key)
- [x] Specify single-item endpoints: ReadItem, InsertItem, DeleteItem
- [x] Specify index endpoint: ReadIndex
- [x] Specify multi-item endpoints: InsertBatch, ReadBatch, DeleteBatch
- [x] Move to JSON objects instead of tuples
- [x] Specify endpoints for polling for updates on single values (PollItem)

**Implementation:**

- [x] Table for K2V items, causal contexts
- [x] Indexing mechanism and table for K2V index
- [x] Make API handlers a bit more generic
- [x] K2V API endpoint
- [x] K2V API router
- [x] ReadItem
- [x] InsertItem
- [x] DeleteItem
- [x] PollItem
- [x] ReadIndex
- [x] InsertBatch
- [x] ReadBatch
- [x] DeleteBatch

**Testing:**

- [x] Just a simple Python script that does some requests to check visually that things are going right (does not contain parsing of results or assertions on returned values)
- [x] Actual tests:
  - [x] Adapt testing framework
  - [x] Simple test with InsertItem + ReadItem
  - [x] Test with several Insert/Read/DeleteItem + ReadIndex
  - [x] Test all combinations of return formats for ReadItem
  - [x] Test with ReadBatch, InsertBatch, DeleteBatch
  - [x] Test with PollItem
  - [x] Test error codes
- [ ] Fix most broken stuff
  - [x] test PollItem broken randomly
  - [x] when invalid causality tokens are given, errors should be 4xx not 5xx

**Improvements:**

- [x] Descending range queries
  - [x] Specify
  - [x] Implement
  - [x] Add test
- [x] Batch updates to index counter
- [x] Put K2V behind `k2v` feature flag

Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: #293
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
2022-05-10 13:16:57 +02:00
Alex cb5836d53c Bring maximum exponential backoff time down from 16h to 1h
All checks were successful
continuous-integration/drone/push Build is passing
2022-04-07 11:49:29 +02:00
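For readers unfamiliar with capped exponential backoff, the change in cb5836d53c amounts to lowering the ceiling of a doubling delay. The sketch below is illustrative only: the 1-minute base delay and the names `BASE_DELAY`, `MAX_DELAY` and `retry_delay` are assumptions, and only the 1-hour cap comes from the commit message.

```rust
use std::time::Duration;

/// ASSUMPTION: base delay before the first retry; not stated in the commit.
const BASE_DELAY: Duration = Duration::from_secs(60);

/// Ceiling on the delay: 1 hour, per the commit message (previously 16h).
const MAX_DELAY: Duration = Duration::from_secs(3600);

/// Hypothetical helper: delay before retry number `attempt` (0-based).
/// The delay doubles on each attempt and is clamped to MAX_DELAY.
fn retry_delay(attempt: u32) -> Duration {
    // 2^attempt, saturating at u32::MAX once the shift would overflow.
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    BASE_DELAY.saturating_mul(factor).min(MAX_DELAY)
}
```

Under these assumptions the delays run 1, 2, 4, 8, 16, 32 minutes and then stay at 60 minutes, so an item that keeps failing is retried at least once an hour.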
Alex 913f7754bb
Add blocks in errored state to garage stats
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing
2022-03-28 15:47:23 +02:00
Alex 3dc9214172
Add lots of comments on how the resync queue works
(I don't really want to change/refactor that code though)
2022-03-23 10:25:39 +01:00
Alex e480aaf338
Make background tranquility a configurable parameter 2022-03-23 10:25:19 +01:00
Alex 8fd6745745
Move block RC code to separate rc.rs 2022-03-23 10:25:19 +01:00
Alex c3982a90b6
Move DataBlock out of manager.rs 2022-03-23 10:25:19 +01:00
Alex c1d9854d2c
Move block manager to separate module 2022-03-23 10:25:15 +01:00