Compare commits

..

16 Commits

Author SHA1 Message Date
Alex 349c94c4b6
Enable k2v feature flag in CI
continuous-integration/drone/pr Build is passing Details
continuous-integration/drone/push Build is passing Details
continuous-integration/drone/tag Build was killed Details
continuous-integration/drone Build is passing Details
2022-06-21 11:17:31 +02:00
Alex 77e3fd6db2 improve internal item counter mechanisms and implement bucket quotas (#326)
continuous-integration/drone/push Build is passing Details
- [x] Refactoring of internal counting API
- [x] Repair procedure for counters (it's an offline procedure!!!)
- [x] New counter for objects in buckets
- [x] Add quotas to buckets struct
- [x] Add CLI to manage bucket quotas
- [x] Add admin API to manage bucket quotas
- [x] Apply quotas by adding checks on put operations
- [x] Proof-read

Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: #326
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
2022-06-15 20:20:28 +02:00
Quentin d544a0e0e0
Send CORS headers for all requests
continuous-integration/drone/pr Build is passing Details
continuous-integration/drone/push Build is passing Details
2022-06-13 10:19:52 +02:00
Alex 138e13071b
Fix garage_db build on 32-bit systems
continuous-integration/drone/pr Build is passing Details
continuous-integration/drone Build is passing Details
continuous-integration/drone/push Build is passing Details
2022-06-09 14:55:20 +02:00
Alex b44d3fc796 Abstract database behind generic interface and implement alternative drivers (#322)
continuous-integration/drone/push Build is passing Details
- [x] Design interface
- [x] Implement Sled backend
  - [x] Re-implement the SledCountedTree hack ~~on Sled backend~~ on all backends (i.e. over the abstraction)
- [x] Convert Garage code to use generic interface
- [x] Proof-read converted Garage code
- [ ] Test everything well
- [x] Implement sqlite backend
- [x] Implement LMDB backend
- [ ] (Implement Persy backend?)
- [ ] (Implement other backends? (like RocksDB, ...))
- [x] Implement backend choice in config file and garage server module
- [x] Add CLI for converting between DB formats
- Exploit the new interface to put more things in transactions
  - [x] `.updated()` trigger on Garage tables

Fix #284

**Bugs**

- [x] When exporting sqlite, trees iterate empty??
- [x] LMDB doesn't work

**Known issues for various back-ends**

- Sled:
  - Eats all my RAM and also all my disk space
  - `.len()` has to traverse the whole table
  - Is actually quite slow on some operations
  - And is actually pretty bad code...
- Sqlite:
  - Requires a lock to be taken on all operations. The lock is also taken when iterating on a table with `.iter()`, and the lock isn't released until the iterator is dropped. This means that we must be VERY carefull to not do anything else inside a `.iter()` loop or else we will have a deadlock! Most such cases have been eliminated from the Garage codebase, but there might still be some that remain. If your Garage-over-Sqlite seems to hang/freeze, this is the reason.
  - (adapter uses a bunch of unsafe code)
- Heed (LMDB):
  - Not suited for 32-bit machines as it has to map the whole DB in memory.
  - (adpater uses a tiny bit of unsafe code)

**My recommendation:** avoid 32-bit machines and use LMDB as much as possible.

**Converting databases** is actually quite easy. For example from Sled to LMDB:

```bash
cd src/db
cargo run --features cli --bin convert -- -i path/to/garage/meta/db -a sled -o path/to/garage/meta/db.lmdb -b lmdb
```

Then, just add this to your `config.toml`:

```toml
db_engine = "lmdb"
```

Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: #322
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
2022-06-08 10:01:44 +02:00
Simon C 7eed3ceda9 docs: Add Trafik reverse proxy documentation
continuous-integration/drone/push Build is passing Details
2022-06-07 16:16:52 +02:00
Simon C 4b8f48f3c5 docs: Fix title level
continuous-integration/drone/push Build is passing Details
2022-06-07 13:32:52 +02:00
Simon C 7d3b5585f1 docs: Add link to facilitate navigation in the documentation
continuous-integration/drone/pr Build is passing Details
continuous-integration/drone/push Build is passing Details
2022-06-07 09:38:59 +02:00
Quentin a1abed0378
Remove useless MC_REGION env variable
continuous-integration/drone/pr Build is passing Details
continuous-integration/drone/push Build was killed Details
2022-06-02 12:50:11 +02:00
Alex b54a938724 Fix garage_version() now that GIT_VERSION is read in crate garage_rpc
continuous-integration/drone/tag Build is passing Details
continuous-integration/drone Build is passing Details
continuous-integration/drone/push Build was killed Details
2022-06-02 12:00:10 +02:00
Alex ff06d3f082
Fix Content-Type headers for {admin,k2v} errors and admin responses
continuous-integration/drone/pr Build is passing Details
continuous-integration/drone/push Build is passing Details
Fix #315
2022-05-25 17:09:33 +02:00
Alex 93eab8eaa3 Fixes to S3 compatibility page (#314)
continuous-integration/drone/push Build is passing Details
Mention PostObject is implemented, fix english mistakes

Co-authored-by: Alex Auvolat <alex@adnab.me>
Reviewed-on: #314
Co-authored-by: Alex <alex@adnab.me>
Co-committed-by: Alex <alex@adnab.me>
2022-05-25 16:54:44 +02:00
Quentin 43ddc933f9
Update Ceph S3 endpoints compatibility
continuous-integration/drone/pr Build is passing Details
continuous-integration/drone/push Build is passing Details
2022-05-25 15:20:08 +02:00
Alex 9f303f6308
Shorter page title
continuous-integration/drone/pr Build is passing Details
continuous-integration/drone/push Build is passing Details
2022-05-24 15:47:42 +02:00
Alex 3be43f3372
Add lost content for Restic with Garage
continuous-integration/drone/pr Build is passing Details
continuous-integration/drone/push Build is passing Details
Suggested-by: Quentin <quentin@deuxfleurs.fr>
2022-05-24 15:32:42 +02:00
Alex 2da448b43f
Add documentation for new Admin API and a few infos on K2V
continuous-integration/drone/push Build is passing Details
continuous-integration/drone/pr Build is passing Details
2022-05-24 15:28:37 +02:00
73 changed files with 4412 additions and 1052 deletions

264
Cargo.lock generated
View File

@ -2,6 +2,17 @@
# It is not intended for manual editing.
version = 3
[[package]]
name = "ahash"
version = "0.7.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "fcb51a0695d8f838b1ee009b3fbf66bda078cd64590202a864a8f3e8c4315c47"
dependencies = [
"getrandom",
"once_cell",
"version_check",
]
[[package]]
name = "aho-corasick"
version = "0.7.18"
@ -301,6 +312,15 @@ version = "0.13.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "904dfeac50f3cdaba28fc6f57fdcddb75f49ed61346676a78c4ffe55877802fd"
[[package]]
name = "bincode"
version = "1.3.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b1f45e9417d87227c7a56d22e471c6206462cba514c7590c09aff4cf6d1ddcad"
dependencies = [
"serde",
]
[[package]]
name = "bitflags"
version = "1.3.2"
@ -333,6 +353,12 @@ version = "3.9.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a4a45a46ab1f2412e53d3a0ade76ffad2025804294569aae387231a0cd6e0899"
[[package]]
name = "bytemuck"
version = "1.9.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cdead85bdec19c194affaeeb670c0e41fe23de31459efd1c174d049269cf02cc"
[[package]]
name = "byteorder"
version = "1.4.3"
@ -361,6 +387,12 @@ dependencies = [
"either",
]
[[package]]
name = "bytesize"
version = "1.1.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6c58ec36aac5066d5ca17df51b3e70279f5670a72102f5752cb7e7c856adfc70"
[[package]]
name = "cc"
version = "1.0.73"
@ -370,6 +402,12 @@ dependencies = [
"jobserver",
]
[[package]]
name = "cfg-if"
version = "0.1.10"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4785bdd1c96b2a846b2bd7cc02e86b6b3dbf14e7e53446c4f54c92a361040822"
[[package]]
name = "cfg-if"
version = "1.0.0"
@ -486,7 +524,7 @@ version = "1.3.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b540bd8bc810d3885c6ea91e2018302f68baba2129ab3e88f32389ee9370880d"
dependencies = [
"cfg-if",
"cfg-if 1.0.0",
]
[[package]]
@ -495,8 +533,8 @@ version = "0.5.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5aaa7bd5fb665c6864b5f963dd9097905c54125909c7aa94c9e18507cdbe6c53"
dependencies = [
"cfg-if",
"crossbeam-utils",
"cfg-if 1.0.0",
"crossbeam-utils 0.8.8",
]
[[package]]
@ -506,20 +544,39 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1145cf131a2c6ba0615079ab6a638f7e1973ac9c2634fcbeaaad6114246efe8c"
dependencies = [
"autocfg",
"cfg-if",
"crossbeam-utils",
"cfg-if 1.0.0",
"crossbeam-utils 0.8.8",
"lazy_static",
"memoffset",
"scopeguard",
]
[[package]]
name = "crossbeam-queue"
version = "0.1.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7c979cd6cfe72335896575c6b5688da489e420d36a27a0b9eb0c73db574b4a4b"
dependencies = [
"crossbeam-utils 0.6.6",
]
[[package]]
name = "crossbeam-utils"
version = "0.6.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "04973fa96e96579258a5091af6003abde64af786b860f18622b82e026cca60e6"
dependencies = [
"cfg-if 0.1.10",
"lazy_static",
]
[[package]]
name = "crossbeam-utils"
version = "0.8.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0bf124c720b7686e3c2663cf54062ab0f68a88af2fb6a030e87e30bf721fcb38"
dependencies = [
"cfg-if",
"cfg-if 1.0.0",
"lazy_static",
]
@ -603,7 +660,7 @@ version = "4.0.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e77a43b28d0668df09411cb0bc9a8c2adc40f9a048afe863e05fd43251e8e39c"
dependencies = [
"cfg-if",
"cfg-if 1.0.0",
"num_cpus",
]
@ -633,7 +690,7 @@ version = "2.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b98cf8ebf19c3d1b223e151f99a4f9f0690dca41414773390fc824184ac833e1"
dependencies = [
"cfg-if",
"cfg-if 1.0.0",
"dirs-sys-next",
]
@ -672,7 +729,7 @@ version = "0.8.30"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7896dc8abb250ffdda33912550faa54c88ec8b998dec0b2c55ab224921ce11df"
dependencies = [
"cfg-if",
"cfg-if 1.0.0",
]
[[package]]
@ -716,6 +773,18 @@ dependencies = [
"synstructure",
]
[[package]]
name = "fallible-iterator"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4443176a9f2c162692bd3d352d745ef9413eec5782a80d8fd6f8a1ac692a07f7"
[[package]]
name = "fallible-streaming-iterator"
version = "0.1.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7360491ce676a36bf9bb3c56c1aa791658183a54d2744120f27285738d90465a"
[[package]]
name = "fastrand"
version = "1.7.0"
@ -885,10 +954,12 @@ dependencies = [
"aws-sdk-s3",
"base64",
"bytes 1.1.0",
"bytesize",
"chrono",
"futures",
"futures-util",
"garage_api",
"garage_db",
"garage_model 0.7.0",
"garage_rpc 0.7.0",
"garage_table 0.7.0",
@ -911,7 +982,6 @@ dependencies = [
"serde_bytes",
"serde_json",
"sha2",
"sled",
"static_init",
"structopt",
"tokio",
@ -972,6 +1042,7 @@ dependencies = [
"bytes 1.1.0",
"futures",
"futures-util",
"garage_db",
"garage_rpc 0.7.0",
"garage_table 0.7.0",
"garage_util 0.7.0",
@ -981,12 +1052,26 @@ dependencies = [
"rmp-serde 0.15.5",
"serde",
"serde_bytes",
"sled",
"tokio",
"tracing",
"zstd",
]
[[package]]
name = "garage_db"
version = "0.8.0"
dependencies = [
"clap 3.1.18",
"err-derive 0.3.1",
"heed",
"hexdump",
"log",
"mktemp",
"pretty_env_logger",
"rusqlite",
"sled",
]
[[package]]
name = "garage_model"
version = "0.5.1"
@ -1024,6 +1109,7 @@ dependencies = [
"futures",
"futures-util",
"garage_block",
"garage_db",
"garage_model 0.5.1",
"garage_rpc 0.7.0",
"garage_table 0.7.0",
@ -1035,7 +1121,6 @@ dependencies = [
"rmp-serde 0.15.5",
"serde",
"serde_bytes",
"sled",
"tokio",
"tracing",
"zstd",
@ -1130,6 +1215,7 @@ dependencies = [
"bytes 1.1.0",
"futures",
"futures-util",
"garage_db",
"garage_rpc 0.7.0",
"garage_util 0.7.0",
"hexdump",
@ -1138,7 +1224,6 @@ dependencies = [
"rmp-serde 0.15.5",
"serde",
"serde_bytes",
"sled",
"tokio",
"tracing",
]
@ -1177,6 +1262,7 @@ dependencies = [
"chrono",
"err-derive 0.3.1",
"futures",
"garage_db",
"hex",
"http",
"hyper",
@ -1187,7 +1273,6 @@ dependencies = [
"serde",
"serde_json",
"sha2",
"sled",
"tokio",
"toml",
"tracing",
@ -1237,7 +1322,7 @@ version = "0.2.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d39cd93900197114fa1fcb7ae84ca742095eed9442088988ae74fa744e930e77"
dependencies = [
"cfg-if",
"cfg-if 1.0.0",
"libc",
"wasi 0.10.0+wasi-snapshot-preview1",
]
@ -1288,6 +1373,18 @@ name = "hashbrown"
version = "0.11.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ab5ef0d4909ef3724cc8cce6ccc8572c5c817592e9285f5464f8e86f8bd3726e"
dependencies = [
"ahash",
]
[[package]]
name = "hashlink"
version = "0.7.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7249a3129cbc1ffccd74857f81464a323a152173cdb134e0fd81bc803b29facf"
dependencies = [
"hashbrown",
]
[[package]]
name = "heck"
@ -1304,6 +1401,45 @@ version = "0.4.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2540771e65fc8cb83cd6e8a237f70c319bd5c29f78ed1084ba5d50eeac86f7f9"
[[package]]
name = "heed"
version = "0.11.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "269c7486ed6def5d7b59a427cec3e87b4d4dd4381d01e21c8c9f2d3985688392"
dependencies = [
"bytemuck",
"byteorder",
"heed-traits",
"heed-types",
"libc",
"lmdb-rkv-sys",
"once_cell",
"page_size",
"serde",
"synchronoise",
"url",
]
[[package]]
name = "heed-traits"
version = "0.8.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a53a94e5b2fd60417e83ffdfe136c39afacff0d4ac1d8d01cd66928ac610e1a2"
[[package]]
name = "heed-types"
version = "0.8.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9a6cf0a6952fcedc992602d5cddd1e3fff091fbe87d38636e3ec23a31f32acbd"
dependencies = [
"bincode",
"bytemuck",
"byteorder",
"heed-traits",
"serde",
"serde_json",
]
[[package]]
name = "hermit-abi"
version = "0.1.19"
@ -1503,7 +1639,7 @@ version = "0.1.12"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7a5bbe824c507c5da5956355e86a746d82e0e1464f65d862cc5e71da70e94b2c"
dependencies = [
"cfg-if",
"cfg-if 1.0.0",
]
[[package]]
@ -1758,12 +1894,34 @@ dependencies = [
"walkdir",
]
[[package]]
name = "libsqlite3-sys"
version = "0.24.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "898745e570c7d0453cc1fbc4a701eb6c662ed54e8fec8b7d14be137ebeeb9d14"
dependencies = [
"cc",
"pkg-config",
"vcpkg",
]
[[package]]
name = "linked-hash-map"
version = "0.5.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7fb9b38af92608140b86b693604b9ffcc5824240a484d1ecd4795bacb2fe88f3"
[[package]]
name = "lmdb-rkv-sys"
version = "0.11.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "61b9ce6b3be08acefa3003c57b7565377432a89ec24476bbe72e11d101f852fe"
dependencies = [
"cc",
"libc",
"pkg-config",
]
[[package]]
name = "lock_api"
version = "0.4.6"
@ -1779,7 +1937,7 @@ version = "0.4.16"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6389c490849ff5bc16be905ae24bc913a9c8892e19b2341dbc175e14c341c2b8"
dependencies = [
"cfg-if",
"cfg-if 1.0.0",
]
[[package]]
@ -1855,6 +2013,15 @@ dependencies = [
"winapi",
]
[[package]]
name = "mktemp"
version = "0.4.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "975de676448231fcde04b9149d2543077e166b78fc29eae5aa219e7928410da2"
dependencies = [
"uuid",
]
[[package]]
name = "multer"
version = "2.0.2"
@ -1928,7 +2095,7 @@ dependencies = [
"arc-swap",
"async-trait",
"bytes 0.6.0",
"cfg-if",
"cfg-if 1.0.0",
"err-derive 0.2.4",
"futures",
"hex",
@ -2021,7 +2188,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0c7ae222234c30df141154f159066c5093ff73b63204dcda7121eb082fc56a95"
dependencies = [
"bitflags",
"cfg-if",
"cfg-if 1.0.0",
"foreign-types",
"libc",
"once_cell",
@ -2134,6 +2301,16 @@ version = "6.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "029d8d0b2f198229de29dca79676f2738ff952edf3fde542eb8bf94d8c21b435"
[[package]]
name = "page_size"
version = "0.4.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "eebde548fbbf1ea81a99b128872779c437752fb99f217c45245e1a61dcd9edcd"
dependencies = [
"libc",
"winapi",
]
[[package]]
name = "parking_lot"
version = "0.11.2"
@ -2161,7 +2338,7 @@ version = "0.8.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d76e8e1493bcac0d2766c42737f34458f1c8c50c0d23bcb24ea953affb273216"
dependencies = [
"cfg-if",
"cfg-if 1.0.0",
"instant",
"libc",
"redox_syscall",
@ -2175,7 +2352,7 @@ version = "0.9.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "28141e0cc4143da2443301914478dc976a61ffdb3f043058310c70df2fed8954"
dependencies = [
"cfg-if",
"cfg-if 1.0.0",
"libc",
"redox_syscall",
"smallvec",
@ -2357,7 +2534,7 @@ version = "0.13.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b7f64969ffd5dd8f39bd57a68ac53c163a095ed9d0fb707146da1b27025a3504"
dependencies = [
"cfg-if",
"cfg-if 1.0.0",
"fnv",
"lazy_static",
"memchr",
@ -2679,6 +2856,21 @@ dependencies = [
"tokio",
]
[[package]]
name = "rusqlite"
version = "0.27.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "85127183a999f7db96d1a976a309eebbfb6ea3b0b400ddd8340190129de6eb7a"
dependencies = [
"bitflags",
"fallible-iterator",
"fallible-streaming-iterator",
"hashlink",
"libsqlite3-sys",
"memchr",
"smallvec",
]
[[package]]
name = "rustc_version"
version = "0.4.0"
@ -2894,7 +3086,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4d58a1e1bf39749807d89cf2d98ac2dfa0ff1cb3faa38fbb64dd88ac8013d800"
dependencies = [
"block-buffer",
"cfg-if",
"cfg-if 1.0.0",
"cpufeatures",
"digest",
"opaque-debug",
@ -2929,7 +3121,7 @@ checksum = "7f96b4737c2ce5987354855aed3797279def4ebf734436c6aa4552cf8e169935"
dependencies = [
"crc32fast",
"crossbeam-epoch",
"crossbeam-utils",
"crossbeam-utils 0.8.8",
"fs2",
"fxhash",
"libc",
@ -3063,6 +3255,15 @@ dependencies = [
"unicode-xid",
]
[[package]]
name = "synchronoise"
version = "1.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d717ed0efc9d39ab3b642a096bc369a3e02a38a51c41845d7fe31bdad1d6eaeb"
dependencies = [
"crossbeam-queue",
]
[[package]]
name = "synstructure"
version = "0.12.6"
@ -3081,7 +3282,7 @@ version = "3.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5cdb1ef4eaeeaddc8fbd371e5017057064af0911902ef36b39801f67cc6d79e4"
dependencies = [
"cfg-if",
"cfg-if 1.0.0",
"fastrand",
"libc",
"redox_syscall",
@ -3380,7 +3581,7 @@ version = "0.1.32"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4a1bdf54a7c28a2bbf701e1d2233f6c77f473486b94bee4f9678da5a148dca7f"
dependencies = [
"cfg-if",
"cfg-if 1.0.0",
"log",
"pin-project-lite",
"tracing-attributes",
@ -3489,6 +3690,15 @@ dependencies = [
"percent-encoding",
]
[[package]]
name = "uuid"
version = "0.8.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bc5cf98d8186244414c848017f0e2676b3fcb46807f6668a97dfe67359a3c4b7"
dependencies = [
"getrandom",
]
[[package]]
name = "vcpkg"
version = "0.2.15"
@ -3540,7 +3750,7 @@ version = "0.2.79"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "25f1af7423d8588a3d840681122e72e6a24ddbcb3f0ec385cac0d12d24256c06"
dependencies = [
"cfg-if",
"cfg-if 1.0.0",
"wasm-bindgen-macro",
]

706
Cargo.nix

File diff suppressed because it is too large Load Diff

View File

@ -1,5 +1,6 @@
[workspace]
members = [
"src/db",
"src/util",
"src/rpc",
"src/table",

View File

@ -69,7 +69,7 @@ in let
we ask the user (the CI often) to pass the value to Nix.
*/
(pkgs.rustBuilder.rustLib.makeOverride {
name = "garage";
name = "garage_rpc";
overrideAttrs = drv:
/* [1] */ { hardeningDisable = [ "pie" ]; }
//

View File

@ -17,6 +17,61 @@ If you still want to use Borg, you can use it with `rclone mount`.
## Restic
Create your key and bucket:
```bash
garage key new my-key
garage bucket create backup
garage bucket allow backup --read --write --key my-key
```
Then register your Key ID and Secret key in your environment:
```bash
export AWS_ACCESS_KEY_ID=GKxxx
export AWS_SECRET_ACCESS_KEY=xxxx
```
Configure restic from environment too:
```bash
export RESTIC_REPOSITORY="s3:http://localhost:3900/backups"
echo "Generated password (save it safely): $(openssl rand -base64 32)"
export RESTIC_PASSWORD=xxx # copy paste your generated password here
```
Do not forget to save your password safely (in your password manager or print it). It will be needed to decrypt your backups.
Now you can use restic:
```bash
# Initialize the bucket, must be run once
restic init
# Backup your PostgreSQL database
# (We suppose your PostgreSQL daemon is stopped for all commands)
restic backup /var/lib/postgresql
# Show backup history
restic snapshots
# Backup again your PostgreSQL database, it will be faster as only changes will be uploaded
restic backup /var/lib/postgresql
# Show backup history (again)
restic snapshots
# Restore a backup
# (79766175 is the ID of the snapshot you want to restore)
mv /var/lib/postgresql /var/lib/postgresql.broken
restic restore 79766175 --target /var/lib/postgresql
```
Restic has way more features than the ones presented here.
You can discover all of them by accessing its documentation from the link below.
*External links:* [Restic Documentation > Amazon S3](https://restic.readthedocs.io/en/stable/030_preparing_a_new_repo.html#amazon-s3)
## Duplicity

View File

@ -3,7 +3,7 @@ title = "Websites (Hugo, Jekyll, Publii...)"
weight = 10
+++
Garage is also suitable to host static websites.
Garage is also suitable [to host static websites](@/documentation/cookbook/exposing-websites.md).
While they can be deployed with traditional CLI tools, some static website generators have integrated options to ease your workflow.
| Name | Status | Note |

View File

@ -100,7 +100,7 @@ server {
}
```
## Exposing the web endpoint
### Exposing the web endpoint
To better understand the logic involved, you can refer to the [Exposing buckets as websites](/cookbook/exposing_websites.html) section.
Otherwise, the configuration is very similar to the S3 endpoint.
@ -140,6 +140,143 @@ server {
@TODO
## Traefik
## Traefik v2
@TODO
We will see in this part how to set up a reverse proxy with [Traefik](https://docs.traefik.io/).
Here is [a basic configuration file](https://doc.traefik.io/traefik/https/acme/#configuration-examples):
```toml
[entryPoints]
[entryPoints.web]
address = ":80"
[entryPoints.websecure]
address = ":443"
[certificatesResolvers.myresolver.acme]
email = "your-email@example.com"
storage = "acme.json"
[certificatesResolvers.myresolver.acme.httpChallenge]
# used during the challenge
entryPoint = "web"
```
### Add Garage service
To add Garage on Traefik you should declare a new service using its IP address (or hostname) and port:
```toml
[http.services]
[http.services.my_garage_service.loadBalancer]
[[http.services.my_garage_service.loadBalancer.servers]]
url = "http://xxx.xxx.xxx.xxx"
port = 3900
```
It's possible to declare multiple Garage servers as back-ends:
```toml
[http.services]
[[http.services.my_garage_service.loadBalancer.servers]]
url = "http://xxx.xxx.xxx.xxx"
port = 3900
[[http.services.my_garage_service.loadBalancer.servers]]
url = "http://yyy.yyy.yyy.yyy"
port = 3900
[[http.services.my_garage_service.loadBalancer.servers]]
url = "http://zzz.zzz.zzz.zzz"
port = 3900
```
Traefik can remove unhealthy servers automatically with [a health check configuration](https://doc.traefik.io/traefik/routing/services/#health-check):
```
[http.services]
[http.services.my_garage_service.loadBalancer]
[http.services.my_garage_service.loadBalancer.healthCheck]
path = "/"
interval = "60s"
timeout = "5s"
```
### Adding a website
To add a new website, add the following declaration to your Traefik configuration file:
```toml
[http.routers]
[http.routers.my_website]
rule = "Host(`yoururl.example.org`)"
service = "my_garage_service"
entryPoints = ["web"]
```
Enable HTTPS access to your website with the following configuration section ([documentation](https://doc.traefik.io/traefik/https/overview/)):
```toml
...
entryPoints = ["websecure"]
[http.routers.my_website.tls]
certResolver = "myresolver"
...
```
### Adding gzip compression
Add the following configuration section [to compress response](https://doc.traefik.io/traefik/middlewares/http/compress/) using [gzip](https://developer.mozilla.org/en-US/docs/Glossary/GZip_compression) before sending them to the client:
```toml
[http.routers]
[http.routers.my_website]
...
middlewares = ["gzip_compress"]
...
[http.middlewares]
[http.middlewares.gzip_compress.compress]
```
### Add caching response
Traefik's caching middleware is only available on [entreprise version](https://doc.traefik.io/traefik-enterprise/middlewares/http-cache/), however the freely-available [Souin plugin](https://github.com/darkweak/souin#tr%C3%A6fik-container) can also do the job. (section to be completed)
### Complete example
```toml
[entryPoints]
[entryPoints.web]
address = ":80"
[entryPoints.websecure]
address = ":443"
[certificatesResolvers.myresolver.acme]
email = "your-email@example.com"
storage = "acme.json"
[certificatesResolvers.myresolver.acme.httpChallenge]
# used during the challenge
entryPoint = "web"
[http.routers]
[http.routers.my_website]
rule = "Host(`yoururl.example.org`)"
service = "my_garage_service"
middlewares = ["gzip_compress"]
entryPoints = ["websecure"]
[http.services]
[http.services.my_garage_service.loadBalancer]
[http.services.my_garage_service.loadBalancer.healthCheck]
path = "/"
interval = "60s"
timeout = "5s"
[[http.services.my_garage_service.loadBalancer.servers]]
url = "http://xxx.xxx.xxx.xxx"
[[http.services.my_garage_service.loadBalancer.servers]]
url = "http://yyy.yyy.yyy.yyy"
[[http.services.my_garage_service.loadBalancer.servers]]
url = "http://zzz.zzz.zzz.zzz"
[http.middlewares]
[http.middlewares.gzip_compress.compress]
```

View File

@ -249,16 +249,6 @@ mc alias set \
--api S3v4
```
You must also add an environment variable to your configuration to
inform MinIO of our region (`garage` by default, corresponding to the `s3_region` parameter
in the configuration file).
The best way is to add the following snippet to your `$HOME/.bash_profile`
or `$HOME/.bashrc` file:
```bash
export MC_REGION=garage
```
### Use `mc`
You can not list buckets from `mc` currently.

View File

@ -1,18 +1,41 @@
# Specification of Garage's administration API
+++
title = "Administration API"
weight = 16
+++
The Garage administration API is accessible through a dedicated server whose
listen address is specified in the `[admin]` section of the configuration
file (see [configuration file
reference](@/documentation/reference-manual/configuration.md))
**WARNING.** At this point, there is no comittement to stability of the APIs described in this document.
We will bump the version numbers prefixed to each API endpoint at each time the syntax
or semantics change, meaning that code that relies on these endpoint will break
when changes are introduced.
The Garage administration API was introduced in version 0.7.2, this document
does not apply to older versions of Garage.
## Access control
The admin API uses two different tokens for acces control, that are specified in the config file's `[admin]` section:
- `metrics_token`: the token for accessing the Metrics endpoint (if this token is not set in the config file, the Metrics endpoint can be accessed without access control);
- `admin_token`: the token for accessing all of the other administration endpoints (if this token is not set in the config file, access to these endpoints is disabled entirely).
- `metrics_token`: the token for accessing the Metrics endpoint (if this token
is not set in the config file, the Metrics endpoint can be accessed without
access control);
- `admin_token`: the token for accessing all of the other administration
endpoints (if this token is not set in the config file, access to these
endpoints is disabled entirely).
These tokens are used as simple HTTP bearer tokens. In other words, to
authenticate access to an admin API endpoint, add the following HTTP header
to your request:
```
Authorization: Bearer <token>
```
## Administration API endpoints
@ -111,8 +134,8 @@ Example request body:
```json
[
"ec79480e0ce52ae26fd00c9da684e4fa56658d9c64cdcecb094e936de0bfe71f@10.0.0.11:3901",
"4a6ae5a1d0d33bf895f5bb4f0a418b7dc94c47c0dd2eb108d1158f3c8f60b0ff@10.0.0.12:3901"
"ec79480e0ce52ae26fd00c9da684e4fa56658d9c64cdcecb094e936de0bfe71f@10.0.0.11:3901",
"4a6ae5a1d0d33bf895f5bb4f0a418b7dc94c47c0dd2eb108d1158f3c8f60b0ff@10.0.0.12:3901"
]
```
@ -122,14 +145,14 @@ Example response:
```json
[
{
"success": true,
"error": null
},
{
"success": false,
"error": "Handshake error"
}
{
"success": true,
"error": null
},
{
"success": false,
"error": "Handshake error"
}
]
```
@ -278,7 +301,7 @@ Request body format:
```json
{
"name": "NameOfMyKey"
"name": "NameOfMyKey"
}
```
@ -290,9 +313,9 @@ Request body format:
```json
{
"accessKeyId": "GK31c2f218a2e44f485b94239e",
"secretAccessKey": "b892c0665f0ada8a4755dae98baa3b133590e11dae3bcc1f9d769d67f16c3835",
"name": "NameOfMyKey"
"accessKeyId": "GK31c2f218a2e44f485b94239e",
"secretAccessKey": "b892c0665f0ada8a4755dae98baa3b133590e11dae3bcc1f9d769d67f16c3835",
"name": "NameOfMyKey"
}
```
@ -380,11 +403,11 @@ Request body format:
```json
{
"name": "NameOfMyKey",
"allow": {
"createBucket": true,
},
"deny": {}
"name": "NameOfMyKey",
"allow": {
"createBucket": true,
},
"deny": {}
}
```
@ -450,24 +473,31 @@ Example response:
```json
{
"id": "e6a14cd6a27f48684579ec6b381c078ab11697e6bc8513b72b2f5307e25fff9b",
"globalAliases": [
"alex"
],
"keys": [
{
"accessKeyId": "GK31c2f218a2e44f485b94239e",
"name": "alex",
"permissions": {
"read": true,
"write": true,
"owner": true
},
"bucketLocalAliases": [
"test"
]
}
]
"id": "afa8f0a22b40b1247ccd0affb869b0af5cff980924a20e4b5e0720a44deb8d39",
"globalAliases": [],
"websiteAccess": false,
"websiteConfig": null,
"keys": [
{
"accessKeyId": "GK31c2f218a2e44f485b94239e",
"name": "Imported key",
"permissions": {
"read": true,
"write": true,
"owner": true
},
"bucketLocalAliases": [
"debug"
]
}
],
"objects": 14827,
"bytes": 13189855625,
"unfinshedUploads": 0,
"quotas": {
"maxSize": null,
"maxObjects": null
}
}
```
@ -479,7 +509,7 @@ Request body format:
```json
{
"globalAlias": "NameOfMyBucket"
"globalAlias": "NameOfMyBucket"
}
```
@ -487,15 +517,15 @@ OR
```json
{
"localAlias": {
"accessKeyId": "GK31c2f218a2e44f485b94239e",
"alias": "NameOfMyBucket",
"allow": {
"read": true,
"write": true,
"owner": false
}
}
"localAlias": {
"accessKeyId": "GK31c2f218a2e44f485b94239e",
"alias": "NameOfMyBucket",
"allow": {
"read": true,
"write": true,
"owner": false
}
}
}
```
@ -517,26 +547,37 @@ Deletes a storage bucket. A bucket cannot be deleted if it is not empty.
Warning: this will delete all aliases associated with the bucket!
#### PutBucketWebsite `PUT /v0/bucket/website?id=<bucket id>`
#### UpdateBucket `PUT /v0/bucket?id=<bucket id>`
Sets the website configuration for a bucket (this also enables website access for this bucket).
Updates configuration of the given bucket.
Request body format:
```json
{
"indexDocument": "index.html",
"errorDocument": "404.html"
"websiteAccess": {
"enabled": true,
"indexDocument": "index.html",
"errorDocument": "404.html"
},
"quotas": {
"maxSize": 19029801,
"maxObjects": null,
}
}
```
The field `errorDocument` is optional, if no error document is set a generic error message is displayed when errors happen.
All fields (`websiteAccess` and `quotas`) are optionnal.
If they are present, the corresponding modifications are applied to the bucket, otherwise nothing is changed.
In `websiteAccess`: if `enabled` is `true`, `indexDocument` must be specified.
The field `errorDocument` is optional, if no error document is set a generic
error message is displayed when errors happen. Conversely, if `enabled` is
`false`, neither `indexDocument` nor `errorDocument` must be specified.
#### DeleteBucketWebsite `DELETE /v0/bucket/website?id=<bucket id>`
Deletes the website configuration for a bucket (disables website access for this bucket).
In `quotas`: new values of `maxSize` and `maxObjects` must both be specified, or set to `null`
to remove the quotas. An absent value will be considered the same as a `null`. It is not possible
to change only one of the two quotas.
### Operations on permissions for keys on buckets
@ -548,13 +589,13 @@ Request body format:
```json
{
"bucketId": "e6a14cd6a27f48684579ec6b381c078ab11697e6bc8513b72b2f5307e25fff9b",
"accessKeyId": "GK31c2f218a2e44f485b94239e",
"permissions": {
"read": true,
"write": true,
"owner": true
},
"bucketId": "e6a14cd6a27f48684579ec6b381c078ab11697e6bc8513b72b2f5307e25fff9b",
"accessKeyId": "GK31c2f218a2e44f485b94239e",
"permissions": {
"read": true,
"write": true,
"owner": true
},
}
```
@ -569,13 +610,13 @@ Request body format:
```json
{
"bucketId": "e6a14cd6a27f48684579ec6b381c078ab11697e6bc8513b72b2f5307e25fff9b",
"accessKeyId": "GK31c2f218a2e44f485b94239e",
"permissions": {
"read": false,
"write": false,
"owner": true
},
"bucketId": "e6a14cd6a27f48684579ec6b381c078ab11697e6bc8513b72b2f5307e25fff9b",
"accessKeyId": "GK31c2f218a2e44f485b94239e",
"permissions": {
"read": false,
"write": false,
"owner": true
},
}
```

View File

@ -10,6 +10,7 @@ metadata_dir = "/var/lib/garage/meta"
data_dir = "/var/lib/garage/data"
block_size = 1048576
block_manager_background_tranquility = 2
replication_mode = "3"
@ -47,6 +48,8 @@ root_domain = ".web.garage"
[admin]
api_bind_addr = "0.0.0.0:3903"
metrics_token = "cacce0b2de4bc2d9f5b5fdff551e01ac1496055aed248202d415398987e35f81"
admin_token = "ae8cb40ea7368bbdbb6430af11cca7da833d3458a5f52086f4e805a570fb5c2a"
trace_sink = "http://localhost:4317"
```
@ -84,6 +87,17 @@ files will remain available. This however means that chunks from existing files
will not be deduplicated with chunks from newly uploaded files, meaning you
might use more storage space that is optimally possible.
### `block_manager_background_tranquility`
This parameter tunes the activity of the background worker responsible for
resyncing data blocks between nodes. The higher the tranquility value is set,
the more the background worker will wait between iterations, meaning the load
on the system (including network usage between nodes) will be reduced. The
minimal value for this parameter is `0`, where the background worker will
allways work at maximal throughput to resynchronize blocks. The default value
is `2`, where the background worker will try to spend at most 1/3 of its time
working, and 2/3 sleeping in order to reduce system load.
### `replication_mode`
Garage supports the following replication modes:
@ -326,10 +340,24 @@ Garage has a few administration capabilities, in particular to allow remote moni
### `api_bind_addr`
If specified, Garage will bind an HTTP server to this port and address, on
which it will listen to requests for administration features. Currently,
this endpoint only exposes Garage metrics in the Prometheus format at
`/metrics`. This endpoint is not authenticated. In the future, bucket and
access key management might be possible by REST calls to this endpoint.
which it will listen to requests for administration features.
See [administration API reference](@/documentation/reference-manual/admin-api.md) to learn more about these features.
### `metrics_token` (since version 0.7.2)
The token for accessing the Metrics endpoint. If this token is not set in
the config file, the Metrics endpoint can be accessed without access
control.
You can use any random string for this value. We recommend generating a random token with `openssl rand -hex 32`.
### `admin_token` (since version 0.7.2)
The token for accessing all of the other administration endpoints. If this
token is not set in the config file, access to these endpoints is disabled
entirely.
You can use any random string for this value. We recommend generating a random token with `openssl rand -hex 32`.
### `trace_sink`

View File

@ -0,0 +1,58 @@
+++
title = "K2V"
weight = 30
+++
Starting with version 0.7.2, Garage introduces an optionnal feature, K2V,
which is an alternative storage API designed to help efficiently store
many small values in buckets (in opposition to S3 which is more designed
to store large blobs).
K2V is currently disabled at compile time in all builds, as the
specification is still subject to changes. To build a Garage version with
K2V, the Cargo feature flag `k2v` must be activated. Special builds with
the `k2v` feature flag enabled can be obtained from our download page under
"Extra builds": such builds can be identified easily as their tag name ends
with `-k2v` (example: `v0.7.2-k2v`).
The specification of the K2V API can be found
[here](https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/k2v/doc/drafts/k2v-spec.md).
This document also includes a high-level overview of K2V's design.
The K2V API uses AWSv4 signatures for authentification, same as the S3 API.
The AWS region used for signature calculation is always the same as the one
defined for the S3 API in the config file.
## Enabling and using K2V
To enable K2V, download and run a build that has the `k2v` feature flag
enabled, or produce one yourself. Then, add the following section to your
configuration file:
```toml
[k2v_api]
api_bind_addr = "<ip>:<port>"
```
Please select a port number that is not already in use by another API
endpoint (S3 api, admin API) or by the RPC server.
We provide an early-stage K2V client library for Rust which can be imported by adding the following to your `Cargo.toml` file:
```toml
k2v-client = { git = "https://git.deuxfleurs.fr/Deuxfleurs/garage.git" }
```
There is also a simple CLI utility which can be built from source in the
following way:
```sh
git clone https://git.deuxfleurs.fr/Deuxfleurs/garage.git
cd garage/src/k2v-client
cargo build --features cli --bin k2v-cli
```
The CLI utility is self-documented, run `k2v-cli --help` to learn how to use
it. There is also a short README.md in the `src/k2v-client` folder with some
instructions.

View File

@ -3,51 +3,77 @@ title = "S3 Compatibility status"
weight = 20
+++
## Endpoint implementation
## DISCLAIMER
All APIs that are missing on Garage will return a 501 Not Implemented.
Some `x-amz-` headers are not implemented.
**The compatibility list for other platforms is given only for informational
purposes and based on available documentation.** They are sometimes completed,
in a best effort approach, with the source code and inputs from maintainers
when documentation is lacking. We are not proactively monitoring new versions
of each software: check the modification history to know when the page has been
updated for the last time. Some entries will be inexact or outdated. For any
serious decision, you must make your own tests.
**The official documentation of each project can be accessed by clicking on the
project name in the column header.**
*The compatibility list for other platforms is given only for information purposes and based on available documentation. Some entries might be inexact. Feel free to open a PR to fix this table. Minio is missing because they do not provide a public S3 compatibility list.*
Feel free to open a PR to suggest fixes this table. Minio is missing because they do not provide a public S3 compatibility list.
### Features
## Update history
- 2022-02-07 - First version of this page
- 2022-05-25 - Many Ceph S3 endpoints are not documented but implemented. Following a notification from the Ceph community, we added them.
## High-level features
| Feature | Garage | [Openstack Swift](https://docs.openstack.org/swift/latest/s3_compat.html) | [Ceph Object Gateway](https://docs.ceph.com/en/latest/radosgw/s3/) | [Riak CS](https://docs.riak.com/riak/cs/2.1.1/references/apis/storage/s3/index.html) | [OpenIO](https://docs.openio.io/latest/source/arch-design/s3_compliancy.html) |
|------------------------------|----------------------------------|-----------------|---------------|---------|-----|
| [signature v2](https://docs.aws.amazon.com/general/latest/gr/signature-version-2.html) (deprecated) | ❌ Missing | ✅ | ❌ | ✅ | ✅ |
| [signature v2](https://docs.aws.amazon.com/general/latest/gr/signature-version-2.html) (deprecated) | ❌ Missing | ✅ | | ✅ | ✅ |
| [signature v4](https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-authenticating-requests.html) | ✅ Implemented | ✅ | ✅ | ❌ | ✅ |
| [URL path-style](https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html#path-style-access) (eg. `host.tld/bucket/key`) | ✅ Implemented | ✅ | ✅ | ❓| ✅ |
| [URL vhost-style](https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html#virtual-hosted-style-access) URL (eg. `bucket.host.tld/key`) | ✅ Implemented | ❌| ✅| ✅ | ✅ |
| [Presigned URLs](https://docs.aws.amazon.com/AmazonS3/latest/userguide/ShareObjectPreSignedURL.html) | ✅ Implemented | ❌| ✅ | ✅ | ✅(❓) |
*Note:* OpenIO does not says if it supports presigned URLs. Because it is part of signature v4 and they claim they support it without additional precisions, we suppose that OpenIO supports presigned URLs.
*Note:* OpenIO does not says if it supports presigned URLs. Because it is part
of signature v4 and they claim they support it without additional precisions,
we suppose that OpenIO supports presigned URLs.
## Endpoint implementation
All endpoints that are missing on Garage will return a 501 Not Implemented.
Some `x-amz-` headers are not implemented.
### Core endoints
| Endpoint | Garage | [Openstack Swift](https://docs.openstack.org/swift/latest/s3_compat.html) | [Ceph Object Gateway](https://docs.ceph.com/en/latest/radosgw/s3/) | [Riak CS](https://docs.riak.com/riak/cs/2.1.1/references/apis/storage/s3/index.html) | [OpenIO](https://docs.openio.io/latest/source/arch-design/s3_compliancy.html) |
| Endpoint | Garage | [Openstack Swift](https://docs.openstack.org/swift/latest/s3_compat.html) | [Ceph Object Gateway](https://docs.ceph.com/en/latest/radosgw/s3/) | [Riak CS](https://docs.riak.com/riak/cs/2.1.1/references/apis/storage/s3/index.html) | [OpenIO](https://docs.openio.io/latest/source/arch-design/s3_compliancy.html) |
|------------------------------|----------------------------------|-----------------|---------------|---------|-----|
| [CreateBucket](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CreateBucket.html) | ✅ Implemented | ✅ | ✅ | ✅ | ✅ |
| [DeleteBucket](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteBucket.html) | ✅ Implemented | ✅ | ✅ | ✅ | ✅ |
| [GetBucketLocation](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetBucketLocation.html) | ✅ Implemented | ✅ | ✅ | ❌ | ✅ |
| [HeadBucket](https://docs.aws.amazon.com/AmazonS3/latest/API/API_HeadBucket.html) | ✅ Implemented | ✅ | ✅ | ✅ | ✅ |
| [ListBuckets](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListBuckets.html) | ✅ Implemented | ❌| ✅ | ✅ | ✅ |
| [HeadObject](https://docs.aws.amazon.com/AmazonS3/latest/API/API_HeadObject.html) | ✅ Implemented | ✅ | ✅ | ✅ | ✅ |
| [CopyObject](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CopyObject.html) | ✅ Implemented | ✅ | ✅ | ✅ | ✅ |
| [HeadObject](https://docs.aws.amazon.com/AmazonS3/latest/API/API_HeadObject.html) | ✅ Implemented | ✅ | ✅ | ✅ | ✅ |
| [CopyObject](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CopyObject.html) | ✅ Implemented | ✅ | ✅ | ✅ | ✅ |
| [DeleteObject](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObject.html) | ✅ Implemented | ✅ | ✅ | ✅ | ✅ |
| [DeleteObjects](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html) | ✅ Implemented | ✅ | ✅ | ✅ | ✅ |
| [GetObject](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html) | ✅ Implemented | ✅ | ✅ | ✅ | ✅ |
| [ListObjects](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjects.html) | ✅ Implemented (see details below) | ✅ | ✅ | ✅ | ❌|
| [ListObjectsV2](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html) | ✅ Implemented | ❌| | ❌| ✅ |
| [PostObject](https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPOST.html) (compatibility API) | ❌ Missing | ❌| ✅ | ❌| ❌|
| [ListObjectsV2](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html) | ✅ Implemented | ❌| | ❌| ✅ |
| [PostObject](https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPOST.html) | ✅ Implemented | ❌| ✅ | ❌| ❌|
| [PutObject](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html) | ✅ Implemented | ✅ | ✅ | ✅ | ✅ |
**ListObjects:** Implemented, but there isn't a very good specification of what `encoding-type=url` covers so there might be some encoding bugs. In our implementation the url-encoded fields are in the same in ListObjects as they are in ListObjectsV2.
**ListObjects:** Implemented, but there isn't a very good specification of what
`encoding-type=url` covers so there might be some encoding bugs. In our
implementation the url-encoded fields are in the same in ListObjects as they
are in ListObjectsV2.
*Note: Ceph API documentation is incomplete and miss at least HeadBucket and UploadPartCopy, but these endpoints are documented in [Red Hat Ceph Storage - Chapter 2. Ceph Object Gateway and the S3 API](https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/developer_guide/ceph-object-gateway-and-the-s3-api)*
*Note: Ceph API documentation is incomplete and lacks at least HeadBucket and UploadPartCopy,
but these endpoints are documented in [Red Hat Ceph Storage - Chapter 2. Ceph Object Gateway and the S3 API](https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/developer_guide/ceph-object-gateway-and-the-s3-api)*
### Multipart Upload endpoints
| Endpoint | Garage | [Openstack Swift](https://docs.openstack.org/swift/latest/s3_compat.html) | [Ceph Object Gateway](https://docs.ceph.com/en/latest/radosgw/s3/) | [Riak CS](https://docs.riak.com/riak/cs/2.1.1/references/apis/storage/s3/index.html) | [OpenIO](https://docs.openio.io/latest/source/arch-design/s3_compliancy.html) |
| Endpoint | Garage | [Openstack Swift](https://docs.openstack.org/swift/latest/s3_compat.html) | [Ceph Object Gateway](https://docs.ceph.com/en/latest/radosgw/s3/) | [Riak CS](https://docs.riak.com/riak/cs/2.1.1/references/apis/storage/s3/index.html) | [OpenIO](https://docs.openio.io/latest/source/arch-design/s3_compliancy.html) |
|------------------------------|----------------------------------|-----------------|---------------|---------|-----|
| [AbortMultipartUpload](https://docs.aws.amazon.com/AmazonS3/latest/API/API_AbortMultipartUpload.html) | ✅ Implemented | ✅ | ✅ | ✅ | ✅ |
| [CompleteMultipartUpload](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CompleteMultipartUpload.html) | ✅ Implemented (see details below) | ✅ | ✅ | ✅ | ✅ |
@ -62,18 +88,18 @@ For more information, please refer to our [issue tracker](https://git.deuxfleurs
### Website endpoints
| Endpoint | Garage | [Openstack Swift](https://docs.openstack.org/swift/latest/s3_compat.html) | [Ceph Object Gateway](https://docs.ceph.com/en/latest/radosgw/s3/) | [Riak CS](https://docs.riak.com/riak/cs/2.1.1/references/apis/storage/s3/index.html) | [OpenIO](https://docs.openio.io/latest/source/arch-design/s3_compliancy.html) |
| Endpoint | Garage | [Openstack Swift](https://docs.openstack.org/swift/latest/s3_compat.html) | [Ceph Object Gateway](https://docs.ceph.com/en/latest/radosgw/s3/) | [Riak CS](https://docs.riak.com/riak/cs/2.1.1/references/apis/storage/s3/index.html) | [OpenIO](https://docs.openio.io/latest/source/arch-design/s3_compliancy.html) |
|------------------------------|----------------------------------|-----------------|---------------|---------|-----|
| [DeleteBucketWebsite](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteBucketWebsite.html) | ✅ Implemented | ❌| ❌| ❌| ❌|
| [GetBucketWebsite](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetBucketWebsite.html) | ✅ Implemented | ❌ | ❌| ❌| ❌|
| [PutBucketWebsite](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketWebsite.html) | ⚠ Partially implemented (see below)| ❌| ❌| ❌| ❌|
| [DeleteBucketCors](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteBucketCors.html) | ✅ Implemented | ❌| | ❌| ✅ |
| [GetBucketCors](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetBucketCors.html) | ✅ Implemented | ❌ | | ❌| ✅ |
| [PutBucketCors](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketCors.html) | ✅ Implemented | ❌| | ❌| ✅ |
| [DeleteBucketCors](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteBucketCors.html) | ✅ Implemented | ❌| | ❌| ✅ |
| [GetBucketCors](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetBucketCors.html) | ✅ Implemented | ❌ | | ❌| ✅ |
| [PutBucketCors](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketCors.html) | ✅ Implemented | ❌| | ❌| ✅ |
**PutBucketWebsite:** Implemented, but only stores the index document suffix and the error document path. Redirects are not supported.
*Note: Ceph radosgw has some support for static websites but it is different from Amazon one plus it does not implement its configuration endpoints.*
*Note: Ceph radosgw has some support for static websites but it is different from the Amazon one. It also does not implement its configuration endpoints.*
### ACL, Policies endpoints
@ -81,29 +107,29 @@ Amazon has 2 access control mechanisms in S3: ACL (legacy) and policies (new one
Garage implements none of them, and has its own system instead, built around a per-access-key-per-bucket logic.
See Garage CLI reference manual to learn how to use Garage's permission system.
| Endpoint | Garage | [Openstack Swift](https://docs.openstack.org/swift/latest/s3_compat.html) | [Ceph Object Gateway](https://docs.ceph.com/en/latest/radosgw/s3/) | [Riak CS](https://docs.riak.com/riak/cs/2.1.1/references/apis/storage/s3/index.html) | [OpenIO](https://docs.openio.io/latest/source/arch-design/s3_compliancy.html) |
| Endpoint | Garage | [Openstack Swift](https://docs.openstack.org/swift/latest/s3_compat.html) | [Ceph Object Gateway](https://docs.ceph.com/en/latest/radosgw/s3/) | [Riak CS](https://docs.riak.com/riak/cs/2.1.1/references/apis/storage/s3/index.html) | [OpenIO](https://docs.openio.io/latest/source/arch-design/s3_compliancy.html) |
|------------------------------|----------------------------------|-----------------|---------------|---------|-----|
| [DeleteBucketPolicy](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteBucketPolicy.html) | ❌ Missing | ❌| | ✅ | ❌|
| [GetBucketPolicy](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetBucketPolicy.html) | ❌ Missing | ❌| | ⚠ | ❌|
| [GetBucketPolicyStatus](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetBucketPolicyStatus.html) | ❌ Missing | ❌| | ❌| ❌|
| [PutBucketPolicy](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketPolicy.html) | ❌ Missing | ❌| | ⚠ | ❌|
| [DeleteBucketPolicy](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteBucketPolicy.html) | ❌ Missing | ❌| | ✅ | ❌|
| [GetBucketPolicy](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetBucketPolicy.html) | ❌ Missing | ❌| | ⚠ | ❌|
| [GetBucketPolicyStatus](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetBucketPolicyStatus.html) | ❌ Missing | ❌| | ❌| ❌|
| [PutBucketPolicy](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketPolicy.html) | ❌ Missing | ❌| | ⚠ | ❌|
| [GetBucketAcl](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetBucketAcl.html) | ❌ Missing | ✅ | ✅ | ✅ | ✅ |
| [PutBucketAcl](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketAcl.html) | ❌ Missing | ✅ | ✅ | ✅ | ✅ |
| [GetObjectAcl](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObjectAcl.html) | ❌ Missing | ✅ | ✅ | ✅ | ✅ |
| [PutObjectAcl](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObjectAcl.html) | ❌ Missing | ✅ | ✅ | ✅ | ✅ |
*Notes:* Ceph claims that it supports bucket policies but does not implement any Policy endpoints. They probably refer to their own permission system. Riak CS only supports a subset of the policy configuration.
*Notes:* Riak CS only supports a subset of the policy configuration.
### Versioning, Lifecycle endpoints
Garage does not support (yet) object versioning.
Garage does not (yet) support object versioning.
If you need this feature, please [share your use case in our dedicated issue](https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/166).
| Endpoint | Garage | [Openstack Swift](https://docs.openstack.org/swift/latest/s3_compat.html) | [Ceph Object Gateway](https://docs.ceph.com/en/latest/radosgw/s3/) | [Riak CS](https://docs.riak.com/riak/cs/2.1.1/references/apis/storage/s3/index.html) | [OpenIO](https://docs.openio.io/latest/source/arch-design/s3_compliancy.html) |
| Endpoint | Garage | [Openstack Swift](https://docs.openstack.org/swift/latest/s3_compat.html) | [Ceph Object Gateway](https://docs.ceph.com/en/latest/radosgw/s3/) | [Riak CS](https://docs.riak.com/riak/cs/2.1.1/references/apis/storage/s3/index.html) | [OpenIO](https://docs.openio.io/latest/source/arch-design/s3_compliancy.html) |
|------------------------------|----------------------------------|-----------------|---------------|---------|-----|
| [DeleteBucketLifecycle](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteBucketLifecycle.html) | ❌ Missing | ❌| ✅| ❌| ✅|
| [GetBucketLifecycleConfiguration](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetBucketLifecycleConfiguration.html) | ❌ Missing | ❌| | ❌| ✅|
| [PutBucketLifecycleConfiguration](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketLifecycleConfiguration.html) | ❌ Missing | ❌| | ❌| ✅|
| [GetBucketLifecycleConfiguration](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetBucketLifecycleConfiguration.html) | ❌ Missing | ❌| | ❌| ✅|
| [PutBucketLifecycleConfiguration](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketLifecycleConfiguration.html) | ❌ Missing | ❌| | ❌| ✅|
| [GetBucketVersioning](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetBucketVersioning.html) | ❌ Stub (see below) | ✅| ✅ | ❌| ✅|
| [ListObjectVersions](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectVersions.html) | ❌ Missing | ❌| ✅ | ❌| ✅|
| [PutBucketVersioning](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketVersioning.html) | ❌ Missing | ❌| ✅| ❌| ✅|
@ -111,64 +137,65 @@ If you need this feature, please [share your use case in our dedicated issue](ht
**GetBucketVersioning:** Stub implementation (Garage does not yet support versionning so this always returns "versionning not enabled").
*Note: Ceph only supports `Expiration`, `NoncurrentVersionExpiration` and `AbortIncompleteMultipartUpload` on its Lifecycle endpoints.*
### Replication endpoints
Please open an issue if you have a use case for replication.
| Endpoint | Garage | [Openstack Swift](https://docs.openstack.org/swift/latest/s3_compat.html) | [Ceph Object Gateway](https://docs.ceph.com/en/latest/radosgw/s3/) | [Riak CS](https://docs.riak.com/riak/cs/2.1.1/references/apis/storage/s3/index.html) | [OpenIO](https://docs.openio.io/latest/source/arch-design/s3_compliancy.html) |
| Endpoint | Garage | [Openstack Swift](https://docs.openstack.org/swift/latest/s3_compat.html) | [Ceph Object Gateway](https://docs.ceph.com/en/latest/radosgw/s3/) | [Riak CS](https://docs.riak.com/riak/cs/2.1.1/references/apis/storage/s3/index.html) | [OpenIO](https://docs.openio.io/latest/source/arch-design/s3_compliancy.html) |
|------------------------------|----------------------------------|-----------------|---------------|---------|-----|
| [DeleteBucketReplication](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteBucketReplication.html) | ❌ Missing | ❌| ✅ | ❌| ❌|
| [GetBucketReplication](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetBucketReplication.html) | ❌ Missing | ❌| ✅ | ❌| ❌|
| [PutBucketReplication](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketReplication.html) | ❌ Missing | ❌| ⚠ | ❌| ❌|
*Note: Ceph documentation briefly says that Ceph supports [replication though the S3 API](https://docs.ceph.com/en/latest/radosgw/multisite-sync-policy/#s3-replication-api) but with some limitations. Additionaly, replication endpoints are not documented in the S3 compatibility page so I don't know what kind of support we can expect.*
*Note: Ceph documentation briefly says that Ceph supports
[replication though the S3 API](https://docs.ceph.com/en/latest/radosgw/multisite-sync-policy/#s3-replication-api)
but with some limitations.
Additionaly, replication endpoints are not documented in the S3 compatibility page so I don't know what kind of support we can expect.*
### Locking objects
Amazon defines a concept of [object locking](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lock.html) that can be achieved either through a Retention period or a Legal hold.
| Endpoint | Garage | [Openstack Swift](https://docs.openstack.org/swift/latest/s3_compat.html) | [Ceph Object Gateway](https://docs.ceph.com/en/latest/radosgw/s3/) | [Riak CS](https://docs.riak.com/riak/cs/2.1.1/references/apis/storage/s3/index.html) | [OpenIO](https://docs.openio.io/latest/source/arch-design/s3_compliancy.html) |
| Endpoint | Garage | [Openstack Swift](https://docs.openstack.org/swift/latest/s3_compat.html) | [Ceph Object Gateway](https://docs.ceph.com/en/latest/radosgw/s3/) | [Riak CS](https://docs.riak.com/riak/cs/2.1.1/references/apis/storage/s3/index.html) | [OpenIO](https://docs.openio.io/latest/source/arch-design/s3_compliancy.html) |
|------------------------------|----------------------------------|-----------------|---------------|---------|-----|
| [GetObjectLegalHold](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObjectLegalHold.html) | ❌ Missing | ❌| ✅ | ❌| ❌|
| [PutObjectLegalHold](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObjectLegalHold.html) | ❌ Missing | ❌| ✅ | ❌| ❌|
| [GetObjectRetention](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObjectRetention.html) | ❌ Missing | ❌| ✅ | ❌| ❌|
| [PutObjectRetention](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObjectRetention.html) | ❌ Missing | ❌| ✅ | ❌| ❌|
| [GetObjectLockConfiguration](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObjectLockConfiguration.html) | ❌ Missing | ❌| | ❌| ❌|
| [PutObjectLockConfiguration](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObjectLockConfiguration.html) | ❌ Missing | ❌| | ❌| ❌|
| [GetObjectLockConfiguration](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObjectLockConfiguration.html) | ❌ Missing | ❌| | ❌| ❌|
| [PutObjectLockConfiguration](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObjectLockConfiguration.html) | ❌ Missing | ❌| | ❌| ❌|
### (Server-side) encryption
We think that you can either encrypt your server partition or do client-side encryption, so we did not implement server-side encryption for Garage.
Please open an issue if you have a use case.
| Endpoint | Garage | [Openstack Swift](https://docs.openstack.org/swift/latest/s3_compat.html) | [Ceph Object Gateway](https://docs.ceph.com/en/latest/radosgw/s3/) | [Riak CS](https://docs.riak.com/riak/cs/2.1.1/references/apis/storage/s3/index.html) | [OpenIO](https://docs.openio.io/latest/source/arch-design/s3_compliancy.html) |
| Endpoint | Garage | [Openstack Swift](https://docs.openstack.org/swift/latest/s3_compat.html) | [Ceph Object Gateway](https://docs.ceph.com/en/latest/radosgw/s3/) | [Riak CS](https://docs.riak.com/riak/cs/2.1.1/references/apis/storage/s3/index.html) | [OpenIO](https://docs.openio.io/latest/source/arch-design/s3_compliancy.html) |
|------------------------------|----------------------------------|-----------------|---------------|---------|-----|
| [DeleteBucketEncryption](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteBucketEncryption.html) | ❌ Missing | ❌| | ❌| ❌|
| [GetBucketEncryption](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetBucketEncryption.html) | ❌ Missing | ❌| | ❌| ❌|
| [PutBucketEncryption](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketEncryption.html) | ❌ Missing | ❌| | ❌| ❌|
| [DeleteBucketEncryption](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteBucketEncryption.html) | ❌ Missing | ❌| | ❌| ❌|
| [GetBucketEncryption](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetBucketEncryption.html) | ❌ Missing | ❌| | ❌| ❌|
| [PutBucketEncryption](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketEncryption.html) | ❌ Missing | ❌| | ❌| ❌|
### Misc endpoints
| Endpoint | Garage | [Openstack Swift](https://docs.openstack.org/swift/latest/s3_compat.html) | [Ceph Object Gateway](https://docs.ceph.com/en/latest/radosgw/s3/) | [Riak CS](https://docs.riak.com/riak/cs/2.1.1/references/apis/storage/s3/index.html) | [OpenIO](https://docs.openio.io/latest/source/arch-design/s3_compliancy.html) |
| Endpoint | Garage | [Openstack Swift](https://docs.openstack.org/swift/latest/s3_compat.html) | [Ceph Object Gateway](https://docs.ceph.com/en/latest/radosgw/s3/) | [Riak CS](https://docs.riak.com/riak/cs/2.1.1/references/apis/storage/s3/index.html) | [OpenIO](https://docs.openio.io/latest/source/arch-design/s3_compliancy.html) |
|------------------------------|----------------------------------|-----------------|---------------|---------|-----|
| [GetBucketNotificationConfiguration](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetBucketNotificationConfiguration.html) | ❌ Missing | ❌| ✅ | ❌| ❌|
| [PutBucketNotificationConfiguration](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketNotificationConfiguration.html) | ❌ Missing | ❌| ✅ | ❌| ❌|
| [DeleteBucketTagging](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteBucketTagging.html) | ❌ Missing | ❌| | ❌| ✅ |
| [GetBucketTagging](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetBucketTagging.html) | ❌ Missing | ❌| | ❌| ✅ |
| [PutBucketTagging](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketTagging.html) | ❌ Missing | ❌| | ❌| ✅ |
| [DeleteObjectTagging](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjectTagging.html) | ❌ Missing | ❌| | ❌| ✅ |
| [GetObjectTagging](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObjectTagging.html) | ❌ Missing | ❌| | ❌| ✅ |
| [PutObjectTagging](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObjectTagging.html) | ❌ Missing | ❌| | ❌| ✅ |
| [GetObjectTorrent](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObjectTorrent.html) | ❌ Missing | ❌| | ❌| ❌|
| [DeleteBucketTagging](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteBucketTagging.html) | ❌ Missing | ❌| | ❌| ✅ |
| [GetBucketTagging](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetBucketTagging.html) | ❌ Missing | ❌| | ❌| ✅ |
| [PutBucketTagging](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketTagging.html) | ❌ Missing | ❌| | ❌| ✅ |
| [DeleteObjectTagging](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjectTagging.html) | ❌ Missing | ❌| | ❌| ✅ |
| [GetObjectTagging](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObjectTagging.html) | ❌ Missing | ❌| | ❌| ✅ |
| [PutObjectTagging](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObjectTagging.html) | ❌ Missing | ❌| | ❌| ✅ |
| [GetObjectTorrent](https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObjectTorrent.html) | ❌ Missing | ❌| | ❌| ❌|
### Vendor specific endpoints
<details><summary>Display Amazon specifc endpoints</summary>
| Endpoint | Garage | [Openstack Swift](https://docs.openstack.org/swift/latest/s3_compat.html) | [Ceph Object Gateway](https://docs.ceph.com/en/latest/radosgw/s3/) | [Riak CS](https://docs.riak.com/riak/cs/2.1.1/references/apis/storage/s3/index.html) | [OpenIO](https://docs.openio.io/latest/source/arch-design/s3_compliancy.html) |
| Endpoint | Garage | [Openstack Swift](https://docs.openstack.org/swift/latest/s3_compat.html) | [Ceph Object Gateway](https://docs.ceph.com/en/latest/radosgw/s3/) | [Riak CS](https://docs.riak.com/riak/cs/2.1.1/references/apis/storage/s3/index.html) | [OpenIO](https://docs.openio.io/latest/source/arch-design/s3_compliancy.html) |
|------------------------------|----------------------------------|-----------------|---------------|---------|-----|
| [DeleteBucketAnalyticsConfiguration](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteBucketAnalyticsConfiguration.html) | ❌ Missing | ❌| ❌| ❌| ❌|
| [DeleteBucketIntelligentTieringConfiguration](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteBucketIntelligentTieringConfiguration.html) | ❌ Missing | ❌| ❌| ❌| ❌|

View File

@ -156,12 +156,7 @@ impl ApiHandler for AdminApiServer {
}
Endpoint::CreateBucket => handle_create_bucket(&self.garage, req).await,
Endpoint::DeleteBucket { id } => handle_delete_bucket(&self.garage, id).await,
Endpoint::PutBucketWebsite { id } => {
handle_put_bucket_website(&self.garage, id, req).await
}
Endpoint::DeleteBucketWebsite { id } => {
handle_delete_bucket_website(&self.garage, id).await
}
Endpoint::UpdateBucket { id } => handle_update_bucket(&self.garage, id, req).await,
// Bucket-key permissions
Endpoint::BucketAllowKey => {
handle_bucket_change_key_perm(&self.garage, req, true).await

View File

@ -6,7 +6,6 @@ use serde::{Deserialize, Serialize};
use garage_util::crdt::*;
use garage_util::data::*;
use garage_util::error::Error as GarageError;
use garage_util::time::*;
use garage_table::*;
@ -15,11 +14,12 @@ use garage_model::bucket_alias_table::*;
use garage_model::bucket_table::*;
use garage_model::garage::Garage;
use garage_model::permission::*;
use garage_model::s3::object_table::*;
use crate::admin::error::*;
use crate::admin::key::ApiBucketKeyPerm;
use crate::common_error::CommonError;
use crate::helpers::parse_json_body;
use crate::helpers::{json_ok_response, parse_json_body};
pub async fn handle_list_buckets(garage: &Arc<Garage>) -> Result<Response<Body>, Error> {
let buckets = garage
@ -60,10 +60,7 @@ pub async fn handle_list_buckets(garage: &Arc<Garage>) -> Result<Response<Body>,
})
.collect::<Vec<_>>();
let resp_json = serde_json::to_string_pretty(&res).map_err(GarageError::from)?;
Ok(Response::builder()
.status(StatusCode::OK)
.body(Body::from(resp_json))?)
Ok(json_ok_response(&res)?)
}
#[derive(Serialize)]
@ -81,6 +78,13 @@ struct BucketLocalAlias {
alias: String,
}
#[derive(Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
struct ApiBucketQuotas {
max_size: Option<u64>,
max_objects: Option<u64>,
}
pub async fn handle_get_bucket_info(
garage: &Arc<Garage>,
id: Option<String>,
@ -112,6 +116,14 @@ async fn bucket_info_results(
.get_existing_bucket(bucket_id)
.await?;
let counters = garage
.object_counter_table
.table
.get(&bucket_id, &EmptyKey)
.await?
.map(|x| x.filtered_values(&garage.system.ring.borrow()))
.unwrap_or_default();
let mut relevant_keys = HashMap::new();
for (k, _) in bucket
.state
@ -152,6 +164,7 @@ async fn bucket_info_results(
let state = bucket.state.as_option().unwrap();
let quotas = state.quotas.get();
let res =
GetBucketInfoResult {
id: hex::encode(&bucket.id),
@ -195,12 +208,19 @@ async fn bucket_info_results(
}
})
.collect::<Vec<_>>(),
objects: counters.get(OBJECTS).cloned().unwrap_or_default(),
bytes: counters.get(BYTES).cloned().unwrap_or_default(),
unfinshed_uploads: counters
.get(UNFINISHED_UPLOADS)
.cloned()
.unwrap_or_default(),
quotas: ApiBucketQuotas {
max_size: quotas.max_size,
max_objects: quotas.max_objects,
},
};
let resp_json = serde_json::to_string_pretty(&res).map_err(GarageError::from)?;
Ok(Response::builder()
.status(StatusCode::OK)
.body(Body::from(resp_json))?)
Ok(json_ok_response(&res)?)
}
#[derive(Serialize)]
@ -212,6 +232,10 @@ struct GetBucketInfoResult {
#[serde(default)]
website_config: Option<GetBucketInfoWebsiteResult>,
keys: Vec<GetBucketInfoKey>,
objects: i64,
bytes: i64,
unfinshed_uploads: i64,
quotas: ApiBucketQuotas,
}
#[derive(Serialize)]
@ -370,14 +394,12 @@ pub async fn handle_delete_bucket(
.body(Body::empty())?)
}
// ---- BUCKET WEBSITE CONFIGURATION ----
pub async fn handle_put_bucket_website(
pub async fn handle_update_bucket(
garage: &Arc<Garage>,
id: String,
req: Request<Body>,
) -> Result<Response<Body>, Error> {
let req = parse_json_body::<PutBucketWebsiteRequest>(req).await?;
let req = parse_json_body::<UpdateBucketRequest>(req).await?;
let bucket_id = parse_bucket_id(&id)?;
let mut bucket = garage
@ -386,10 +408,31 @@ pub async fn handle_put_bucket_website(
.await?;
let state = bucket.state.as_option_mut().unwrap();
state.website_config.update(Some(WebsiteConfig {
index_document: req.index_document,
error_document: req.error_document,
}));
if let Some(wa) = req.website_access {
if wa.enabled {
state.website_config.update(Some(WebsiteConfig {
index_document: wa.index_document.ok_or_bad_request(
"Please specify indexDocument when enabling website access.",
)?,
error_document: wa.error_document,
}));
} else {
if wa.index_document.is_some() || wa.error_document.is_some() {
return Err(Error::bad_request(
"Cannot specify indexDocument or errorDocument when disabling website access.",
));
}
state.website_config.update(None);
}
}
if let Some(q) = req.quotas {
state.quotas.update(BucketQuotas {
max_size: q.max_size,
max_objects: q.max_objects,
});
}
garage.bucket_table.insert(&bucket).await?;
@ -398,29 +441,17 @@ pub async fn handle_put_bucket_website(
#[derive(Deserialize)]
#[serde(rename_all = "camelCase")]
struct PutBucketWebsiteRequest {
index_document: String,
#[serde(default)]
error_document: Option<String>,
struct UpdateBucketRequest {
website_access: Option<UpdateBucketWebsiteAccess>,
quotas: Option<ApiBucketQuotas>,
}
pub async fn handle_delete_bucket_website(
garage: &Arc<Garage>,
id: String,
) -> Result<Response<Body>, Error> {
let bucket_id = parse_bucket_id(&id)?;
let mut bucket = garage
.bucket_helper()
.get_existing_bucket(bucket_id)
.await?;
let state = bucket.state.as_option_mut().unwrap();
state.website_config.update(None);
garage.bucket_table.insert(&bucket).await?;
bucket_info_results(garage, bucket_id).await
#[derive(Deserialize)]
#[serde(rename_all = "camelCase")]
struct UpdateBucketWebsiteAccess {
enabled: bool,
index_document: Option<String>,
error_document: Option<String>,
}
// ---- BUCKET/KEY PERMISSIONS ----

View File

@ -7,19 +7,19 @@ use serde::{Deserialize, Serialize};
use garage_util::crdt::*;
use garage_util::data::*;
use garage_util::error::Error as GarageError;
use garage_rpc::layout::*;
use garage_model::garage::Garage;
use crate::admin::error::*;
use crate::helpers::parse_json_body;
use crate::helpers::{json_ok_response, parse_json_body};
pub async fn handle_get_cluster_status(garage: &Arc<Garage>) -> Result<Response<Body>, Error> {
let res = GetClusterStatusResponse {
node: hex::encode(garage.system.id),
garage_version: garage.system.garage_version(),
db_engine: garage.db.engine(),
known_nodes: garage
.system
.get_known_nodes()
@ -39,10 +39,7 @@ pub async fn handle_get_cluster_status(garage: &Arc<Garage>) -> Result<Response<
layout: get_cluster_layout(garage),
};
let resp_json = serde_json::to_string_pretty(&res).map_err(GarageError::from)?;
Ok(Response::builder()
.status(StatusCode::OK)
.body(Body::from(resp_json))?)
Ok(json_ok_response(&res)?)
}
pub async fn handle_connect_cluster_nodes(
@ -66,18 +63,13 @@ pub async fn handle_connect_cluster_nodes(
})
.collect::<Vec<_>>();
let resp_json = serde_json::to_string_pretty(&res).map_err(GarageError::from)?;
Ok(Response::builder()
.status(StatusCode::OK)
.body(Body::from(resp_json))?)
Ok(json_ok_response(&res)?)
}
pub async fn handle_get_cluster_layout(garage: &Arc<Garage>) -> Result<Response<Body>, Error> {
let res = get_cluster_layout(garage);
let resp_json = serde_json::to_string_pretty(&res).map_err(GarageError::from)?;
Ok(Response::builder()
.status(StatusCode::OK)
.body(Body::from(resp_json))?)
Ok(json_ok_response(&res)?)
}
fn get_cluster_layout(garage: &Arc<Garage>) -> GetClusterLayoutResponse {
@ -107,6 +99,7 @@ fn get_cluster_layout(garage: &Arc<Garage>) -> GetClusterLayoutResponse {
struct GetClusterStatusResponse {
node: String,
garage_version: &'static str,
db_engine: String,
known_nodes: HashMap<String, KnownNodeResp>,
layout: GetClusterLayoutResponse,
}

View File

@ -72,8 +72,9 @@ impl ApiError for Error {
}
}
fn add_http_headers(&self, _header_map: &mut HeaderMap<HeaderValue>) {
// nothing
fn add_http_headers(&self, header_map: &mut HeaderMap<HeaderValue>) {
use hyper::header;
header_map.append(header::CONTENT_TYPE, "application/json".parse().unwrap());
}
fn http_body(&self, garage_region: &str, path: &str) -> Body {

View File

@ -4,15 +4,13 @@ use std::sync::Arc;
use hyper::{Body, Request, Response, StatusCode};
use serde::{Deserialize, Serialize};
use garage_util::error::Error as GarageError;
use garage_table::*;
use garage_model::garage::Garage;
use garage_model::key_table::*;
use crate::admin::error::*;
use crate::helpers::parse_json_body;
use crate::helpers::{json_ok_response, parse_json_body};
pub async fn handle_list_keys(garage: &Arc<Garage>) -> Result<Response<Body>, Error> {
let res = garage
@ -32,10 +30,7 @@ pub async fn handle_list_keys(garage: &Arc<Garage>) -> Result<Response<Body>, Er
})
.collect::<Vec<_>>();
let resp_json = serde_json::to_string_pretty(&res).map_err(GarageError::from)?;
Ok(Response::builder()
.status(StatusCode::OK)
.body(Body::from(resp_json))?)
Ok(json_ok_response(&res)?)
}
#[derive(Serialize)]
@ -221,10 +216,7 @@ async fn key_info_results(garage: &Arc<Garage>, key: Key) -> Result<Response<Bod
.collect::<Vec<_>>(),
};
let resp_json = serde_json::to_string_pretty(&res).map_err(GarageError::from)?;
Ok(Response::builder()
.status(StatusCode::OK)
.body(Body::from(resp_json))?)
Ok(json_ok_response(&res)?)
}
#[derive(Serialize)]

View File

@ -48,10 +48,7 @@ pub enum Endpoint {
DeleteBucket {
id: String,
},
PutBucketWebsite {
id: String,
},
DeleteBucketWebsite {
UpdateBucket {
id: String,
},
// Bucket-Key Permissions
@ -113,8 +110,7 @@ impl Endpoint {
GET "/v0/bucket" => ListBuckets,
POST "/v0/bucket" => CreateBucket,
DELETE "/v0/bucket" if id => DeleteBucket (query::id),
PUT "/v0/bucket/website" if id => PutBucketWebsite (query::id),
DELETE "/v0/bucket/website" if id => DeleteBucketWebsite (query::id),
PUT "/v0/bucket" if id => UpdateBucket (query::id),
// Bucket-key permissions
POST "/v0/bucket/allow" => BucketAllowKey,
POST "/v0/bucket/deny" => BucketDenyKey,

View File

@ -150,9 +150,7 @@ impl<A: ApiHandler> ApiServer<A> {
}
Err(e) => {
let body: Body = e.http_body(&self.region, uri.path());
let mut http_error_builder = Response::builder()
.status(e.http_status_code())
.header("Content-Type", "application/xml");
let mut http_error_builder = Response::builder().status(e.http_status_code());
if let Some(header_map) = http_error_builder.headers_mut() {
e.add_http_headers(header_map)

View File

@ -1,4 +1,4 @@
use hyper::{Body, Request};
use hyper::{Body, Request, Response};
use idna::domain_to_unicode;
use serde::{Deserialize, Serialize};
@ -144,6 +144,14 @@ pub async fn parse_json_body<T: for<'de> Deserialize<'de>>(req: Request<Body>) -
Ok(resp)
}
pub fn json_ok_response<T: Serialize>(res: &T) -> Result<Response<Body>, Error> {
let resp_json = serde_json::to_string_pretty(res).map_err(garage_util::error::Error::from)?;
Ok(Response::builder()
.status(hyper::StatusCode::OK)
.header(http::header::CONTENT_TYPE, "application/json")
.body(Body::from(resp_json))?)
}
#[cfg(test)]
mod tests {
use super::*;

View File

@ -110,8 +110,9 @@ impl ApiError for Error {
}
}
fn add_http_headers(&self, _header_map: &mut HeaderMap<HeaderValue>) {
// nothing
fn add_http_headers(&self, header_map: &mut HeaderMap<HeaderValue>) {
use hyper::header;
header_map.append(header::CONTENT_TYPE, "application/json".parse().unwrap());
}
fn http_body(&self, garage_region: &str, path: &str) -> Body {

View File

@ -10,7 +10,7 @@ use garage_rpc::ring::Ring;
use garage_table::util::*;
use garage_model::garage::Garage;
use garage_model::k2v::counter_table::{BYTES, CONFLICTS, ENTRIES, VALUES};
use garage_model::k2v::item_table::{BYTES, CONFLICTS, ENTRIES, VALUES};
use crate::k2v::error::*;
use crate::k2v::range::read_range;

View File

@ -4,7 +4,7 @@ use async_trait::async_trait;
use futures::future::Future;
use hyper::header;
use hyper::{Body, Method, Request, Response};
use hyper::{Body, Request, Response};
use opentelemetry::{trace::SpanRef, KeyValue};
@ -167,14 +167,7 @@ impl ApiHandler for S3ApiServer {
return Err(Error::forbidden("Operation is not allowed for this key."));
}
// Look up what CORS rule might apply to response.
// Requests for methods different than GET, HEAD or POST
// are always preflighted, i.e. the browser should make
// an OPTIONS call before to check it is allowed
let matching_cors_rule = match *req.method() {
Method::GET | Method::HEAD | Method::POST => find_matching_cors_rule(&bucket, &req)?,
_ => None,
};
let matching_cors_rule = find_matching_cors_rule(&bucket, &req)?;
let resp = match endpoint {
Endpoint::HeadObject {
@ -219,7 +212,7 @@ impl ApiHandler for S3ApiServer {
.await
}
Endpoint::PutObject { key } => {
handle_put(garage, req, bucket_id, &key, content_sha256).await
handle_put(garage, req, &bucket, &key, content_sha256).await
}
Endpoint::AbortMultipartUpload { key, upload_id } => {
handle_abort_multipart_upload(garage, bucket_id, &key, &upload_id).await
@ -233,7 +226,7 @@ impl ApiHandler for S3ApiServer {
garage,
req,
&bucket_name,
bucket_id,
&bucket,
&key,
&upload_id,
content_sha256,

View File

@ -172,6 +172,9 @@ impl ApiError for Error {
fn add_http_headers(&self, header_map: &mut HeaderMap<HeaderValue>) {
use hyper::header;
header_map.append(header::CONTENT_TYPE, "application/xml".parse().unwrap());
#[allow(clippy::single_match)]
match self {
Error::InvalidRange((_, len)) => {

View File

@ -22,7 +22,7 @@ use crate::signature::payload::{parse_date, verify_v4};
pub async fn handle_post_object(
garage: Arc<Garage>,
req: Request<Body>,
bucket: String,
bucket_name: String,
) -> Result<Response<Body>, Error> {
let boundary = req
.headers()
@ -126,13 +126,18 @@ pub async fn handle_post_object(
let bucket_id = garage
.bucket_helper()
.resolve_bucket(&bucket, &api_key)
.resolve_bucket(&bucket_name, &api_key)
.await?;
if !api_key.allow_write(&bucket_id) {
return Err(Error::forbidden("Operation is not allowed for this key."));
}
let bucket = garage
.bucket_helper()
.get_existing_bucket(bucket_id)
.await?;
let decoded_policy = base64::decode(&policy).ok_or_bad_request("Invalid policy")?;
let decoded_policy: Policy =
serde_json::from_slice(&decoded_policy).ok_or_bad_request("Invalid policy")?;
@ -227,7 +232,7 @@ pub async fn handle_post_object(
garage,
headers,
StreamLimiter::new(stream, conditions.content_length),
bucket_id,
&bucket,
&key,
None,
None,
@ -244,7 +249,7 @@ pub async fn handle_post_object(
{
target
.query_pairs_mut()
.append_pair("bucket", &bucket)
.append_pair("bucket", &bucket_name)
.append_pair("key", &key)
.append_pair("etag", &etag);
let target = target.to_string();
@ -289,7 +294,7 @@ pub async fn handle_post_object(
let xml = s3_xml::PostObject {
xmlns: (),
location: s3_xml::Value(location),
bucket: s3_xml::Value(bucket),
bucket: s3_xml::Value(bucket_name),
key: s3_xml::Value(key),
etag: s3_xml::Value(etag),
};

View File

@ -1,4 +1,4 @@
use std::collections::{BTreeMap, BTreeSet, VecDeque};
use std::collections::{BTreeMap, BTreeSet, HashMap, VecDeque};
use std::sync::Arc;
use futures::prelude::*;
@ -14,7 +14,9 @@ use garage_util::error::Error as GarageError;
use garage_util::time::*;
use garage_block::manager::INLINE_THRESHOLD;
use garage_model::bucket_table::Bucket;
use garage_model::garage::Garage;
use garage_model::index_counter::CountedItem;
use garage_model::s3::block_ref_table::*;
use garage_model::s3::object_table::*;
use garage_model::s3::version_table::*;
@ -26,7 +28,7 @@ use crate::signature::verify_signed_content;
pub async fn handle_put(
garage: Arc<Garage>,
req: Request<Body>,
bucket_id: Uuid,
bucket: &Bucket,
key: &str,
content_sha256: Option<Hash>,
) -> Result<Response<Body>, Error> {
@ -46,7 +48,7 @@ pub async fn handle_put(
garage,
headers,
body,
bucket_id,
bucket,
key,
content_md5,
content_sha256,
@ -59,7 +61,7 @@ pub(crate) async fn save_stream<S: Stream<Item = Result<Bytes, Error>> + Unpin>(
garage: Arc<Garage>,
headers: ObjectVersionHeaders,
body: S,
bucket_id: Uuid,
bucket: &Bucket,
key: &str,
content_md5: Option<String>,
content_sha256: Option<FixedBytes32>,
@ -80,6 +82,7 @@ pub(crate) async fn save_stream<S: Stream<Item = Result<Bytes, Error>> + Unpin>(
let data_md5sum_hex = hex::encode(data_md5sum);
let data_sha256sum = sha256sum(&first_block[..]);
let size = first_block.len() as u64;
ensure_checksum_matches(
data_md5sum.as_slice(),
@ -88,20 +91,22 @@ pub(crate) async fn save_stream<S: Stream<Item = Result<Bytes, Error>> + Unpin>(
content_sha256,
)?;
check_quotas(&garage, bucket, key, size).await?;
let object_version = ObjectVersion {
uuid: version_uuid,
timestamp: version_timestamp,
state: ObjectVersionState::Complete(ObjectVersionData::Inline(
ObjectVersionMeta {
headers,
size: first_block.len() as u64,
size,
etag: data_md5sum_hex.clone(),
},
first_block,
)),
};
let object = Object::new(bucket_id, key.into(), vec![object_version]);
let object = Object::new(bucket.id, key.into(), vec![object_version]);
garage.object_table.insert(&object).await?;
return Ok((version_uuid, data_md5sum_hex));
@ -114,36 +119,42 @@ pub(crate) async fn save_stream<S: Stream<Item = Result<Bytes, Error>> + Unpin>(
timestamp: version_timestamp,
state: ObjectVersionState::Uploading(headers.clone()),
};
let object = Object::new(bucket_id, key.into(), vec![object_version.clone()]);
let object = Object::new(bucket.id, key.into(), vec![object_version.clone()]);
garage.object_table.insert(&object).await?;
// Initialize corresponding entry in version table
// Write this entry now, even with empty block list,
// to prevent block_ref entries from being deleted (they can be deleted
// if the reference a version that isn't found in the version table)
let version = Version::new(version_uuid, bucket_id, key.into(), false);
let version = Version::new(version_uuid, bucket.id, key.into(), false);
garage.version_table.insert(&version).await?;
// Transfer data and verify checksum
let first_block_hash = blake2sum(&first_block[..]);
let tx_result = read_and_put_blocks(
&garage,
&version,
1,
first_block,
first_block_hash,
&mut chunker,
)
.await
.and_then(|(total_size, data_md5sum, data_sha256sum)| {
let tx_result = (|| async {
let (total_size, data_md5sum, data_sha256sum) = read_and_put_blocks(
&garage,
&version,
1,
first_block,
first_block_hash,
&mut chunker,
)
.await?;
ensure_checksum_matches(
data_md5sum.as_slice(),
data_sha256sum,
content_md5.as_deref(),
content_sha256,
)
.map(|()| (total_size, data_md5sum))
});
)?;
check_quotas(&garage, bucket, key, total_size).await?;
Ok((total_size, data_md5sum))
})()
.await;
// If something went wrong, clean up
let (total_size, md5sum_arr) = match tx_result {
@ -151,7 +162,7 @@ pub(crate) async fn save_stream<S: Stream<Item = Result<Bytes, Error>> + Unpin>(
Err(e) => {
// Mark object as aborted, this will free the blocks further down
object_version.state = ObjectVersionState::Aborted;
let object = Object::new(bucket_id, key.into(), vec![object_version.clone()]);
let object = Object::new(bucket.id, key.into(), vec![object_version.clone()]);
garage.object_table.insert(&object).await?;
return Err(e);
}
@ -167,7 +178,7 @@ pub(crate) async fn save_stream<S: Stream<Item = Result<Bytes, Error>> + Unpin>(
},
first_block_hash,
));
let object = Object::new(bucket_id, key.into(), vec![object_version]);
let object = Object::new(bucket.id, key.into(), vec![object_version]);
garage.object_table.insert(&object).await?;
Ok((version_uuid, md5sum_hex))
@ -200,6 +211,64 @@ fn ensure_checksum_matches(
Ok(())
}
/// Check that inserting this object with this size doesn't exceed bucket quotas
async fn check_quotas(
garage: &Arc<Garage>,
bucket: &Bucket,
key: &str,
size: u64,
) -> Result<(), Error> {
let quotas = bucket.state.as_option().unwrap().quotas.get();
if quotas.max_objects.is_none() && quotas.max_size.is_none() {
return Ok(());
};
let key = key.to_string();
let (prev_object, counters) = futures::try_join!(
garage.object_table.get(&bucket.id, &key),
garage.object_counter_table.table.get(&bucket.id, &EmptyKey),
)?;
let counters = counters
.map(|x| x.filtered_values(&garage.system.ring.borrow()))
.unwrap_or_default();
let (prev_cnt_obj, prev_cnt_size) = match prev_object {
Some(o) => {
let prev_cnt = o.counts().into_iter().collect::<HashMap<_, _>>();
(
prev_cnt.get(OBJECTS).cloned().unwrap_or_default(),
prev_cnt.get(BYTES).cloned().unwrap_or_default(),
)
}
None => (0, 0),
};
let cnt_obj_diff = 1 - prev_cnt_obj;
let cnt_size_diff = size as i64 - prev_cnt_size;
if let Some(mo) = quotas.max_objects {
let current_objects = counters.get(OBJECTS).cloned().unwrap_or_default();
if cnt_obj_diff > 0 && current_objects + cnt_obj_diff > mo as i64 {
return Err(Error::forbidden(format!(
"Object quota is reached, maximum objects for this bucket: {}",
mo
)));
}
}
if let Some(ms) = quotas.max_size {
let current_size = counters.get(BYTES).cloned().unwrap_or_default();
if cnt_size_diff > 0 && current_size + cnt_size_diff > ms as i64 {
return Err(Error::forbidden(format!(
"Bucket size quota is reached, maximum total size of objects for this bucket: {}. The bucket is already {} bytes, and this object would add {} bytes.",
ms, current_size, size
)));
}
}
Ok(())
}
async fn read_and_put_blocks<S: Stream<Item = Result<Bytes, Error>> + Unpin>(
garage: &Garage,
version: &Version,
@ -473,7 +542,7 @@ pub async fn handle_complete_multipart_upload(
garage: Arc<Garage>,
req: Request<Body>,
bucket_name: &str,
bucket_id: Uuid,
bucket: &Bucket,
key: &str,
upload_id: &str,
content_sha256: Option<Hash>,
@ -497,7 +566,7 @@ pub async fn handle_complete_multipart_upload(
// Get object and version
let key = key.to_string();
let (object, version) = futures::try_join!(
garage.object_table.get(&bucket_id, &key),
garage.object_table.get(&bucket.id, &key),
garage.version_table.get(&version_uuid, &EmptyKey),
)?;
@ -590,6 +659,14 @@ pub async fn handle_complete_multipart_upload(
// Calculate total size of final object
let total_size = version.blocks.items().iter().map(|x| x.1.size).sum();
if let Err(e) = check_quotas(&garage, bucket, &key, total_size).await {
object_version.state = ObjectVersionState::Aborted;
let final_object = Object::new(bucket.id, key.clone(), vec![object_version]);
garage.object_table.insert(&final_object).await?;
return Err(e);
}
// Write final object version
object_version.state = ObjectVersionState::Complete(ObjectVersionData::FirstBlock(
ObjectVersionMeta {
@ -600,7 +677,7 @@ pub async fn handle_complete_multipart_upload(
version.blocks.items()[0].1.hash,
));
let final_object = Object::new(bucket_id, key.clone(), vec![object_version]);
let final_object = Object::new(bucket.id, key.clone(), vec![object_version]);
garage.object_table.insert(&final_object).await?;
// Send response saying ok we're done

View File

@ -14,6 +14,7 @@ path = "lib.rs"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
garage_db = { version = "0.8.0", path = "../db" }
garage_rpc = { version = "0.7.0", path = "../rpc" }
garage_util = { version = "0.7.0", path = "../util" }
garage_table = { version = "0.7.0", path = "../table" }
@ -27,8 +28,6 @@ tracing = "0.1.30"
rand = "0.8"
zstd = { version = "0.9", default-features = false }
sled = "0.34"
rmp-serde = "0.15"
serde = { version = "1.0", default-features = false, features = ["derive", "rc"] }
serde_bytes = "0.11"

View File

@ -1,3 +1,5 @@
use core::ops::Bound;
use std::convert::TryInto;
use std::path::{Path, PathBuf};
use std::sync::Arc;
@ -17,10 +19,12 @@ use opentelemetry::{
Context, KeyValue,
};
use garage_db as db;
use garage_db::counted_tree_hack::CountedTree;
use garage_util::data::*;
use garage_util::error::*;
use garage_util::metrics::RecordDuration;
use garage_util::sled_counter::SledCountedTree;
use garage_util::time::*;
use garage_util::tranquilizer::Tranquilizer;
@ -91,9 +95,9 @@ pub struct BlockManager {
rc: BlockRc,
resync_queue: SledCountedTree,
resync_queue: CountedTree,
resync_notify: Notify,
resync_errors: SledCountedTree,
resync_errors: CountedTree,
system: Arc<System>,
endpoint: Arc<Endpoint<BlockRpc, Self>>,
@ -108,7 +112,7 @@ struct BlockManagerLocked();
impl BlockManager {
pub fn new(
db: &sled::Db,
db: &db::Db,
data_dir: PathBuf,
compression_level: Option<i32>,
background_tranquility: u32,
@ -123,12 +127,14 @@ impl BlockManager {
let resync_queue = db
.open_tree("block_local_resync_queue")
.expect("Unable to open block_local_resync_queue tree");
let resync_queue = SledCountedTree::new(resync_queue);
let resync_queue =
CountedTree::new(resync_queue).expect("Could not count block_local_resync_queue");
let resync_errors = db
.open_tree("block_local_resync_errors")
.expect("Unable to open block_local_resync_errors tree");
let resync_errors = SledCountedTree::new(resync_errors);
let resync_errors =
CountedTree::new(resync_errors).expect("Could not count block_local_resync_errors");
let endpoint = system
.netapp
@ -219,11 +225,44 @@ impl BlockManager {
/// to fix any mismatch between the two.
pub async fn repair_data_store(&self, must_exit: &watch::Receiver<bool>) -> Result<(), Error> {
// 1. Repair blocks from RC table.
for (i, entry) in self.rc.rc.iter().enumerate() {
let (hash, _) = entry?;
let hash = Hash::try_from(&hash[..]).unwrap();
self.put_to_resync(&hash, Duration::from_secs(0))?;
if i & 0xFF == 0 && *must_exit.borrow() {
let mut next_start: Option<Hash> = None;
loop {
// We have to do this complicated two-step process where we first read a bunch
// of hashes from the RC table, and then insert them in the to-resync queue,
// because of SQLite. Basically, as long as we have an iterator on a DB table,
// we can't do anything else on the DB. The naive approach (which we had previously)
// of just iterating on the RC table and inserting items one to one in the resync
// queue can't work here, it would just provoke a deadlock in the SQLite adapter code.
// This is mostly because the Rust bindings for SQLite assume a worst-case scenario
// where SQLite is not compiled in thread-safe mode, so we have to wrap everything
// in a mutex (see db/sqlite_adapter.rs and discussion in PR #322).
let mut batch_of_hashes = vec![];
let start_bound = match next_start.as_ref() {
None => Bound::Unbounded,
Some(x) => Bound::Excluded(x.as_slice()),
};
for entry in self
.rc
.rc
.range::<&[u8], _>((start_bound, Bound::Unbounded))?
{
let (hash, _) = entry?;
let hash = Hash::try_from(&hash[..]).unwrap();
batch_of_hashes.push(hash);
if batch_of_hashes.len() >= 1000 {
break;
}
}
if batch_of_hashes.is_empty() {
break;
}
for hash in batch_of_hashes.into_iter() {
self.put_to_resync(&hash, Duration::from_secs(0))?;
next_start = Some(hash)
}
if *must_exit.borrow() {
return Ok(());
}
}
@ -264,46 +303,69 @@ impl BlockManager {
}
/// Get lenght of resync queue
pub fn resync_queue_len(&self) -> usize {
self.resync_queue.len()
pub fn resync_queue_len(&self) -> Result<usize, Error> {
// This currently can't return an error because the CountedTree hack
// doesn't error on .len(), but this will change when we remove the hack
// (hopefully someday!)
Ok(self.resync_queue.len())
}
/// Get number of blocks that have an error
pub fn resync_errors_len(&self) -> usize {
self.resync_errors.len()
pub fn resync_errors_len(&self) -> Result<usize, Error> {
// (see resync_queue_len comment)
Ok(self.resync_errors.len())
}
/// Get number of items in the refcount table
pub fn rc_len(&self) -> usize {
self.rc.rc.len()
pub fn rc_len(&self) -> Result<usize, Error> {
Ok(self.rc.rc.len()?)
}
//// ----- Managing the reference counter ----
/// Increment the number of time a block is used, putting it to resynchronization if it is
/// required, but not known
pub fn block_incref(&self, hash: &Hash) -> Result<(), Error> {
if self.rc.block_incref(hash)? {
pub fn block_incref(
self: &Arc<Self>,
tx: &mut db::Transaction,
hash: Hash,
) -> db::TxOpResult<()> {
if self.rc.block_incref(tx, &hash)? {
// When the reference counter is incremented, there is
// normally a node that is responsible for sending us the
// data of the block. However that operation may fail,
// so in all cases we add the block here to the todo list
// to check later that it arrived correctly, and if not
// we will fecth it from someone.
self.put_to_resync(hash, 2 * BLOCK_RW_TIMEOUT)?;
let this = self.clone();
tokio::spawn(async move {
if let Err(e) = this.put_to_resync(&hash, 2 * BLOCK_RW_TIMEOUT) {
error!("Block {:?} could not be put in resync queue: {}.", hash, e);
}
});
}
Ok(())
}
/// Decrement the number of time a block is used
pub fn block_decref(&self, hash: &Hash) -> Result<(), Error> {
if self.rc.block_decref(hash)? {
pub fn block_decref(
self: &Arc<Self>,
tx: &mut db::Transaction,
hash: Hash,
) -> db::TxOpResult<()> {
if self.rc.block_decref(tx, &hash)? {
// When the RC is decremented, it might drop to zero,
// indicating that we don't need the block.
// There is a delay before we garbage collect it;
// make sure that it is handled in the resync loop
// after that delay has passed.
self.put_to_resync(hash, BLOCK_GC_DELAY + Duration::from_secs(10))?;
let this = self.clone();
tokio::spawn(async move {
if let Err(e) = this.put_to_resync(&hash, BLOCK_GC_DELAY + Duration::from_secs(10))
{
error!("Block {:?} could not be put in resync queue: {}.", hash, e);
}
});
}
Ok(())
}
@ -503,12 +565,12 @@ impl BlockManager {
});
}
fn put_to_resync(&self, hash: &Hash, delay: Duration) -> Result<(), sled::Error> {
fn put_to_resync(&self, hash: &Hash, delay: Duration) -> db::Result<()> {
let when = now_msec() + delay.as_millis() as u64;
self.put_to_resync_at(hash, when)
}
fn put_to_resync_at(&self, hash: &Hash, when: u64) -> Result<(), sled::Error> {
fn put_to_resync_at(&self, hash: &Hash, when: u64) -> db::Result<()> {
trace!("Put resync_queue: {} {:?}", when, hash);
let mut key = u64::to_be_bytes(when).to_vec();
key.extend(hash.as_ref());
@ -547,13 +609,8 @@ impl BlockManager {
// - Ok(true) -> a block was processed (successfully or not)
// - Ok(false) -> no block was processed, but we are ready for the next iteration
// - Err(_) -> a Sled error occurred when reading/writing from resync_queue/resync_errors
async fn resync_iter(
&self,
must_exit: &mut watch::Receiver<bool>,
) -> Result<bool, sled::Error> {
if let Some(first_pair_res) = self.resync_queue.iter().next() {
let (time_bytes, hash_bytes) = first_pair_res?;
async fn resync_iter(&self, must_exit: &mut watch::Receiver<bool>) -> Result<bool, db::Error> {
if let Some((time_bytes, hash_bytes)) = self.resync_queue.first()? {
let time_msec = u64::from_be_bytes(time_bytes[0..8].try_into().unwrap());
let now = now_msec();
@ -561,7 +618,7 @@ impl BlockManager {
let hash = Hash::try_from(&hash_bytes[..]).unwrap();
if let Some(ec) = self.resync_errors.get(hash.as_slice())? {
let ec = ErrorCounter::decode(ec);
let ec = ErrorCounter::decode(&ec);
if now < ec.next_try() {
// if next retry after an error is not yet,
// don't do resync and return early, but still
@ -602,7 +659,7 @@ impl BlockManager {
warn!("Error when resyncing {:?}: {}", hash, e);
let err_counter = match self.resync_errors.get(hash.as_slice())? {
Some(ec) => ErrorCounter::decode(ec).add1(now + 1),
Some(ec) => ErrorCounter::decode(&ec).add1(now + 1),
None => ErrorCounter::new(now + 1),
};
@ -966,7 +1023,7 @@ impl ErrorCounter {
}
}
fn decode(data: sled::IVec) -> Self {
fn decode(data: &[u8]) -> Self {
Self {
errors: u64::from_be_bytes(data[0..8].try_into().unwrap()),
last_try: u64::from_be_bytes(data[8..16].try_into().unwrap()),

View File

@ -1,6 +1,6 @@
use opentelemetry::{global, metrics::*};
use garage_util::sled_counter::SledCountedTree;
use garage_db::counted_tree_hack::CountedTree;
/// TableMetrics reference all counter used for metrics
pub struct BlockManagerMetrics {
@ -23,7 +23,7 @@ pub struct BlockManagerMetrics {
}
impl BlockManagerMetrics {
pub fn new(resync_queue: SledCountedTree, resync_errors: SledCountedTree) -> Self {
pub fn new(resync_queue: CountedTree, resync_errors: CountedTree) -> Self {
let meter = global::meter("garage_model/block");
Self {
_resync_queue_len: meter

View File

@ -1,5 +1,7 @@
use std::convert::TryInto;
use garage_db as db;
use garage_util::data::*;
use garage_util::error::*;
use garage_util::time::*;
@ -7,31 +9,41 @@ use garage_util::time::*;
use crate::manager::BLOCK_GC_DELAY;
pub struct BlockRc {
pub(crate) rc: sled::Tree,
pub(crate) rc: db::Tree,
}
impl BlockRc {
pub(crate) fn new(rc: sled::Tree) -> Self {
pub(crate) fn new(rc: db::Tree) -> Self {
Self { rc }
}
/// Increment the reference counter associated to a hash.
/// Returns true if the RC goes from zero to nonzero.
pub(crate) fn block_incref(&self, hash: &Hash) -> Result<bool, Error> {
let old_rc = self
.rc
.fetch_and_update(&hash, |old| RcEntry::parse_opt(old).increment().serialize())?;
let old_rc = RcEntry::parse_opt(old_rc);
pub(crate) fn block_incref(
&self,
tx: &mut db::Transaction,
hash: &Hash,
) -> db::TxOpResult<bool> {
let old_rc = RcEntry::parse_opt(tx.get(&self.rc, &hash)?);
match old_rc.increment().serialize() {
Some(x) => tx.insert(&self.rc, &hash, x)?,
None => unreachable!(),
};
Ok(old_rc.is_zero())
}
/// Decrement the reference counter associated to a hash.
/// Returns true if the RC is now zero.
pub(crate) fn block_decref(&self, hash: &Hash) -> Result<bool, Error> {
let new_rc = self
.rc
.update_and_fetch(&hash, |old| RcEntry::parse_opt(old).decrement().serialize())?;
let new_rc = RcEntry::parse_opt(new_rc);
pub(crate) fn block_decref(
&self,
tx: &mut db::Transaction,
hash: &Hash,
) -> db::TxOpResult<bool> {
let new_rc = RcEntry::parse_opt(tx.get(&self.rc, &hash)?).decrement();
match new_rc.serialize() {
Some(x) => tx.insert(&self.rc, &hash, x)?,
None => tx.remove(&self.rc, &hash)?,
};
Ok(matches!(new_rc, RcEntry::Deletable { .. }))
}
@ -44,12 +56,15 @@ impl BlockRc {
/// deletion time has passed
pub(crate) fn clear_deleted_block_rc(&self, hash: &Hash) -> Result<(), Error> {
let now = now_msec();
self.rc.update_and_fetch(&hash, |rcval| {
let updated = match RcEntry::parse_opt(rcval) {
RcEntry::Deletable { at_time } if now > at_time => RcEntry::Absent,
v => v,
self.rc.db().transaction(|mut tx| {
let rcval = RcEntry::parse_opt(tx.get(&self.rc, &hash)?);
match rcval {
RcEntry::Deletable { at_time } if now > at_time => {
tx.remove(&self.rc, &hash)?;
}
_ => (),
};
updated.serialize()
tx.commit(())
})?;
Ok(())
}

36
src/db/Cargo.toml Normal file
View File

@ -0,0 +1,36 @@
[package]
name = "garage_db"
version = "0.8.0"
authors = ["Alex Auvolat <alex@adnab.me>"]
edition = "2018"
license = "AGPL-3.0"
description = "Abstraction over multiple key/value storage engines that supports transactions"
repository = "https://git.deuxfleurs.fr/Deuxfleurs/garage"
readme = "../../README.md"
[lib]
path = "lib.rs"
[[bin]]
name = "convert"
path = "bin/convert.rs"
required-features = ["cli"]
[dependencies]
err-derive = "0.3"
hexdump = "0.1"
log = "0.4"
heed = "0.11"
rusqlite = { version = "0.27", features = ["bundled"] }
sled = "0.34"
# cli deps
clap = { version = "3.1.18", optional = true, features = ["derive", "env"] }
pretty_env_logger = { version = "0.4", optional = true }
[dev-dependencies]
mktemp = "0.4"
[features]
cli = ["clap", "pretty_env_logger"]

69
src/db/bin/convert.rs Normal file
View File

@ -0,0 +1,69 @@
use std::path::PathBuf;
use garage_db::*;
use clap::Parser;
/// K2V command line interface
#[derive(Parser, Debug)]
#[clap(author, version, about, long_about = None)]
struct Args {
/// Input DB path
#[clap(short = 'i')]
input_path: PathBuf,
/// Input DB engine
#[clap(short = 'a')]
input_engine: String,
/// Output DB path
#[clap(short = 'o')]
output_path: PathBuf,
/// Output DB engine
#[clap(short = 'b')]
output_engine: String,
}
fn main() {
let args = Args::parse();
pretty_env_logger::init();
match do_conversion(args) {
Ok(()) => println!("Success!"),
Err(e) => eprintln!("Error: {}", e),
}
}
fn do_conversion(args: Args) -> Result<()> {
let input = open_db(args.input_path, args.input_engine)?;
let output = open_db(args.output_path, args.output_engine)?;
output.import(&input)?;
Ok(())
}
fn open_db(path: PathBuf, engine: String) -> Result<Db> {
match engine.as_str() {
"sled" => {
let db = sled_adapter::sled::Config::default().path(&path).open()?;
Ok(sled_adapter::SledDb::init(db))
}
"sqlite" | "sqlite3" | "rusqlite" => {
let db = sqlite_adapter::rusqlite::Connection::open(&path)?;
Ok(sqlite_adapter::SqliteDb::init(db))
}
"lmdb" | "heed" => {
std::fs::create_dir_all(&path).map_err(|e| {
Error(format!("Unable to create LMDB data directory: {}", e).into())
})?;
let map_size = lmdb_adapter::recommended_map_size();
let db = lmdb_adapter::heed::EnvOpenOptions::new()
.max_dbs(100)
.map_size(map_size)
.open(&path)
.unwrap();
Ok(lmdb_adapter::LmdbDb::init(db))
}
e => Err(Error(format!("Invalid DB engine: {}", e).into())),
}
}

127
src/db/counted_tree_hack.rs Normal file
View File

@ -0,0 +1,127 @@
//! This hack allows a db tree to keep in RAM a counter of the number of entries
//! it contains, which is used to call .len() on it. This is usefull only for
//! the sled backend where .len() otherwise would have to traverse the whole
//! tree to count items. For sqlite and lmdb, this is mostly useless (but
//! hopefully not harmfull!). Note that a CountedTree cannot be part of a
//! transaction.
use std::sync::{
atomic::{AtomicUsize, Ordering},
Arc,
};
use crate::{Result, Tree, TxError, Value, ValueIter};
#[derive(Clone)]
pub struct CountedTree(Arc<CountedTreeInternal>);
struct CountedTreeInternal {
tree: Tree,
len: AtomicUsize,
}
impl CountedTree {
pub fn new(tree: Tree) -> Result<Self> {
let len = tree.len()?;
Ok(Self(Arc::new(CountedTreeInternal {
tree,
len: AtomicUsize::new(len),
})))
}
pub fn len(&self) -> usize {
self.0.len.load(Ordering::SeqCst)
}
pub fn is_empty(&self) -> bool {
self.len() == 0
}
pub fn get<K: AsRef<[u8]>>(&self, key: K) -> Result<Option<Value>> {
self.0.tree.get(key)
}
pub fn first(&self) -> Result<Option<(Value, Value)>> {
self.0.tree.first()
}
pub fn iter(&self) -> Result<ValueIter<'_>> {
self.0.tree.iter()
}
// ---- writing functions ----
pub fn insert<K, V>(&self, key: K, value: V) -> Result<Option<Value>>
where
K: AsRef<[u8]>,
V: AsRef<[u8]>,
{
let old_val = self.0.tree.insert(key, value)?;
if old_val.is_none() {
self.0.len.fetch_add(1, Ordering::SeqCst);
}
Ok(old_val)
}
pub fn remove<K: AsRef<[u8]>>(&self, key: K) -> Result<Option<Value>> {
let old_val = self.0.tree.remove(key)?;
if old_val.is_some() {
self.0.len.fetch_sub(1, Ordering::SeqCst);
}
Ok(old_val)
}
pub fn compare_and_swap<K, OV, NV>(
&self,
key: K,
expected_old: Option<OV>,
new: Option<NV>,
) -> Result<bool>
where
K: AsRef<[u8]>,
OV: AsRef<[u8]>,
NV: AsRef<[u8]>,
{
let old_some = expected_old.is_some();
let new_some = new.is_some();
let tx_res = self.0.tree.db().transaction(|mut tx| {
let old_val = tx.get(&self.0.tree, &key)?;
let is_same = match (&old_val, &expected_old) {
(None, None) => true,
(Some(x), Some(y)) if x == y.as_ref() => true,
_ => false,
};
if is_same {
match &new {
Some(v) => {
tx.insert(&self.0.tree, &key, v)?;
}
None => {
tx.remove(&self.0.tree, &key)?;
}
}
tx.commit(())
} else {
tx.abort(())
}
});
match tx_res {
Ok(()) => {
match (old_some, new_some) {
(false, true) => {
self.0.len.fetch_add(1, Ordering::SeqCst);
}
(true, false) => {
self.0.len.fetch_sub(1, Ordering::SeqCst);
}
_ => (),
}
Ok(true)
}
Err(TxError::Abort(())) => Ok(false),
Err(TxError::Db(e)) => Err(e),
}
}
}

406
src/db/lib.rs Normal file
View File

@ -0,0 +1,406 @@
pub mod lmdb_adapter;
pub mod sled_adapter;
pub mod sqlite_adapter;
pub mod counted_tree_hack;
#[cfg(test)]
pub mod test;
use core::ops::{Bound, RangeBounds};
use std::borrow::Cow;
use std::cell::Cell;
use std::sync::Arc;
use err_derive::Error;
#[derive(Clone)]
pub struct Db(pub(crate) Arc<dyn IDb>);
pub struct Transaction<'a>(&'a mut dyn ITx);
#[derive(Clone)]
pub struct Tree(Arc<dyn IDb>, usize);
pub type Value = Vec<u8>;
pub type ValueIter<'a> = Box<dyn std::iter::Iterator<Item = Result<(Value, Value)>> + 'a>;
pub type TxValueIter<'a> = Box<dyn std::iter::Iterator<Item = TxOpResult<(Value, Value)>> + 'a>;
// ----
#[derive(Debug, Error)]
#[error(display = "{}", _0)]
pub struct Error(pub Cow<'static, str>);
pub type Result<T> = std::result::Result<T, Error>;
#[derive(Debug, Error)]
#[error(display = "{}", _0)]
pub struct TxOpError(pub(crate) Error);
pub type TxOpResult<T> = std::result::Result<T, TxOpError>;
pub enum TxError<E> {
Abort(E),
Db(Error),
}
pub type TxResult<R, E> = std::result::Result<R, TxError<E>>;
impl<E> From<TxOpError> for TxError<E> {
fn from(e: TxOpError) -> TxError<E> {
TxError::Db(e.0)
}
}
pub fn unabort<R, E>(res: TxResult<R, E>) -> TxOpResult<std::result::Result<R, E>> {
match res {
Ok(v) => Ok(Ok(v)),
Err(TxError::Abort(e)) => Ok(Err(e)),
Err(TxError::Db(e)) => Err(TxOpError(e)),
}
}
// ----
impl Db {
pub fn engine(&self) -> String {
self.0.engine()
}
pub fn open_tree<S: AsRef<str>>(&self, name: S) -> Result<Tree> {
let tree_id = self.0.open_tree(name.as_ref())?;
Ok(Tree(self.0.clone(), tree_id))
}
pub fn list_trees(&self) -> Result<Vec<String>> {
self.0.list_trees()
}
pub fn transaction<R, E, F>(&self, fun: F) -> TxResult<R, E>
where
F: Fn(Transaction<'_>) -> TxResult<R, E>,
{
let f = TxFn {
function: fun,
result: Cell::new(None),
};
let tx_res = self.0.transaction(&f);
let ret = f
.result
.into_inner()
.expect("Transaction did not store result");
match tx_res {
Ok(()) => {
assert!(matches!(ret, Ok(_)));
ret
}
Err(TxError::Abort(())) => {
assert!(matches!(ret, Err(TxError::Abort(_))));
ret
}
Err(TxError::Db(e2)) => match ret {
// Ok was stored -> the error occured when finalizing
// transaction
Ok(_) => Err(TxError::Db(e2)),
// An error was already stored: that's the one we want to
// return
Err(TxError::Db(e)) => Err(TxError::Db(e)),
_ => unreachable!(),
},
}
}
pub fn import(&self, other: &Db) -> Result<()> {
let existing_trees = self.list_trees()?;
if !existing_trees.is_empty() {
return Err(Error(
format!(
"destination database already contains data: {:?}",
existing_trees
)
.into(),
));
}
let tree_names = other.list_trees()?;
for name in tree_names {
let tree = self.open_tree(&name)?;
if tree.len()? > 0 {
return Err(Error(format!("tree {} already contains data", name).into()));
}
let ex_tree = other.open_tree(&name)?;
let tx_res = self.transaction(|mut tx| {
let mut i = 0;
for item in ex_tree.iter().map_err(TxError::Abort)? {
let (k, v) = item.map_err(TxError::Abort)?;
tx.insert(&tree, k, v)?;
i += 1;
if i % 1000 == 0 {
println!("{}: imported {}", name, i);
}
}
tx.commit(i)
});
let total = match tx_res {
Err(TxError::Db(e)) => return Err(e),
Err(TxError::Abort(e)) => return Err(e),
Ok(x) => x,
};
println!("{}: finished importing, {} items", name, total);
}
Ok(())
}
}
#[allow(clippy::len_without_is_empty)]
impl Tree {
#[inline]
pub fn db(&self) -> Db {
Db(self.0.clone())
}
#[inline]
pub fn get<T: AsRef<[u8]>>(&self, key: T) -> Result<Option<Value>> {
self.0.get(self.1, key.as_ref())
}
#[inline]
pub fn len(&self) -> Result<usize> {
self.0.len(self.1)
}
#[inline]
pub fn first(&self) -> Result<Option<(Value, Value)>> {
self.iter()?.next().transpose()
}
#[inline]
pub fn get_gt<T: AsRef<[u8]>>(&self, from: T) -> Result<Option<(Value, Value)>> {
self.range((Bound::Excluded(from), Bound::Unbounded))?
.next()
.transpose()
}
/// Returns the old value if there was one
#[inline]
pub fn insert<T: AsRef<[u8]>, U: AsRef<[u8]>>(
&self,
key: T,
value: U,
) -> Result<Option<Value>> {
self.0.insert(self.1, key.as_ref(), value.as_ref())
}
/// Returns the old value if there was one
#[inline]
pub fn remove<T: AsRef<[u8]>>(&self, key: T) -> Result<Option<Value>> {
self.0.remove(self.1, key.as_ref())
}
/// Clears all values from the tree
#[inline]
pub fn clear(&self) -> Result<()> {
self.0.clear(self.1)
}
#[inline]
pub fn iter(&self) -> Result<ValueIter<'_>> {
self.0.iter(self.1)
}
#[inline]
pub fn iter_rev(&self) -> Result<ValueIter<'_>> {
self.0.iter_rev(self.1)
}
#[inline]
pub fn range<K, R>(&self, range: R) -> Result<ValueIter<'_>>
where
K: AsRef<[u8]>,
R: RangeBounds<K>,
{
let sb = range.start_bound();
let eb = range.end_bound();
self.0.range(self.1, get_bound(sb), get_bound(eb))
}
#[inline]
pub fn range_rev<K, R>(&self, range: R) -> Result<ValueIter<'_>>
where
K: AsRef<[u8]>,
R: RangeBounds<K>,
{
let sb = range.start_bound();
let eb = range.end_bound();
self.0.range_rev(self.1, get_bound(sb), get_bound(eb))
}
}
#[allow(clippy::len_without_is_empty)]
impl<'a> Transaction<'a> {
#[inline]
pub fn get<T: AsRef<[u8]>>(&self, tree: &Tree, key: T) -> TxOpResult<Option<Value>> {
self.0.get(tree.1, key.as_ref())
}
#[inline]
pub fn len(&self, tree: &Tree) -> TxOpResult<usize> {
self.0.len(tree.1)
}
/// Returns the old value if there was one
#[inline]
pub fn insert<T: AsRef<[u8]>, U: AsRef<[u8]>>(
&mut self,
tree: &Tree,
key: T,
value: U,
) -> TxOpResult<Option<Value>> {
self.0.insert(tree.1, key.as_ref(), value.as_ref())
}
/// Returns the old value if there was one
#[inline]
pub fn remove<T: AsRef<[u8]>>(&mut self, tree: &Tree, key: T) -> TxOpResult<Option<Value>> {
self.0.remove(tree.1, key.as_ref())
}
#[inline]
pub fn iter(&self, tree: &Tree) -> TxOpResult<TxValueIter<'_>> {
self.0.iter(tree.1)
}
#[inline]
pub fn iter_rev(&self, tree: &Tree) -> TxOpResult<TxValueIter<'_>> {
self.0.iter_rev(tree.1)
}
#[inline]
pub fn range<K, R>(&self, tree: &Tree, range: R) -> TxOpResult<TxValueIter<'_>>
where
K: AsRef<[u8]>,
R: RangeBounds<K>,
{
let sb = range.start_bound();
let eb = range.end_bound();
self.0.range(tree.1, get_bound(sb), get_bound(eb))
}
#[inline]
pub fn range_rev<K, R>(&self, tree: &Tree, range: R) -> TxOpResult<TxValueIter<'_>>
where
K: AsRef<[u8]>,
R: RangeBounds<K>,
{
let sb = range.start_bound();
let eb = range.end_bound();
self.0.range_rev(tree.1, get_bound(sb), get_bound(eb))
}
// ----
#[inline]
pub fn abort<R, E>(self, e: E) -> TxResult<R, E> {
Err(TxError::Abort(e))
}
#[inline]
pub fn commit<R, E>(self, r: R) -> TxResult<R, E> {
Ok(r)
}
}
// ---- Internal interfaces
pub(crate) trait IDb: Send + Sync {
fn engine(&self) -> String;
fn open_tree(&self, name: &str) -> Result<usize>;
fn list_trees(&self) -> Result<Vec<String>>;
fn get(&self, tree: usize, key: &[u8]) -> Result<Option<Value>>;
fn len(&self, tree: usize) -> Result<usize>;
fn insert(&self, tree: usize, key: &[u8], value: &[u8]) -> Result<Option<Value>>;
fn remove(&self, tree: usize, key: &[u8]) -> Result<Option<Value>>;
fn clear(&self, tree: usize) -> Result<()>;
fn iter(&self, tree: usize) -> Result<ValueIter<'_>>;
fn iter_rev(&self, tree: usize) -> Result<ValueIter<'_>>;
fn range<'r>(
&self,
tree: usize,
low: Bound<&'r [u8]>,
high: Bound<&'r [u8]>,
) -> Result<ValueIter<'_>>;
fn range_rev<'r>(
&self,
tree: usize,
low: Bound<&'r [u8]>,
high: Bound<&'r [u8]>,
) -> Result<ValueIter<'_>>;
fn transaction(&self, f: &dyn ITxFn) -> TxResult<(), ()>;
}
pub(crate) trait ITx {
fn get(&self, tree: usize, key: &[u8]) -> TxOpResult<Option<Value>>;
fn len(&self, tree: usize) -> TxOpResult<usize>;
fn insert(&mut self, tree: usize, key: &[u8], value: &[u8]) -> TxOpResult<Option<Value>>;
fn remove(&mut self, tree: usize, key: &[u8]) -> TxOpResult<Option<Value>>;
fn iter(&self, tree: usize) -> TxOpResult<TxValueIter<'_>>;
fn iter_rev(&self, tree: usize) -> TxOpResult<TxValueIter<'_>>;
fn range<'r>(
&self,
tree: usize,
low: Bound<&'r [u8]>,
high: Bound<&'r [u8]>,
) -> TxOpResult<TxValueIter<'_>>;
fn range_rev<'r>(
&self,
tree: usize,
low: Bound<&'r [u8]>,
high: Bound<&'r [u8]>,
) -> TxOpResult<TxValueIter<'_>>;
}
pub(crate) trait ITxFn {
fn try_on(&self, tx: &mut dyn ITx) -> TxFnResult;
}
pub(crate) enum TxFnResult {
Ok,
Abort,
DbErr,
}
struct TxFn<F, R, E>
where
F: Fn(Transaction<'_>) -> TxResult<R, E>,
{
function: F,
result: Cell<Option<TxResult<R, E>>>,
}
impl<F, R, E> ITxFn for TxFn<F, R, E>
where
F: Fn(Transaction<'_>) -> TxResult<R, E>,
{
fn try_on(&self, tx: &mut dyn ITx) -> TxFnResult {
let res = (self.function)(Transaction(tx));
let res2 = match &res {
Ok(_) => TxFnResult::Ok,
Err(TxError::Abort(_)) => TxFnResult::Abort,
Err(TxError::Db(_)) => TxFnResult::DbErr,
};
self.result.set(Some(res));
res2
}
}
// ----
fn get_bound<K: AsRef<[u8]>>(b: Bound<&K>) -> Bound<&[u8]> {
match b {
Bound::Included(v) => Bound::Included(v.as_ref()),
Bound::Excluded(v) => Bound::Excluded(v.as_ref()),
Bound::Unbounded => Bound::Unbounded,
}
}

351
src/db/lmdb_adapter.rs Normal file
View File

@ -0,0 +1,351 @@
use core::ops::Bound;
use core::ptr::NonNull;
use std::collections::HashMap;
use std::convert::TryInto;
use std::sync::{Arc, RwLock};
use heed::types::ByteSlice;
use heed::{BytesDecode, Env, RoTxn, RwTxn, UntypedDatabase as Database};
use crate::{
Db, Error, IDb, ITx, ITxFn, Result, TxError, TxFnResult, TxOpError, TxOpResult, TxResult,
TxValueIter, Value, ValueIter,
};
pub use heed;
// -- err
impl From<heed::Error> for Error {
fn from(e: heed::Error) -> Error {
Error(format!("LMDB: {}", e).into())
}
}
impl From<heed::Error> for TxOpError {
fn from(e: heed::Error) -> TxOpError {
TxOpError(e.into())
}
}
// -- db
pub struct LmdbDb {
db: heed::Env,
trees: RwLock<(Vec<Database>, HashMap<String, usize>)>,
}
impl LmdbDb {
pub fn init(db: Env) -> Db {
let s = Self {
db,
trees: RwLock::new((Vec::new(), HashMap::new())),
};
Db(Arc::new(s))
}
fn get_tree(&self, i: usize) -> Result<Database> {
self.trees
.read()
.unwrap()
.0
.get(i)
.cloned()
.ok_or_else(|| Error("invalid tree id".into()))
}
}
impl IDb for LmdbDb {
fn engine(&self) -> String {
"LMDB (using Heed crate)".into()
}
fn open_tree(&self, name: &str) -> Result<usize> {
let mut trees = self.trees.write().unwrap();
if let Some(i) = trees.1.get(name) {
Ok(*i)
} else {
let tree = self.db.create_database(Some(name))?;
let i = trees.0.len();
trees.0.push(tree);
trees.1.insert(name.to_string(), i);
Ok(i)
}
}
fn list_trees(&self) -> Result<Vec<String>> {
let tree0 = match self.db.open_database::<heed::types::Str, ByteSlice>(None)? {
Some(x) => x,
None => return Ok(vec![]),
};
let mut ret = vec![];
let tx = self.db.read_txn()?;
for item in tree0.iter(&tx)? {
let (tree_name, _) = item?;
ret.push(tree_name.to_string());
}
drop(tx);
let mut ret2 = vec![];
for tree_name in ret {
if self
.db
.open_database::<ByteSlice, ByteSlice>(Some(&tree_name))?
.is_some()
{
ret2.push(tree_name);
}
}
Ok(ret2)
}
// ----
fn get(&self, tree: usize, key: &[u8]) -> Result<Option<Value>> {
let tree = self.get_tree(tree)?;
let tx = self.db.read_txn()?;
let val = tree.get(&tx, key)?;
match val {
None => Ok(None),
Some(v) => Ok(Some(v.to_vec())),
}
}
fn len(&self, tree: usize) -> Result<usize> {
let tree = self.get_tree(tree)?;
let tx = self.db.read_txn()?;
Ok(tree.len(&tx)?.try_into().unwrap())
}
fn insert(&self, tree: usize, key: &[u8], value: &[u8]) -> Result<Option<Value>> {
let tree = self.get_tree(tree)?;
let mut tx = self.db.write_txn()?;
let old_val = tree.get(&tx, key)?.map(Vec::from);
tree.put(&mut tx, key, value)?;
tx.commit()?;
Ok(old_val)
}
fn remove(&self, tree: usize, key: &[u8]) -> Result<Option<Value>> {
let tree = self.get_tree(tree)?;
let mut tx = self.db.write_txn()?;
let old_val = tree.get(&tx, key)?.map(Vec::from);
tree.delete(&mut tx, key)?;
tx.commit()?;
Ok(old_val)
}
fn clear(&self, tree: usize) -> Result<()> {
let tree = self.get_tree(tree)?;
let mut tx = self.db.write_txn()?;
tree.clear(&mut tx)?;
tx.commit()?;
Ok(())
}
fn iter(&self, tree: usize) -> Result<ValueIter<'_>> {
let tree = self.get_tree(tree)?;
let tx = self.db.read_txn()?;
TxAndIterator::make(tx, |tx| Ok(tree.iter(tx)?))
}
fn iter_rev(&self, tree: usize) -> Result<ValueIter<'_>> {
let tree = self.get_tree(tree)?;
let tx = self.db.read_txn()?;
TxAndIterator::make(tx, |tx| Ok(tree.rev_iter(tx)?))
}
fn range<'r>(
&self,
tree: usize,
low: Bound<&'r [u8]>,
high: Bound<&'r [u8]>,
) -> Result<ValueIter<'_>> {
let tree = self.get_tree(tree)?;
let tx = self.db.read_txn()?;
TxAndIterator::make(tx, |tx| Ok(tree.range(tx, &(low, high))?))
}
fn range_rev<'r>(
&self,
tree: usize,
low: Bound<&'r [u8]>,
high: Bound<&'r [u8]>,
) -> Result<ValueIter<'_>> {
let tree = self.get_tree(tree)?;
let tx = self.db.read_txn()?;
TxAndIterator::make(tx, |tx| Ok(tree.rev_range(tx, &(low, high))?))
}
// ----
fn transaction(&self, f: &dyn ITxFn) -> TxResult<(), ()> {
let trees = self.trees.read().unwrap();
let mut tx = LmdbTx {
trees: &trees.0[..],
tx: self
.db
.write_txn()
.map_err(Error::from)
.map_err(TxError::Db)?,
};
let res = f.try_on(&mut tx);
match res {
TxFnResult::Ok => {
tx.tx.commit().map_err(Error::from).map_err(TxError::Db)?;
Ok(())
}
TxFnResult::Abort => {
tx.tx.abort().map_err(Error::from).map_err(TxError::Db)?;
Err(TxError::Abort(()))
}
TxFnResult::DbErr => {
tx.tx.abort().map_err(Error::from).map_err(TxError::Db)?;
Err(TxError::Db(Error(
"(this message will be discarded)".into(),
)))
}
}
}
}
// ----
struct LmdbTx<'a> {
trees: &'a [Database],
tx: RwTxn<'a, 'a>,
}
impl<'a> LmdbTx<'a> {
fn get_tree(&self, i: usize) -> TxOpResult<&Database> {
self.trees.get(i).ok_or_else(|| {
TxOpError(Error(
"invalid tree id (it might have been openned after the transaction started)".into(),
))
})
}
}
impl<'a> ITx for LmdbTx<'a> {
fn get(&self, tree: usize, key: &[u8]) -> TxOpResult<Option<Value>> {
let tree = self.get_tree(tree)?;
match tree.get(&self.tx, key)? {
Some(v) => Ok(Some(v.to_vec())),
None => Ok(None),
}
}
fn len(&self, _tree: usize) -> TxOpResult<usize> {
unimplemented!(".len() in transaction not supported with LMDB backend")
}
fn insert(&mut self, tree: usize, key: &[u8], value: &[u8]) -> TxOpResult<Option<Value>> {
let tree = *self.get_tree(tree)?;
let old_val = tree.get(&self.tx, key)?.map(Vec::from);
tree.put(&mut self.tx, key, value)?;
Ok(old_val)
}
fn remove(&mut self, tree: usize, key: &[u8]) -> TxOpResult<Option<Value>> {
let tree = *self.get_tree(tree)?;
let old_val = tree.get(&self.tx, key)?.map(Vec::from);
tree.delete(&mut self.tx, key)?;
Ok(old_val)
}
fn iter(&self, _tree: usize) -> TxOpResult<TxValueIter<'_>> {
unimplemented!("Iterators in transactions not supported with LMDB backend");
}
fn iter_rev(&self, _tree: usize) -> TxOpResult<TxValueIter<'_>> {
unimplemented!("Iterators in transactions not supported with LMDB backend");
}
fn range<'r>(
&self,
_tree: usize,
_low: Bound<&'r [u8]>,
_high: Bound<&'r [u8]>,
) -> TxOpResult<TxValueIter<'_>> {
unimplemented!("Iterators in transactions not supported with LMDB backend");
}
fn range_rev<'r>(
&self,
_tree: usize,
_low: Bound<&'r [u8]>,
_high: Bound<&'r [u8]>,
) -> TxOpResult<TxValueIter<'_>> {
unimplemented!("Iterators in transactions not supported with LMDB backend");
}
}
// ----
type IteratorItem<'a> = heed::Result<(
<ByteSlice as BytesDecode<'a>>::DItem,
<ByteSlice as BytesDecode<'a>>::DItem,
)>;
struct TxAndIterator<'a, I>
where
I: Iterator<Item = IteratorItem<'a>> + 'a,
{
tx: RoTxn<'a>,
iter: Option<I>,
}
impl<'a, I> TxAndIterator<'a, I>
where
I: Iterator<Item = IteratorItem<'a>> + 'a,
{
fn make<F>(tx: RoTxn<'a>, iterfun: F) -> Result<ValueIter<'a>>
where
F: FnOnce(&'a RoTxn<'a>) -> Result<I>,
{
let mut res = TxAndIterator { tx, iter: None };
let tx = unsafe { NonNull::from(&res.tx).as_ref() };
res.iter = Some(iterfun(tx)?);
Ok(Box::new(res))
}
}
impl<'a, I> Drop for TxAndIterator<'a, I>
where
I: Iterator<Item = IteratorItem<'a>> + 'a,
{
fn drop(&mut self) {
drop(self.iter.take());
}
}
impl<'a, I> Iterator for TxAndIterator<'a, I>
where
I: Iterator<Item = IteratorItem<'a>> + 'a,
{
type Item = Result<(Value, Value)>;
fn next(&mut self) -> Option<Self::Item> {
match self.iter.as_mut().unwrap().next() {
None => None,
Some(Err(e)) => Some(Err(e.into())),
Some(Ok((k, v))) => Some(Ok((k.to_vec(), v.to_vec()))),
}
}
}
// ----
#[cfg(target_pointer_width = "64")]
pub fn recommended_map_size() -> usize {
1usize << 40
}
#[cfg(target_pointer_width = "32")]
pub fn recommended_map_size() -> usize {
use log::warn;
warn!("LMDB is not recommended on 32-bit systems, database size will be limited");
1usize << 30
}

266
src/db/sled_adapter.rs Normal file
View File

@ -0,0 +1,266 @@
use core::ops::Bound;
use std::cell::Cell;
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use sled::transaction::{
ConflictableTransactionError, TransactionError, Transactional, TransactionalTree,
UnabortableTransactionError,
};
use crate::{
Db, Error, IDb, ITx, ITxFn, Result, TxError, TxFnResult, TxOpError, TxOpResult, TxResult,
TxValueIter, Value, ValueIter,
};
pub use sled;
// -- err
impl From<sled::Error> for Error {
fn from(e: sled::Error) -> Error {
Error(format!("Sled: {}", e).into())
}
}
impl From<sled::Error> for TxOpError {
fn from(e: sled::Error) -> TxOpError {
TxOpError(e.into())
}
}
// -- db
pub struct SledDb {
db: sled::Db,
trees: RwLock<(Vec<sled::Tree>, HashMap<String, usize>)>,
}
impl SledDb {
pub fn init(db: sled::Db) -> Db {
let s = Self {
db,
trees: RwLock::new((Vec::new(), HashMap::new())),
};
Db(Arc::new(s))
}
fn get_tree(&self, i: usize) -> Result<sled::Tree> {
self.trees
.read()
.unwrap()
.0
.get(i)
.cloned()
.ok_or_else(|| Error("invalid tree id".into()))
}
}
impl IDb for SledDb {
fn engine(&self) -> String {
"Sled".into()
}
fn open_tree(&self, name: &str) -> Result<usize> {
let mut trees = self.trees.write().unwrap();
if let Some(i) = trees.1.get(name) {
Ok(*i)
} else {
let tree = self.db.open_tree(name)?;
let i = trees.0.len();
trees.0.push(tree);
trees.1.insert(name.to_string(), i);
Ok(i)
}
}
fn list_trees(&self) -> Result<Vec<String>> {
let mut trees = vec![];
for name in self.db.tree_names() {
let name = std::str::from_utf8(&name)
.map_err(|e| Error(format!("{}", e).into()))?
.to_string();
if name != "__sled__default" {
trees.push(name);
}
}
Ok(trees)
}
// ----
fn get(&self, tree: usize, key: &[u8]) -> Result<Option<Value>> {
let tree = self.get_tree(tree)?;
let val = tree.get(key)?;
Ok(val.map(|x| x.to_vec()))
}
fn len(&self, tree: usize) -> Result<usize> {
let tree = self.get_tree(tree)?;
Ok(tree.len())
}
fn insert(&self, tree: usize, key: &[u8], value: &[u8]) -> Result<Option<Value>> {
let tree = self.get_tree(tree)?;
let old_val = tree.insert(key, value)?;
Ok(old_val.map(|x| x.to_vec()))
}
fn remove(&self, tree: usize, key: &[u8]) -> Result<Option<Value>> {
let tree = self.get_tree(tree)?;
let old_val = tree.remove(key)?;
Ok(old_val.map(|x| x.to_vec()))
}
fn clear(&self, tree: usize) -> Result<()> {
let tree = self.get_tree(tree)?;
tree.clear()?;
Ok(())
}
fn iter(&self, tree: usize) -> Result<ValueIter<'_>> {
let tree = self.get_tree(tree)?;
Ok(Box::new(tree.iter().map(|v| {
v.map(|(x, y)| (x.to_vec(), y.to_vec())).map_err(Into::into)
})))
}
fn iter_rev(&self, tree: usize) -> Result<ValueIter<'_>> {
let tree = self.get_tree(tree)?;
Ok(Box::new(tree.iter().rev().map(|v| {
v.map(|(x, y)| (x.to_vec(), y.to_vec())).map_err(Into::into)
})))
}
fn range<'r>(
&self,
tree: usize,
low: Bound<&'r [u8]>,
high: Bound<&'r [u8]>,
) -> Result<ValueIter<'_>> {
let tree = self.get_tree(tree)?;
Ok(Box::new(tree.range::<&'r [u8], _>((low, high)).map(|v| {
v.map(|(x, y)| (x.to_vec(), y.to_vec())).map_err(Into::into)
})))
}
fn range_rev<'r>(
&self,
tree: usize,
low: Bound<&'r [u8]>,
high: Bound<&'r [u8]>,
) -> Result<ValueIter<'_>> {
let tree = self.get_tree(tree)?;
Ok(Box::new(tree.range::<&'r [u8], _>((low, high)).rev().map(
|v| v.map(|(x, y)| (x.to_vec(), y.to_vec())).map_err(Into::into),
)))
}
// ----
fn transaction(&self, f: &dyn ITxFn) -> TxResult<(), ()> {
let trees = self.trees.read().unwrap();
let res = trees.0.transaction(|txtrees| {
let mut tx = SledTx {
trees: txtrees,
err: Cell::new(None),
};
match f.try_on(&mut tx) {
TxFnResult::Ok => {
assert!(tx.err.into_inner().is_none());
Ok(())
}
TxFnResult::Abort => {
assert!(tx.err.into_inner().is_none());
Err(ConflictableTransactionError::Abort(()))
}
TxFnResult::DbErr => {
let e = tx.err.into_inner().expect("No DB error");
Err(e.into())
}
}
});
match res {
Ok(()) => Ok(()),
Err(TransactionError::Abort(())) => Err(TxError::Abort(())),
Err(TransactionError::Storage(s)) => Err(TxError::Db(s.into())),
}
}
}
// ----
struct SledTx<'a> {
trees: &'a [TransactionalTree],
err: Cell<Option<UnabortableTransactionError>>,
}
impl<'a> SledTx<'a> {
fn get_tree(&self, i: usize) -> TxOpResult<&TransactionalTree> {
self.trees.get(i).ok_or_else(|| {
TxOpError(Error(
"invalid tree id (it might have been openned after the transaction started)".into(),
))
})
}
fn save_error<R>(
&self,
v: std::result::Result<R, UnabortableTransactionError>,
) -> TxOpResult<R> {
match v {
Ok(x) => Ok(x),
Err(e) => {
let txt = format!("{}", e);
self.err.set(Some(e));
Err(TxOpError(Error(txt.into())))
}
}
}
}
impl<'a> ITx for SledTx<'a> {
fn get(&self, tree: usize, key: &[u8]) -> TxOpResult<Option<Value>> {
let tree = self.get_tree(tree)?;
let tmp = self.save_error(tree.get(key))?;
Ok(tmp.map(|x| x.to_vec()))
}
fn len(&self, _tree: usize) -> TxOpResult<usize> {
unimplemented!(".len() in transaction not supported with Sled backend")
}
fn insert(&mut self, tree: usize, key: &[u8], value: &[u8]) -> TxOpResult<Option<Value>> {
let tree = self.get_tree(tree)?;
let old_val = self.save_error(tree.insert(key, value))?;
Ok(old_val.map(|x| x.to_vec()))
}
fn remove(&mut self, tree: usize, key: &[u8]) -> TxOpResult<Option<Value>> {
let tree = self.get_tree(tree)?;
let old_val = self.save_error(tree.remove(key))?;
Ok(old_val.map(|x| x.to_vec()))
}
fn iter(&self, _tree: usize) -> TxOpResult<TxValueIter<'_>> {
unimplemented!("Iterators in transactions not supported with Sled backend");
}
fn iter_rev(&self, _tree: usize) -> TxOpResult<TxValueIter<'_>> {
unimplemented!("Iterators in transactions not supported with Sled backend");
}
fn range<'r>(
&self,
_tree: usize,
_low: Bound<&'r [u8]>,
_high: Bound<&'r [u8]>,
) -> TxOpResult<TxValueIter<'_>> {
unimplemented!("Iterators in transactions not supported with Sled backend");
}
fn range_rev<'r>(
&self,
_tree: usize,
_low: Bound<&'r [u8]>,
_high: Bound<&'r [u8]>,
) -> TxOpResult<TxValueIter<'_>> {
unimplemented!("Iterators in transactions not supported with Sled backend");
}
}

510
src/db/sqlite_adapter.rs Normal file
View File

@ -0,0 +1,510 @@
use core::ops::Bound;
use std::borrow::BorrowMut;
use std::marker::PhantomPinned;
use std::pin::Pin;
use std::ptr::NonNull;
use std::sync::{Arc, Mutex, MutexGuard};
use log::trace;
use rusqlite::{params, Connection, Rows, Statement, Transaction};
use crate::{
Db, Error, IDb, ITx, ITxFn, Result, TxError, TxFnResult, TxOpError, TxOpResult, TxResult,
TxValueIter, Value, ValueIter,
};
pub use rusqlite;
// --- err
impl From<rusqlite::Error> for Error {
fn from(e: rusqlite::Error) -> Error {
Error(format!("Sqlite: {}", e).into())
}
}
impl From<rusqlite::Error> for TxOpError {
fn from(e: rusqlite::Error) -> TxOpError {
TxOpError(e.into())
}
}
// -- db
pub struct SqliteDb(Mutex<SqliteDbInner>);
struct SqliteDbInner {
db: Connection,
trees: Vec<String>,
}
impl SqliteDb {
pub fn init(db: rusqlite::Connection) -> Db {
let s = Self(Mutex::new(SqliteDbInner {
db,
trees: Vec::new(),
}));
Db(Arc::new(s))
}
}
impl SqliteDbInner {
fn get_tree(&self, i: usize) -> Result<&'_ str> {
self.trees
.get(i)
.map(String::as_str)
.ok_or_else(|| Error("invalid tree id".into()))
}
fn internal_get(&self, tree: &str, key: &[u8]) -> Result<Option<Value>> {
let mut stmt = self
.db
.prepare(&format!("SELECT v FROM {} WHERE k = ?1", tree))?;
let mut res_iter = stmt.query([key])?;
match res_iter.next()? {
None => Ok(None),
Some(v) => Ok(Some(v.get::<_, Vec<u8>>(0)?)),
}
}
}
impl IDb for SqliteDb {
fn engine(&self) -> String {
format!("sqlite3 v{} (using rusqlite crate)", rusqlite::version())
}
fn open_tree(&self, name: &str) -> Result<usize> {
let name = format!("tree_{}", name.replace(':', "_COLON_"));
let mut this = self.0.lock().unwrap();
if let Some(i) = this.trees.iter().position(|x| x == &name) {
Ok(i)
} else {
trace!("create table {}", name);
this.db.execute(
&format!(
"CREATE TABLE IF NOT EXISTS {} (
k BLOB PRIMARY KEY,
v BLOB
)",
name
),
[],
)?;
trace!("table created: {}, unlocking", name);
let i = this.trees.len();
this.trees.push(name.to_string());
Ok(i)
}
}
fn list_trees(&self) -> Result<Vec<String>> {
let mut trees = vec![];
trace!("list_trees: lock db");
let this = self.0.lock().unwrap();
trace!("list_trees: lock acquired");
let mut stmt = this.db.prepare(
"SELECT name FROM sqlite_schema WHERE type = 'table' AND name LIKE 'tree_%'",
)?;
let mut rows = stmt.query([])?;
while let Some(row) = rows.next()? {
let name = row.get::<_, String>(0)?;
let name = name.replace("_COLON_", ":");
let name = name.strip_prefix("tree_").unwrap().to_string();
trees.push(name);
}
Ok(trees)
}
// ----
fn get(&self, tree: usize, key: &[u8]) -> Result<Option<Value>> {
trace!("get {}: lock db", tree);
let this = self.0.lock().unwrap();
trace!("get {}: lock acquired", tree);
let tree = this.get_tree(tree)?;
this.internal_get(tree, key)
}
fn len(&self, tree: usize) -> Result<usize> {
trace!("len {}: lock db", tree);
let this = self.0.lock().unwrap();
trace!("len {}: lock acquired", tree);
let tree = this.get_tree(tree)?;
let mut stmt = this.db.prepare(&format!("SELECT COUNT(*) FROM {}", tree))?;
let mut res_iter = stmt.query([])?;
match res_iter.next()? {
None => Ok(0),
Some(v) => Ok(v.get::<_, usize>(0)?),
}
}
fn insert(&self, tree: usize, key: &[u8], value: &[u8]) -> Result<Option<Value>> {
trace!("insert {}: lock db", tree);
let this = self.0.lock().unwrap();
trace!("insert {}: lock acquired", tree);
let tree = this.get_tree(tree)?;
let old_val = this.internal_get(tree, key)?;
let sql = match &old_val {
Some(_) => format!("UPDATE {} SET v = ?2 WHERE k = ?1", tree),
None => format!("INSERT INTO {} (k, v) VALUES (?1, ?2)", tree),
};
let n = this.db.execute(&sql, params![key, value])?;
assert_eq!(n, 1);
Ok(old_val)
}
fn remove(&self, tree: usize, key: &[u8]) -> Result<Option<Value>> {
trace!("remove {}: lock db", tree);
let this = self.0.lock().unwrap();
trace!("remove {}: lock acquired", tree);
let tree = this.get_tree(tree)?;
let old_val = this.internal_get(tree, key)?;
if old_val.is_some() {
let n = this
.db
.execute(&format!("DELETE FROM {} WHERE k = ?1", tree), params![key])?;
assert_eq!(n, 1);
}
Ok(old_val)
}
fn clear(&self, tree: usize) -> Result<()> {
trace!("clear {}: lock db", tree);
let this = self.0.lock().unwrap();
trace!("clear {}: lock acquired", tree);
let tree = this.get_tree(tree)?;
this.db.execute(&format!("DELETE FROM {}", tree), [])?;
Ok(())
}
fn iter(&self, tree: usize) -> Result<ValueIter<'_>> {
trace!("iter {}: lock db", tree);
let this = self.0.lock().unwrap();
trace!("iter {}: lock acquired", tree);
let tree = this.get_tree(tree)?;
let sql = format!("SELECT k, v FROM {} ORDER BY k ASC", tree);
DbValueIterator::make(this, &sql, [])
}
fn iter_rev(&self, tree: usize) -> Result<ValueIter<'_>> {
trace!("iter_rev {}: lock db", tree);
let this = self.0.lock().unwrap();
trace!("iter_rev {}: lock acquired", tree);
let tree = this.get_tree(tree)?;
let sql = format!("SELECT k, v FROM {} ORDER BY k DESC", tree);
DbValueIterator::make(this, &sql, [])
}
fn range<'r>(
&self,
tree: usize,
low: Bound<&'r [u8]>,
high: Bound<&'r [u8]>,
) -> Result<ValueIter<'_>> {
trace!("range {}: lock db", tree);
let this = self.0.lock().unwrap();
trace!("range {}: lock acquired", tree);
let tree = this.get_tree(tree)?;
let (bounds_sql, params) = bounds_sql(low, high);
let sql = format!("SELECT k, v FROM {} {} ORDER BY k ASC", tree, bounds_sql);
let params = params
.iter()
.map(|x| x as &dyn rusqlite::ToSql)
.collect::<Vec<_>>();
DbValueIterator::make::<&[&dyn rusqlite::ToSql]>(this, &sql, params.as_ref())
}
fn range_rev<'r>(
&self,
tree: usize,
low: Bound<&'r [u8]>,
high: Bound<&'r [u8]>,
) -> Result<ValueIter<'_>> {
trace!("range_rev {}: lock db", tree);
let this = self.0.lock().unwrap();
trace!("range_rev {}: lock acquired", tree);
let tree = this.get_tree(tree)?;
let (bounds_sql, params) = bounds_sql(low, high);
let sql = format!("SELECT k, v FROM {} {} ORDER BY k DESC", tree, bounds_sql);
let params = params
.iter()
.map(|x| x as &dyn rusqlite::ToSql)
.collect::<Vec<_>>();
DbValueIterator::make::<&[&dyn rusqlite::ToSql]>(this, &sql, params.as_ref())
}
// ----
fn transaction(&self, f: &dyn ITxFn) -> TxResult<(), ()> {
trace!("transaction: lock db");
let mut this = self.0.lock().unwrap();
trace!("transaction: lock acquired");
let this_mut_ref: &mut SqliteDbInner = this.borrow_mut();
let mut tx = SqliteTx {
tx: this_mut_ref
.db
.transaction()
.map_err(Error::from)
.map_err(TxError::Db)?,
trees: &this_mut_ref.trees,
};
let res = match f.try_on(&mut tx) {
TxFnResult::Ok => {
tx.tx.commit().map_err(Error::from).map_err(TxError::Db)?;
Ok(())
}
TxFnResult::Abort => {
tx.tx.rollback().map_err(Error::from).map_err(TxError::Db)?;
Err(TxError::Abort(()))
}
TxFnResult::DbErr => {
tx.tx.rollback().map_err(Error::from).map_err(TxError::Db)?;
Err(TxError::Db(Error(
"(this message will be discarded)".into(),
)))
}
};
trace!("transaction done");
res
}
}
// ----
struct SqliteTx<'a> {
tx: Transaction<'a>,
trees: &'a [String],
}
impl<'a> SqliteTx<'a> {
fn get_tree(&self, i: usize) -> TxOpResult<&'_ str> {
self.trees.get(i).map(String::as_ref).ok_or_else(|| {
TxOpError(Error(
"invalid tree id (it might have been openned after the transaction started)".into(),
))
})
}
fn internal_get(&self, tree: &str, key: &[u8]) -> TxOpResult<Option<Value>> {
let mut stmt = self
.tx
.prepare(&format!("SELECT v FROM {} WHERE k = ?1", tree))?;
let mut res_iter = stmt.query([key])?;
match res_iter.next()? {
None => Ok(None),
Some(v) => Ok(Some(v.get::<_, Vec<u8>>(0)?)),
}
}
}
impl<'a> ITx for SqliteTx<'a> {
fn get(&self, tree: usize, key: &[u8]) -> TxOpResult<Option<Value>> {
let tree = self.get_tree(tree)?;
self.internal_get(tree, key)
}
fn len(&self, tree: usize) -> TxOpResult<usize> {
let tree = self.get_tree(tree)?;
let mut stmt = self.tx.prepare(&format!("SELECT COUNT(*) FROM {}", tree))?;
let mut res_iter = stmt.query([])?;
match res_iter.next()? {
None => Ok(0),
Some(v) => Ok(v.get::<_, usize>(0)?),
}
}
fn insert(&mut self, tree: usize, key: &[u8], value: &[u8]) -> TxOpResult<Option<Value>> {
let tree = self.get_tree(tree)?;
let old_val = self.internal_get(tree, key)?;
let sql = match &old_val {
Some(_) => format!("UPDATE {} SET v = ?2 WHERE k = ?1", tree),
None => format!("INSERT INTO {} (k, v) VALUES (?1, ?2)", tree),
};
let n = self.tx.execute(&sql, params![key, value])?;
assert_eq!(n, 1);
Ok(old_val)
}
fn remove(&mut self, tree: usize, key: &[u8]) -> TxOpResult<Option<Value>> {
let tree = self.get_tree(tree)?;
let old_val = self.internal_get(tree, key)?;
if old_val.is_some() {
let n = self
.tx
.execute(&format!("DELETE FROM {} WHERE k = ?1", tree), params![key])?;
assert_eq!(n, 1);
}
Ok(old_val)
}
fn iter(&self, _tree: usize) -> TxOpResult<TxValueIter<'_>> {
unimplemented!();
}
fn iter_rev(&self, _tree: usize) -> TxOpResult<TxValueIter<'_>> {
unimplemented!();
}
fn range<'r>(
&self,
_tree: usize,
_low: Bound<&'r [u8]>,
_high: Bound<&'r [u8]>,
) -> TxOpResult<TxValueIter<'_>> {
unimplemented!();
}
fn range_rev<'r>(
&self,
_tree: usize,
_low: Bound<&'r [u8]>,
_high: Bound<&'r [u8]>,
) -> TxOpResult<TxValueIter<'_>> {
unimplemented!();
}
}
// ----
struct DbValueIterator<'a> {
db: MutexGuard<'a, SqliteDbInner>,
stmt: Option<Statement<'a>>,
iter: Option<Rows<'a>>,
_pin: PhantomPinned,
}
impl<'a> DbValueIterator<'a> {
fn make<P: rusqlite::Params>(
db: MutexGuard<'a, SqliteDbInner>,
sql: &str,
args: P,
) -> Result<ValueIter<'a>> {
let res = DbValueIterator {
db,
stmt: None,
iter: None,
_pin: PhantomPinned,
};
let mut boxed = Box::pin(res);
trace!("make iterator with sql: {}", sql);
unsafe {
let db = NonNull::from(&boxed.db);
let stmt = db.as_ref().db.prepare(sql)?;
let mut_ref: Pin<&mut DbValueIterator<'a>> = Pin::as_mut(&mut boxed);
Pin::get_unchecked_mut(mut_ref).stmt = Some(stmt);
let mut stmt = NonNull::from(&boxed.stmt);
let iter = stmt.as_mut().as_mut().unwrap().query(args)?;
let mut_ref: Pin<&mut DbValueIterator<'a>> = Pin::as_mut(&mut boxed);
Pin::get_unchecked_mut(mut_ref).iter = Some(iter);
}
Ok(Box::new(DbValueIteratorPin(boxed)))
}
}
impl<'a> Drop for DbValueIterator<'a> {
fn drop(&mut self) {
trace!("drop iter");
drop(self.iter.take());
drop(self.stmt.take());
}
}
struct DbValueIteratorPin<'a>(Pin<Box<DbValueIterator<'a>>>);
impl<'a> Iterator for DbValueIteratorPin<'a> {
type Item = Result<(Value, Value)>;
fn next(&mut self) -> Option<Self::Item> {
let next = unsafe {
let mut_ref: Pin<&mut DbValueIterator<'a>> = Pin::as_mut(&mut self.0);
Pin::get_unchecked_mut(mut_ref).iter.as_mut()?.next()
};
let row = match next {
Err(e) => return Some(Err(e.into())),
Ok(None) => return None,
Ok(Some(r)) => r,
};
let k = match row.get::<_, Vec<u8>>(0) {
Err(e) => return Some(Err(e.into())),
Ok(x) => x,
};
let v = match row.get::<_, Vec<u8>>(1) {
Err(e) => return Some(Err(e.into())),
Ok(y) => y,
};
Some(Ok((k, v)))
}
}
// ----
fn bounds_sql<'r>(low: Bound<&'r [u8]>, high: Bound<&'r [u8]>) -> (String, Vec<Vec<u8>>) {
let mut sql = String::new();
let mut params: Vec<Vec<u8>> = vec![];
match low {
Bound::Included(b) => {
sql.push_str(" WHERE k >= ?1");
params.push(b.to_vec());
}
Bound::Excluded(b) => {
sql.push_str(" WHERE k > ?1");
params.push(b.to_vec());
}
Bound::Unbounded => (),
};
match high {
Bound::Included(b) => {
if !params.is_empty() {
sql.push_str(" AND k <= ?2");
} else {
sql.push_str(" WHERE k <= ?1");
}
params.push(b.to_vec());
}
Bound::Excluded(b) => {
if !params.is_empty() {
sql.push_str(" AND k < ?2");
} else {
sql.push_str(" WHERE k < ?1");
}
params.push(b.to_vec());
}
Bound::Unbounded => (),
}
(sql, params)
}

106
src/db/test.rs Normal file
View File

@ -0,0 +1,106 @@
use crate::*;
use crate::lmdb_adapter::LmdbDb;
use crate::sled_adapter::SledDb;
use crate::sqlite_adapter::SqliteDb;
fn test_suite(db: Db) {
let tree = db.open_tree("tree").unwrap();
let ka: &[u8] = &b"test"[..];
let kb: &[u8] = &b"zwello"[..];
let kint: &[u8] = &b"tz"[..];
let va: &[u8] = &b"plop"[..];
let vb: &[u8] = &b"plip"[..];
let vc: &[u8] = &b"plup"[..];
assert!(tree.insert(ka, va).unwrap().is_none());
assert_eq!(tree.get(ka).unwrap().unwrap(), va);
let res = db.transaction::<_, (), _>(|mut tx| {
assert_eq!(tx.get(&tree, ka).unwrap().unwrap(), va);
assert_eq!(tx.insert(&tree, ka, vb).unwrap().unwrap(), va);
assert_eq!(tx.get(&tree, ka).unwrap().unwrap(), vb);
tx.commit(12)
});
assert!(matches!(res, Ok(12)));
assert_eq!(tree.get(ka).unwrap().unwrap(), vb);
let res = db.transaction::<(), _, _>(|mut tx| {
assert_eq!(tx.get(&tree, ka).unwrap().unwrap(), vb);
assert_eq!(tx.insert(&tree, ka, vc).unwrap().unwrap(), vb);
assert_eq!(tx.get(&tree, ka).unwrap().unwrap(), vc);
tx.abort(42)
});
assert!(matches!(res, Err(TxError::Abort(42))));
assert_eq!(tree.get(ka).unwrap().unwrap(), vb);
let mut iter = tree.iter().unwrap();
let next = iter.next().unwrap().unwrap();
assert_eq!((next.0.as_ref(), next.1.as_ref()), (ka, vb));
assert!(iter.next().is_none());
drop(iter);
assert!(tree.insert(kb, vc).unwrap().is_none());
assert_eq!(tree.get(kb).unwrap().unwrap(), vc);
let mut iter = tree.iter().unwrap();
let next = iter.next().unwrap().unwrap();
assert_eq!((next.0.as_ref(), next.1.as_ref()), (ka, vb));
let next = iter.next().unwrap().unwrap();
assert_eq!((next.0.as_ref(), next.1.as_ref()), (kb, vc));
assert!(iter.next().is_none());
drop(iter);
let mut iter = tree.range(kint..).unwrap();
let next = iter.next().unwrap().unwrap();
assert_eq!((next.0.as_ref(), next.1.as_ref()), (kb, vc));
assert!(iter.next().is_none());
drop(iter);
let mut iter = tree.range_rev(..kint).unwrap();
let next = iter.next().unwrap().unwrap();
assert_eq!((next.0.as_ref(), next.1.as_ref()), (ka, vb));
assert!(iter.next().is_none());
drop(iter);
let mut iter = tree.iter_rev().unwrap();
let next = iter.next().unwrap().unwrap();
assert_eq!((next.0.as_ref(), next.1.as_ref()), (kb, vc));
let next = iter.next().unwrap().unwrap();
assert_eq!((next.0.as_ref(), next.1.as_ref()), (ka, vb));
assert!(iter.next().is_none());
drop(iter);
}
#[test]
fn test_lmdb_db() {
let path = mktemp::Temp::new_dir().unwrap();
let db = heed::EnvOpenOptions::new()
.max_dbs(100)
.open(&path)
.unwrap();
let db = LmdbDb::init(db);
test_suite(db);
drop(path);
}
#[test]
fn test_sled_db() {
let path = mktemp::Temp::new_dir().unwrap();
let db = SledDb::init(sled::open(path.to_path_buf()).unwrap());
test_suite(db);
drop(path);
}
#[test]
fn test_sqlite_db() {
let db = SqliteDb::init(rusqlite::Connection::open_in_memory().unwrap());
test_suite(db);
}

View File

@ -21,6 +21,7 @@ path = "tests/lib.rs"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
garage_db = { version = "0.8.0", path = "../db" }
garage_api = { version = "0.7.0", path = "../api" }
garage_model = { version = "0.7.0", path = "../model" }
garage_rpc = { version = "0.7.0", path = "../rpc" }
@ -29,6 +30,7 @@ garage_util = { version = "0.7.0", path = "../util" }
garage_web = { version = "0.7.0", path = "../web" }
bytes = "1.0"
bytesize = "1.1"
hex = "0.4"
tracing = { version = "0.1.30", features = ["log-always"] }
pretty_env_logger = "0.4"
@ -36,8 +38,6 @@ rand = "0.8"
async-trait = "0.1.7"
sodiumoxide = { version = "0.2.5-0", package = "kuska-sodiumoxide" }
sled = "0.34"
rmp-serde = "0.15"
serde = { version = "1.0", default-features = false, features = ["derive", "rc"] }
serde_bytes = "0.11"

View File

@ -24,11 +24,12 @@ use garage_model::migrate::Migrate;
use garage_model::permission::*;
use crate::cli::*;
use crate::repair::Repair;
use crate::repair::online::OnlineRepair;
pub const ADMIN_RPC_PATH: &str = "garage/admin_rpc.rs/Rpc";
#[derive(Debug, Serialize, Deserialize)]
#[allow(clippy::large_enum_variant)]
pub enum AdminRpc {
BucketOperation(BucketOperation),
KeyOperation(KeyOperation),
@ -39,7 +40,11 @@ pub enum AdminRpc {
// Replies
Ok(String),
BucketList(Vec<Bucket>),
BucketInfo(Bucket, HashMap<String, Key>),
BucketInfo {
bucket: Bucket,
relevant_keys: HashMap<String, Key>,
counters: HashMap<String, i64>,
},
KeyList(Vec<(String, String)>),
KeyInfo(Key, HashMap<Uuid, Bucket>),
}
@ -72,6 +77,7 @@ impl AdminRpcHandler {
BucketOperation::Allow(query) => self.handle_bucket_allow(query).await,
BucketOperation::Deny(query) => self.handle_bucket_deny(query).await,
BucketOperation::Website(query) => self.handle_bucket_website(query).await,
BucketOperation::SetQuotas(query) => self.handle_bucket_set_quotas(query).await,
}
}
@ -87,6 +93,7 @@ impl AdminRpcHandler {
EnumerationOrder::Forward,
)
.await?;
Ok(AdminRpc::BucketList(buckets))
}
@ -104,6 +111,15 @@ impl AdminRpcHandler {
.get_existing_bucket(bucket_id)
.await?;
let counters = self
.garage
.object_counter_table
.table
.get(&bucket_id, &EmptyKey)
.await?
.map(|x| x.filtered_values(&self.garage.system.ring.borrow()))
.unwrap_or_default();
let mut relevant_keys = HashMap::new();
for (k, _) in bucket
.state
@ -139,7 +155,11 @@ impl AdminRpcHandler {
}
}
Ok(AdminRpc::BucketInfo(bucket, relevant_keys))
Ok(AdminRpc::BucketInfo {
bucket,
relevant_keys,
counters,
})
}
#[allow(clippy::ptr_arg)]
@ -431,6 +451,60 @@ impl AdminRpcHandler {
Ok(AdminRpc::Ok(msg))
}
async fn handle_bucket_set_quotas(&self, query: &SetQuotasOpt) -> Result<AdminRpc, Error> {
let bucket_id = self
.garage
.bucket_helper()
.resolve_global_bucket_name(&query.bucket)
.await?
.ok_or_bad_request("Bucket not found")?;
let mut bucket = self
.garage
.bucket_helper()
.get_existing_bucket(bucket_id)
.await?;
let bucket_state = bucket.state.as_option_mut().unwrap();
if query.max_size.is_none() && query.max_objects.is_none() {
return Err(Error::BadRequest(
"You must specify either --max-size or --max-objects (or both) for this command to do something.".to_string(),
));
}
let mut quotas = bucket_state.quotas.get().clone();
match query.max_size.as_ref().map(String::as_ref) {
Some("none") => quotas.max_size = None,
Some(v) => {
let bs = v
.parse::<bytesize::ByteSize>()
.ok_or_bad_request(format!("Invalid size specified: {}", v))?;
quotas.max_size = Some(bs.as_u64());
}
_ => (),
}
match query.max_objects.as_ref().map(String::as_ref) {
Some("none") => quotas.max_objects = None,
Some(v) => {
let mo = v
.parse::<u64>()
.ok_or_bad_request(format!("Invalid number specified: {}", v))?;
quotas.max_objects = Some(mo);
}
_ => (),
}
bucket_state.quotas.update(quotas);
self.garage.bucket_table.insert(&bucket).await?;
Ok(AdminRpc::Ok(format!(
"Quotas updated for {}",
&query.bucket
)))
}
async fn handle_key_cmd(&self, cmd: &KeyOperation) -> Result<AdminRpc, Error> {
match cmd {
KeyOperation::List => self.handle_list_keys().await,
@ -619,7 +693,7 @@ impl AdminRpcHandler {
)))
}
} else {
let repair = Repair {
let repair = OnlineRepair {
garage: self.garage.clone(),
};
self.garage
@ -660,11 +734,11 @@ impl AdminRpcHandler {
}
Ok(AdminRpc::Ok(ret))
} else {
Ok(AdminRpc::Ok(self.gather_stats_local(opt)))
Ok(AdminRpc::Ok(self.gather_stats_local(opt)?))
}
}
fn gather_stats_local(&self, opt: StatsOpt) -> String {
fn gather_stats_local(&self, opt: StatsOpt) -> Result<String, Error> {
let mut ret = String::new();
writeln!(
&mut ret,
@ -672,6 +746,7 @@ impl AdminRpcHandler {
self.garage.system.garage_version(),
)
.unwrap();
writeln!(&mut ret, "\nDatabase engine: {}", self.garage.db.engine()).unwrap();
// Gather ring statistics
let ring = self.garage.system.ring.borrow().clone();
@ -689,59 +764,71 @@ impl AdminRpcHandler {
writeln!(&mut ret, " {:?} {}", n, c).unwrap();
}
self.gather_table_stats(&mut ret, &self.garage.bucket_table, &opt);
self.gather_table_stats(&mut ret, &self.garage.key_table, &opt);
self.gather_table_stats(&mut ret, &self.garage.object_table, &opt);
self.gather_table_stats(&mut ret, &self.garage.version_table, &opt);
self.gather_table_stats(&mut ret, &self.garage.block_ref_table, &opt);
self.gather_table_stats(&mut ret, &self.garage.bucket_table, &opt)?;
self.gather_table_stats(&mut ret, &self.garage.key_table, &opt)?;
self.gather_table_stats(&mut ret, &self.garage.object_table, &opt)?;
self.gather_table_stats(&mut ret, &self.garage.version_table, &opt)?;
self.gather_table_stats(&mut ret, &self.garage.block_ref_table, &opt)?;
writeln!(&mut ret, "\nBlock manager stats:").unwrap();
if opt.detailed {
writeln!(
&mut ret,
" number of RC entries (~= number of blocks): {}",
self.garage.block_manager.rc_len()
self.garage.block_manager.rc_len()?
)
.unwrap();
}
writeln!(
&mut ret,
" resync queue length: {}",
self.garage.block_manager.resync_queue_len()
self.garage.block_manager.resync_queue_len()?
)
.unwrap();
writeln!(
&mut ret,
" blocks with resync errors: {}",
self.garage.block_manager.resync_errors_len()
self.garage.block_manager.resync_errors_len()?
)
.unwrap();
ret
Ok(ret)
}
fn gather_table_stats<F, R>(&self, to: &mut String, t: &Arc<Table<F, R>>, opt: &StatsOpt)
fn gather_table_stats<F, R>(
&self,
to: &mut String,
t: &Arc<Table<F, R>>,
opt: &StatsOpt,
) -> Result<(), Error>
where
F: TableSchema + 'static,
R: TableReplication + 'static,
{
writeln!(to, "\nTable stats for {}", F::TABLE_NAME).unwrap();
if opt.detailed {
writeln!(to, " number of items: {}", t.data.store.len()).unwrap();
writeln!(
to,
" number of items: {}",
t.data.store.len().map_err(GarageError::from)?
)
.unwrap();
writeln!(
to,
" Merkle tree size: {}",
t.merkle_updater.merkle_tree_len()
t.merkle_updater.merkle_tree_len()?
)
.unwrap();
}
writeln!(
to,
" Merkle updater todo queue length: {}",
t.merkle_updater.todo_len()
t.merkle_updater.todo_len()?
)
.unwrap();
writeln!(to, " GC todo queue length: {}", t.data.gc_todo_len()).unwrap();
writeln!(to, " GC todo queue length: {}", t.data.gc_todo_len()?).unwrap();
Ok(())
}
}

View File

@ -169,8 +169,12 @@ pub async fn cmd_admin(
AdminRpc::BucketList(bl) => {
print_bucket_list(bl);
}
AdminRpc::BucketInfo(bucket, rk) => {
print_bucket_info(&bucket, &rk);
AdminRpc::BucketInfo {
bucket,
relevant_keys,
counters,
} => {
print_bucket_info(&bucket, &relevant_keys, &counters);
}
AdminRpc::KeyList(kl) => {
print_key_list(kl);

View File

@ -33,10 +33,15 @@ pub enum Command {
#[structopt(name = "migrate")]
Migrate(MigrateOpt),
/// Start repair of node data
/// Start repair of node data on remote node
#[structopt(name = "repair")]
Repair(RepairOpt),
/// Offline reparation of node data (these repairs must be run offline
/// directly on the server node)
#[structopt(name = "offline-repair")]
OfflineRepair(OfflineRepairOpt),
/// Gather node statistics
#[structopt(name = "stats")]
Stats(StatsOpt),
@ -175,6 +180,10 @@ pub enum BucketOperation {
/// Expose as website or not
#[structopt(name = "website")]
Website(WebsiteOpt),
/// Set the quotas for this bucket
#[structopt(name = "set-quotas")]
SetQuotas(SetQuotasOpt),
}
#[derive(Serialize, Deserialize, StructOpt, Debug)]
@ -261,6 +270,21 @@ pub struct PermBucketOpt {
pub bucket: String,
}
#[derive(Serialize, Deserialize, StructOpt, Debug)]
pub struct SetQuotasOpt {
/// Bucket name
pub bucket: String,
/// Set a maximum size for the bucket (specify a size e.g. in MiB or GiB,
/// or `none` for no size restriction)
#[structopt(long = "max-size")]
pub max_size: Option<String>,
/// Set a maximum number of objects for the bucket (or `none` for no restriction)
#[structopt(long = "max-objects")]
pub max_objects: Option<String>,
}
#[derive(Serialize, Deserialize, StructOpt, Debug)]
pub enum KeyOperation {
/// List keys
@ -405,6 +429,27 @@ pub enum RepairWhat {
},
}
#[derive(Serialize, Deserialize, StructOpt, Debug, Clone)]
pub struct OfflineRepairOpt {
/// Confirm the launch of the repair operation
#[structopt(long = "yes")]
pub yes: bool,
#[structopt(subcommand)]
pub what: OfflineRepairWhat,
}
#[derive(Serialize, Deserialize, StructOpt, Debug, Eq, PartialEq, Clone)]
pub enum OfflineRepairWhat {
/// Repair K2V item counters
#[cfg(feature = "k2v")]
#[structopt(name = "k2v_item_counters")]
K2VItemCounters,
/// Repair object counters
#[structopt(name = "object_counters")]
ObjectCounters,
}
#[derive(Serialize, Deserialize, StructOpt, Debug, Clone)]
pub struct StatsOpt {
/// Gather statistics from all nodes

View File

@ -7,6 +7,7 @@ use garage_util::formater::format_table;
use garage_model::bucket_table::*;
use garage_model::key_table::*;
use garage_model::s3::object_table::{BYTES, OBJECTS, UNFINISHED_UPLOADS};
pub fn print_bucket_list(bl: Vec<Bucket>) {
println!("List of buckets:");
@ -29,11 +30,12 @@ pub fn print_bucket_list(bl: Vec<Bucket>) {
[((k, n), _, _)] => format!("{}:{}", k, n),
s => format!("[{} local aliases]", s.len()),
};
table.push(format!(
"\t{}\t{}\t{}",
aliases.join(","),
local_aliases_n,
hex::encode(bucket.id)
hex::encode(bucket.id),
));
}
format_table(table);
@ -121,7 +123,11 @@ pub fn print_key_info(key: &Key, relevant_buckets: &HashMap<Uuid, Bucket>) {
}
}
pub fn print_bucket_info(bucket: &Bucket, relevant_keys: &HashMap<String, Key>) {
pub fn print_bucket_info(
bucket: &Bucket,
relevant_keys: &HashMap<String, Key>,
counters: &HashMap<String, i64>,
) {
let key_name = |k| {
relevant_keys
.get(k)
@ -133,7 +139,42 @@ pub fn print_bucket_info(bucket: &Bucket, relevant_keys: &HashMap<String, Key>)
match &bucket.state {
Deletable::Deleted => println!("Bucket is deleted."),
Deletable::Present(p) => {
println!("Website access: {}", p.website_config.get().is_some());
let size =
bytesize::ByteSize::b(counters.get(BYTES).cloned().unwrap_or_default() as u64);
println!(
"\nSize: {} ({})",
size.to_string_as(true),
size.to_string_as(false)
);
println!(
"Objects: {}",
counters.get(OBJECTS).cloned().unwrap_or_default()
);
println!(
"Unfinished multipart uploads: {}",
counters
.get(UNFINISHED_UPLOADS)
.cloned()
.unwrap_or_default()
);
println!("\nWebsite access: {}", p.website_config.get().is_some());
let quotas = p.quotas.get();
if quotas.max_size.is_some() || quotas.max_objects.is_some() {
println!("\nQuotas:");
if let Some(ms) = quotas.max_size {
let ms = bytesize::ByteSize::b(ms);
println!(
" maximum size: {} ({})",
ms.to_string_as(true),
ms.to_string_as(false)
);
}
if let Some(mo) = quotas.max_objects {
println!(" maximum number of objects: {}", mo);
}
}
println!("\nGlobal aliases:");
for (alias, _, active) in p.aliases.items().iter() {

View File

@ -61,17 +61,17 @@ async fn main() {
pretty_env_logger::init();
sodiumoxide::init().expect("Unable to init sodiumoxide");
// Abort on panic (same behavior as in Go)
std::panic::set_hook(Box::new(|panic_info| {
error!("{}", panic_info.to_string());
std::process::abort();
}));
let opt = Opt::from_args();
let res = match opt.cmd {
Command::Server => {
// Abort on panic (same behavior as in Go)
std::panic::set_hook(Box::new(|panic_info| {
error!("{}", panic_info.to_string());
std::process::abort();
}));
server::run_server(opt.config_file).await
Command::Server => server::run_server(opt.config_file).await,
Command::OfflineRepair(repair_opt) => {
repair::offline::offline_repair(opt.config_file, repair_opt).await
}
Command::Node(NodeOperation::NodeId(node_id_opt)) => {
node_id_command(opt.config_file, node_id_opt.quiet)

2
src/garage/repair/mod.rs Normal file
View File

@ -0,0 +1,2 @@
pub mod offline;
pub mod online;

View File

@ -0,0 +1,55 @@
use std::path::PathBuf;
use tokio::sync::watch;
use garage_util::background::*;
use garage_util::config::*;
use garage_util::error::*;
use garage_model::garage::Garage;
use crate::cli::structs::*;
pub async fn offline_repair(config_file: PathBuf, opt: OfflineRepairOpt) -> Result<(), Error> {
if !opt.yes {
return Err(Error::Message(
"Please add the --yes flag to launch repair operation".into(),
));
}
info!("Loading configuration...");
let config = read_config(config_file)?;
info!("Initializing background runner...");
let (done_tx, done_rx) = watch::channel(false);
let (background, await_background_done) = BackgroundRunner::new(16, done_rx);
info!("Initializing Garage main data store...");
let garage = Garage::new(config.clone(), background)?;
info!("Launching repair operation...");
match opt.what {
#[cfg(feature = "k2v")]
OfflineRepairWhat::K2VItemCounters => {
garage
.k2v
.counter_table
.offline_recount_all(&garage.k2v.item_table)?;
}
OfflineRepairWhat::ObjectCounters => {
garage
.object_counter_table
.offline_recount_all(&garage.object_table)?;
}
}
info!("Repair operation finished, shutting down Garage internals...");
done_tx.send(true).unwrap();
drop(garage);
await_background_done.await?;
info!("Cleaning up...");
Ok(())
}

View File

@ -11,11 +11,11 @@ use garage_util::error::Error;
use crate::*;
pub struct Repair {
pub struct OnlineRepair {
pub garage: Arc<Garage>,
}
impl Repair {
impl OnlineRepair {
pub async fn repair_worker(&self, opt: RepairOpt, must_exit: watch::Receiver<bool>) {
if let Err(e) = self.repair_worker_aux(opt, must_exit).await {
warn!("Repair worker failed with error: {}", e);
@ -64,13 +64,23 @@ impl Repair {
async fn repair_versions(&self, must_exit: &watch::Receiver<bool>) -> Result<(), Error> {
let mut pos = vec![];
let mut i = 0;
while let Some((item_key, item_bytes)) =
self.garage.version_table.data.store.get_gt(&pos)?
{
pos = item_key.to_vec();
while !*must_exit.borrow() {
let item_bytes = match self.garage.version_table.data.store.get_gt(pos)? {
Some((k, v)) => {
pos = k;
v
}
None => break,
};
let version = rmp_serde::decode::from_read_ref::<_, Version>(item_bytes.as_ref())?;
i += 1;
if i % 1000 == 0 {
info!("repair_versions: {}", i);
}
let version = rmp_serde::decode::from_read_ref::<_, Version>(&item_bytes)?;
if version.deleted.get() {
continue;
}
@ -98,23 +108,30 @@ impl Repair {
))
.await?;
}
if *must_exit.borrow() {
break;
}
}
info!("repair_versions: finished, done {}", i);
Ok(())
}
async fn repair_block_ref(&self, must_exit: &watch::Receiver<bool>) -> Result<(), Error> {
let mut pos = vec![];
let mut i = 0;
while let Some((item_key, item_bytes)) =
self.garage.block_ref_table.data.store.get_gt(&pos)?
{
pos = item_key.to_vec();
while !*must_exit.borrow() {
let item_bytes = match self.garage.block_ref_table.data.store.get_gt(pos)? {
Some((k, v)) => {
pos = k;
v
}
None => break,
};
let block_ref = rmp_serde::decode::from_read_ref::<_, BlockRef>(item_bytes.as_ref())?;
i += 1;
if i % 1000 == 0 {
info!("repair_block_ref: {}", i);
}
let block_ref = rmp_serde::decode::from_read_ref::<_, BlockRef>(&item_bytes)?;
if block_ref.deleted.get() {
continue;
}
@ -139,11 +156,8 @@ impl Repair {
})
.await?;
}
if *must_exit.borrow() {
break;
}
}
info!("repair_block_ref: finished, done {}", i);
Ok(())
}
}

View File

@ -27,24 +27,14 @@ async fn wait_from(mut chan: watch::Receiver<bool>) {
pub async fn run_server(config_file: PathBuf) -> Result<(), Error> {
info!("Loading configuration...");
let config = read_config(config_file).expect("Unable to read config file");
info!("Opening database...");
let mut db_path = config.metadata_dir.clone();
db_path.push("db");
let db = sled::Config::default()
.path(&db_path)
.cache_capacity(config.sled_cache_capacity)
.flush_every_ms(Some(config.sled_flush_every_ms))
.open()
.expect("Unable to open sled DB");
let config = read_config(config_file)?;
info!("Initializing background runner...");
let watch_cancel = netapp::util::watch_ctrl_c();
let (background, await_background_done) = BackgroundRunner::new(16, watch_cancel.clone());
info!("Initializing Garage main data store...");
let garage = Garage::new(config.clone(), db, background);
let garage = Garage::new(config.clone(), background)?;
info!("Initialize tracing...");
if let Some(export_to) = config.admin.trace_sink {
@ -54,6 +44,7 @@ pub async fn run_server(config_file: PathBuf) -> Result<(), Error> {
info!("Initialize Admin API server and metrics collector...");
let admin_server = AdminApiServer::new(garage.clone());
info!("Launching internal Garage cluster communications...");
let run_system = tokio::spawn(garage.system.clone().run(watch_cancel.clone()));
info!("Create admin RPC handler...");

View File

@ -29,8 +29,7 @@ async fn test_bucket_all() {
.unwrap()
.iter()
.filter(|x| x.name.as_ref().is_some())
.find(|x| x.name.as_ref().unwrap() == "hello")
.is_some());
.any(|x| x.name.as_ref().unwrap() == "hello"));
}
{
// Get its location
@ -75,13 +74,12 @@ async fn test_bucket_all() {
{
// Check bucket is deleted with List buckets
let r = ctx.client.list_buckets().send().await.unwrap();
assert!(r
assert!(!r
.buckets
.as_ref()
.unwrap()
.iter()
.filter(|x| x.name.as_ref().is_some())
.find(|x| x.name.as_ref().unwrap() == "hello")
.is_none());
.any(|x| x.name.as_ref().unwrap() == "hello"));
}
}

View File

@ -14,6 +14,7 @@ path = "lib.rs"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
garage_db = { version = "0.8.0", path = "../db" }
garage_rpc = { version = "0.7.0", path = "../rpc" }
garage_table = { version = "0.7.0", path = "../table" }
garage_block = { version = "0.7.0", path = "../block" }
@ -30,8 +31,6 @@ tracing = "0.1.30"
rand = "0.8"
zstd = { version = "0.9", default-features = false }
sled = "0.34"
rmp-serde = "0.15"
serde = { version = "1.0", default-features = false, features = ["derive", "rc"] }
serde_bytes = "0.11"

View File

@ -1,6 +1,6 @@
use serde::{Deserialize, Serialize};
use garage_table::crdt::Crdt;
use garage_table::crdt::*;
use garage_table::*;
use garage_util::data::*;
use garage_util::time::*;
@ -44,6 +44,9 @@ pub struct BucketParams {
pub website_config: crdt::Lww<Option<WebsiteConfig>>,
/// CORS rules
pub cors_config: crdt::Lww<Option<Vec<CorsRule>>>,
/// Bucket quotas
#[serde(default)]
pub quotas: crdt::Lww<BucketQuotas>,
}
#[derive(PartialEq, Eq, Clone, Debug, Serialize, Deserialize)]
@ -62,6 +65,18 @@ pub struct CorsRule {
pub expose_headers: Vec<String>,
}
#[derive(Default, PartialEq, Eq, PartialOrd, Ord, Clone, Debug, Serialize, Deserialize)]
pub struct BucketQuotas {
/// Maximum size in bytes (bucket size = sum of sizes of objects in the bucket)
pub max_size: Option<u64>,
/// Maximum number of non-deleted objects in the bucket
pub max_objects: Option<u64>,
}
impl AutoCrdt for BucketQuotas {
const WARN_IF_DIFFERENT: bool = true;
}
impl BucketParams {
/// Create an empty BucketParams with no authorized keys and no website accesss
pub fn new() -> Self {
@ -72,6 +87,7 @@ impl BucketParams {
local_aliases: crdt::LwwMap::new(),
website_config: crdt::Lww::new(None),
cors_config: crdt::Lww::new(None),
quotas: crdt::Lww::new(BucketQuotas::default()),
}
}
}
@ -86,6 +102,7 @@ impl Crdt for BucketParams {
self.website_config.merge(&o.website_config);
self.cors_config.merge(&o.cors_config);
self.quotas.merge(&o.quotas);
}
}

View File

@ -2,8 +2,11 @@ use std::sync::Arc;
use netapp::NetworkKey;
use garage_db as db;
use garage_util::background::*;
use garage_util::config::*;
use garage_util::error::Error;
use garage_rpc::system::System;
@ -20,12 +23,11 @@ use crate::s3::version_table::*;
use crate::bucket_alias_table::*;
use crate::bucket_table::*;
use crate::helper;
use crate::index_counter::*;
use crate::key_table::*;
#[cfg(feature = "k2v")]
use crate::index_counter::*;
#[cfg(feature = "k2v")]
use crate::k2v::{counter_table::*, item_table::*, poll::*, rpc::*};
use crate::k2v::{item_table::*, poll::*, rpc::*};
/// An entire Garage full of data
pub struct Garage {
@ -33,7 +35,7 @@ pub struct Garage {
pub config: Config,
/// The local database
pub db: sled::Db,
pub db: db::Db,
/// A background job runner
pub background: Arc<BackgroundRunner>,
/// The membership manager
@ -50,6 +52,8 @@ pub struct Garage {
/// Table containing S3 objects
pub object_table: Arc<Table<ObjectTable, TableShardedReplication>>,
/// Counting table containing object counters
pub object_counter_table: Arc<IndexCounter<Object>>,
/// Table containing S3 object versions
pub version_table: Arc<Table<VersionTable, TableShardedReplication>>,
/// Table containing S3 block references (not blocks themselves)
@ -64,14 +68,57 @@ pub struct GarageK2V {
/// Table containing K2V items
pub item_table: Arc<Table<K2VItemTable, TableShardedReplication>>,
/// Indexing table containing K2V item counters
pub counter_table: Arc<IndexCounter<K2VCounterTable>>,
pub counter_table: Arc<IndexCounter<K2VItem>>,
/// K2V RPC handler
pub rpc: Arc<K2VRpcHandler>,
}
impl Garage {
/// Create and run garage
pub fn new(config: Config, db: sled::Db, background: Arc<BackgroundRunner>) -> Arc<Self> {
pub fn new(config: Config, background: Arc<BackgroundRunner>) -> Result<Arc<Self>, Error> {
info!("Opening database...");
let mut db_path = config.metadata_dir.clone();
std::fs::create_dir_all(&db_path).expect("Unable to create Garage meta data directory");
let db = match config.db_engine.as_str() {
"sled" => {
db_path.push("db");
info!("Opening Sled database at: {}", db_path.display());
let db = db::sled_adapter::sled::Config::default()
.path(&db_path)
.cache_capacity(config.sled_cache_capacity)
.flush_every_ms(Some(config.sled_flush_every_ms))
.open()
.expect("Unable to open sled DB");
db::sled_adapter::SledDb::init(db)
}
"sqlite" | "sqlite3" | "rusqlite" => {
db_path.push("db.sqlite");
info!("Opening Sqlite database at: {}", db_path.display());
let db = db::sqlite_adapter::rusqlite::Connection::open(db_path)
.expect("Unable to open sqlite DB");
db::sqlite_adapter::SqliteDb::init(db)
}
"lmdb" | "heed" => {
db_path.push("db.lmdb");
info!("Opening LMDB database at: {}", db_path.display());
std::fs::create_dir_all(&db_path).expect("Unable to create LMDB data directory");
let map_size = garage_db::lmdb_adapter::recommended_map_size();
let db = db::lmdb_adapter::heed::EnvOpenOptions::new()
.max_dbs(100)
.map_size(map_size)
.open(&db_path)
.expect("Unable to open LMDB DB");
db::lmdb_adapter::LmdbDb::init(db)
}
e => {
return Err(Error::Message(format!(
"Unsupported DB engine: {} (options: sled, sqlite, lmdb)",
e
)));
}
};
let network_key = NetworkKey::from_slice(
&hex::decode(&config.rpc_secret).expect("Invalid RPC secret key")[..],
)
@ -153,12 +200,16 @@ impl Garage {
&db,
);
info!("Initialize object counter table...");
let object_counter_table = IndexCounter::new(system.clone(), meta_rep_param.clone(), &db);
info!("Initialize object_table...");
#[allow(clippy::redundant_clone)]
let object_table = Table::new(
ObjectTable {
background: background.clone(),
version_table: version_table.clone(),
object_counter_table: object_counter_table.clone(),
},
meta_rep_param.clone(),
system.clone(),
@ -169,9 +220,8 @@ impl Garage {
#[cfg(feature = "k2v")]
let k2v = GarageK2V::new(system.clone(), &db, meta_rep_param);
info!("Initialize Garage...");
Arc::new(Self {
// -- done --
Ok(Arc::new(Self {
config,
db,
background,
@ -181,11 +231,12 @@ impl Garage {
bucket_alias_table,
key_table,
object_table,
object_counter_table,
version_table,
block_ref_table,
#[cfg(feature = "k2v")]
k2v,
})
}))
}
pub fn bucket_helper(&self) -> helper::bucket::BucketHelper {
@ -199,7 +250,7 @@ impl Garage {
#[cfg(feature = "k2v")]
impl GarageK2V {
fn new(system: Arc<System>, db: &sled::Db, meta_rep_param: TableShardedReplication) -> Self {
fn new(system: Arc<System>, db: &db::Db, meta_rep_param: TableShardedReplication) -> Self {
info!("Initialize K2V counter table...");
let counter_table = IndexCounter::new(system.clone(), meta_rep_param.clone(), db);
info!("Initialize K2V subscription manager...");

View File

@ -1,3 +1,4 @@
use core::ops::Bound;
use std::collections::{hash_map, BTreeMap, HashMap};
use std::marker::PhantomData;
use std::sync::Arc;
@ -6,34 +7,42 @@ use std::time::Duration;
use serde::{Deserialize, Serialize};
use tokio::sync::{mpsc, watch};
use garage_db as db;
use garage_rpc::ring::Ring;
use garage_rpc::system::System;
use garage_util::data::*;
use garage_util::error::*;
use garage_util::time::*;
use garage_table::crdt::*;
use garage_table::replication::TableShardedReplication;
use garage_table::replication::*;
use garage_table::*;
pub trait CounterSchema: Clone + PartialEq + Send + Sync + 'static {
const NAME: &'static str;
type P: PartitionKey + Clone + PartialEq + Serialize + for<'de> Deserialize<'de> + Send + Sync;
type S: SortKey + Clone + PartialEq + Serialize + for<'de> Deserialize<'de> + Send + Sync;
pub trait CountedItem: Clone + PartialEq + Send + Sync + 'static {
const COUNTER_TABLE_NAME: &'static str;
type CP: PartitionKey + Clone + PartialEq + Serialize + for<'de> Deserialize<'de> + Send + Sync;
type CS: SortKey + Clone + PartialEq + Serialize + for<'de> Deserialize<'de> + Send + Sync;
fn counter_partition_key(&self) -> &Self::CP;
fn counter_sort_key(&self) -> &Self::CS;
fn counts(&self) -> Vec<(&'static str, i64)>;
}
/// A counter entry in the global table
#[derive(PartialEq, Clone, Debug, Serialize, Deserialize)]
pub struct CounterEntry<T: CounterSchema> {
pub pk: T::P,
pub sk: T::S,
#[derive(Clone, PartialEq, Debug, Serialize, Deserialize)]
pub struct CounterEntry<T: CountedItem> {
pub pk: T::CP,
pub sk: T::CS,
pub values: BTreeMap<String, CounterValue>,
}
impl<T: CounterSchema> Entry<T::P, T::S> for CounterEntry<T> {
fn partition_key(&self) -> &T::P {
impl<T: CountedItem> Entry<T::CP, T::CS> for CounterEntry<T> {
fn partition_key(&self) -> &T::CP {
&self.pk
}
fn sort_key(&self) -> &T::S {
fn sort_key(&self) -> &T::CS {
&self.sk
}
fn is_tombstone(&self) -> bool {
@ -43,7 +52,7 @@ impl<T: CounterSchema> Entry<T::P, T::S> for CounterEntry<T> {
}
}
impl<T: CounterSchema> CounterEntry<T> {
impl<T: CountedItem> CounterEntry<T> {
pub fn filtered_values(&self, ring: &Ring) -> HashMap<String, i64> {
let nodes = &ring.layout.node_id_vec[..];
self.filtered_values_with_nodes(nodes)
@ -76,7 +85,7 @@ pub struct CounterValue {
pub node_values: BTreeMap<Uuid, (u64, i64)>,
}
impl<T: CounterSchema> Crdt for CounterEntry<T> {
impl<T: CountedItem> Crdt for CounterEntry<T> {
fn merge(&mut self, other: &Self) {
for (name, e2) in other.values.iter() {
if let Some(e) = self.values.get_mut(name) {
@ -102,22 +111,18 @@ impl Crdt for CounterValue {
}
}
pub struct CounterTable<T: CounterSchema> {
pub struct CounterTable<T: CountedItem> {
_phantom_t: PhantomData<T>,
}
impl<T: CounterSchema> TableSchema for CounterTable<T> {
const TABLE_NAME: &'static str = T::NAME;
impl<T: CountedItem> TableSchema for CounterTable<T> {
const TABLE_NAME: &'static str = T::COUNTER_TABLE_NAME;
type P = T::P;
type S = T::S;
type P = T::CP;
type S = T::CS;
type E = CounterEntry<T>;
type Filter = (DeletedFilter, Vec<Uuid>);
fn updated(&self, _old: Option<&Self::E>, _new: Option<&Self::E>) {
// nothing for now
}
fn matches_filter(entry: &Self::E, filter: &Self::Filter) -> bool {
if filter.0 == DeletedFilter::Any {
return true;
@ -133,18 +138,18 @@ impl<T: CounterSchema> TableSchema for CounterTable<T> {
// ----
pub struct IndexCounter<T: CounterSchema> {
pub struct IndexCounter<T: CountedItem> {
this_node: Uuid,
local_counter: sled::Tree,
propagate_tx: mpsc::UnboundedSender<(T::P, T::S, LocalCounterEntry)>,
local_counter: db::Tree,
propagate_tx: mpsc::UnboundedSender<(T::CP, T::CS, LocalCounterEntry<T>)>,
pub table: Arc<Table<CounterTable<T>, TableShardedReplication>>,
}
impl<T: CounterSchema> IndexCounter<T> {
impl<T: CountedItem> IndexCounter<T> {
pub fn new(
system: Arc<System>,
replication: TableShardedReplication,
db: &sled::Db,
db: &db::Db,
) -> Arc<Self> {
let background = system.background.clone();
@ -153,7 +158,7 @@ impl<T: CounterSchema> IndexCounter<T> {
let this = Arc::new(Self {
this_node: system.id,
local_counter: db
.open_tree(format!("local_counter:{}", T::NAME))
.open_tree(format!("local_counter_v2:{}", T::COUNTER_TABLE_NAME))
.expect("Unable to open local counter tree"),
propagate_tx,
table: Table::new(
@ -168,42 +173,63 @@ impl<T: CounterSchema> IndexCounter<T> {
let this2 = this.clone();
background.spawn_worker(
format!("{} index counter propagator", T::NAME),
format!("{} index counter propagator", T::COUNTER_TABLE_NAME),
move |must_exit| this2.clone().propagate_loop(propagate_rx, must_exit),
);
this
}
pub fn count(&self, pk: &T::P, sk: &T::S, counts: &[(&str, i64)]) -> Result<(), Error> {
pub fn count(
&self,
tx: &mut db::Transaction,
old: Option<&T>,
new: Option<&T>,
) -> db::TxResult<(), Error> {
let pk = old
.map(|e| e.counter_partition_key())
.unwrap_or_else(|| new.unwrap().counter_partition_key());
let sk = old
.map(|e| e.counter_sort_key())
.unwrap_or_else(|| new.unwrap().counter_sort_key());
// calculate counter differences
let mut counts = HashMap::new();
for (k, v) in old.map(|x| x.counts()).unwrap_or_default() {
*counts.entry(k).or_insert(0) -= v;
}
for (k, v) in new.map(|x| x.counts()).unwrap_or_default() {
*counts.entry(k).or_insert(0) += v;
}
// update local counter table
let tree_key = self.table.data.tree_key(pk, sk);
let new_entry = self.local_counter.transaction(|tx| {
let mut entry = match tx.get(&tree_key[..])? {
Some(old_bytes) => {
rmp_serde::decode::from_read_ref::<_, LocalCounterEntry>(&old_bytes)
.map_err(Error::RmpDecode)
.map_err(sled::transaction::ConflictableTransactionError::Abort)?
}
None => LocalCounterEntry {
values: BTreeMap::new(),
},
};
for (s, inc) in counts.iter() {
let mut ent = entry.values.entry(s.to_string()).or_insert((0, 0));
ent.0 += 1;
ent.1 += *inc;
let mut entry = match tx.get(&self.local_counter, &tree_key[..])? {
Some(old_bytes) => {
rmp_serde::decode::from_read_ref::<_, LocalCounterEntry<T>>(&old_bytes)
.map_err(Error::RmpDecode)
.map_err(db::TxError::Abort)?
}
None => LocalCounterEntry {
pk: pk.clone(),
sk: sk.clone(),
values: BTreeMap::new(),
},
};
let new_entry_bytes = rmp_to_vec_all_named(&entry)
.map_err(Error::RmpEncode)
.map_err(sled::transaction::ConflictableTransactionError::Abort)?;
tx.insert(&tree_key[..], new_entry_bytes)?;
let now = now_msec();
for (s, inc) in counts.iter() {
let mut ent = entry.values.entry(s.to_string()).or_insert((0, 0));
ent.0 = std::cmp::max(ent.0 + 1, now);
ent.1 += *inc;
}
Ok(entry)
})?;
let new_entry_bytes = rmp_to_vec_all_named(&entry)
.map_err(Error::RmpEncode)
.map_err(db::TxError::Abort)?;
tx.insert(&self.local_counter, &tree_key[..], new_entry_bytes)?;
if let Err(e) = self.propagate_tx.send((pk.clone(), sk.clone(), new_entry)) {
if let Err(e) = self.propagate_tx.send((pk.clone(), sk.clone(), entry)) {
error!(
"Could not propagate updated counter values, failed to send to channel: {}",
e
@ -215,7 +241,7 @@ impl<T: CounterSchema> IndexCounter<T> {
async fn propagate_loop(
self: Arc<Self>,
mut propagate_rx: mpsc::UnboundedReceiver<(T::P, T::S, LocalCounterEntry)>,
mut propagate_rx: mpsc::UnboundedReceiver<(T::CP, T::CS, LocalCounterEntry<T>)>,
must_exit: watch::Receiver<bool>,
) {
// This loop batches updates to counters to be sent all at once.
@ -238,7 +264,7 @@ impl<T: CounterSchema> IndexCounter<T> {
if let Some((pk, sk, counters)) = ent {
let tree_key = self.table.data.tree_key(&pk, &sk);
let dist_entry = counters.into_counter_entry::<T>(self.this_node, pk, sk);
let dist_entry = counters.into_counter_entry(self.this_node);
match buf.entry(tree_key) {
hash_map::Entry::Vacant(e) => {
e.insert(dist_entry);
@ -257,10 +283,10 @@ impl<T: CounterSchema> IndexCounter<T> {
if let Err(e) = self.table.insert_many(entries).await {
errors += 1;
if errors >= 2 && *must_exit.borrow() {
error!("({}) Could not propagate {} counter values: {}, these counters will not be updated correctly.", T::NAME, buf.len(), e);
error!("({}) Could not propagate {} counter values: {}, these counters will not be updated correctly.", T::COUNTER_TABLE_NAME, buf.len(), e);
break;
}
warn!("({}) Could not propagate {} counter values: {}, retrying in 5 seconds (retry #{})", T::NAME, buf.len(), e, errors);
warn!("({}) Could not propagate {} counter values: {}, retrying in 5 seconds (retry #{})", T::COUNTER_TABLE_NAME, buf.len(), e, errors);
tokio::time::sleep(Duration::from_secs(5)).await;
continue;
}
@ -274,23 +300,155 @@ impl<T: CounterSchema> IndexCounter<T> {
}
}
}
pub fn offline_recount_all<TS, TR>(
&self,
counted_table: &Arc<Table<TS, TR>>,
) -> Result<(), Error>
where
TS: TableSchema<E = T>,
TR: TableReplication,
{
let save_counter_entry = |entry: CounterEntry<T>| -> Result<(), Error> {
let entry_k = self
.table
.data
.tree_key(entry.partition_key(), entry.sort_key());
self.table
.data
.update_entry_with(&entry_k, |ent| match ent {
Some(mut ent) => {
ent.merge(&entry);
ent
}
None => entry.clone(),
})?;
Ok(())
};
// 1. Set all old local counters to zero
let now = now_msec();
let mut next_start: Option<Vec<u8>> = None;
loop {
let low_bound = match next_start.take() {
Some(v) => Bound::Excluded(v),
None => Bound::Unbounded,
};
let mut batch = vec![];
for item in self.local_counter.range((low_bound, Bound::Unbounded))? {
batch.push(item?);
if batch.len() > 1000 {
break;
}
}
if batch.is_empty() {
break;
}
info!("zeroing old counters... ({})", hex::encode(&batch[0].0));
for (local_counter_k, local_counter) in batch {
let mut local_counter =
rmp_serde::decode::from_read_ref::<_, LocalCounterEntry<T>>(&local_counter)?;
for (_, tv) in local_counter.values.iter_mut() {
tv.0 = std::cmp::max(tv.0 + 1, now);
tv.1 = 0;
}
let local_counter_bytes = rmp_to_vec_all_named(&local_counter)?;
self.local_counter
.insert(&local_counter_k, &local_counter_bytes)?;
let counter_entry = local_counter.into_counter_entry(self.this_node);
save_counter_entry(counter_entry)?;
next_start = Some(local_counter_k);
}
}
// 2. Recount all table entries
let now = now_msec();
let mut next_start: Option<Vec<u8>> = None;
loop {
let low_bound = match next_start.take() {
Some(v) => Bound::Excluded(v),
None => Bound::Unbounded,
};
let mut batch = vec![];
for item in counted_table
.data
.store
.range((low_bound, Bound::Unbounded))?
{
batch.push(item?);
if batch.len() > 1000 {
break;
}
}
if batch.is_empty() {
break;
}
info!("counting entries... ({})", hex::encode(&batch[0].0));
for (counted_entry_k, counted_entry) in batch {
let counted_entry = counted_table.data.decode_entry(&counted_entry)?;
let pk = counted_entry.counter_partition_key();
let sk = counted_entry.counter_sort_key();
let counts = counted_entry.counts();
let local_counter_key = self.table.data.tree_key(pk, sk);
let mut local_counter = match self.local_counter.get(&local_counter_key)? {
Some(old_bytes) => {
let ent = rmp_serde::decode::from_read_ref::<_, LocalCounterEntry<T>>(
&old_bytes,
)?;
assert!(ent.pk == *pk);
assert!(ent.sk == *sk);
ent
}
None => LocalCounterEntry {
pk: pk.clone(),
sk: sk.clone(),
values: BTreeMap::new(),
},
};
for (s, v) in counts.iter() {
let mut tv = local_counter.values.entry(s.to_string()).or_insert((0, 0));
tv.0 = std::cmp::max(tv.0 + 1, now);
tv.1 += v;
}
let local_counter_bytes = rmp_to_vec_all_named(&local_counter)?;
self.local_counter
.insert(&local_counter_key, local_counter_bytes)?;
let counter_entry = local_counter.into_counter_entry(self.this_node);
save_counter_entry(counter_entry)?;
next_start = Some(counted_entry_k);
}
}
// Done
Ok(())
}
}
#[derive(PartialEq, Clone, Debug, Serialize, Deserialize)]
struct LocalCounterEntry {
struct LocalCounterEntry<T: CountedItem> {
pk: T::CP,
sk: T::CS,
values: BTreeMap<String, (u64, i64)>,
}
impl LocalCounterEntry {
fn into_counter_entry<T: CounterSchema>(
self,
this_node: Uuid,
pk: T::P,
sk: T::S,
) -> CounterEntry<T> {
impl<T: CountedItem> LocalCounterEntry<T> {
fn into_counter_entry(self, this_node: Uuid) -> CounterEntry<T> {
CounterEntry {
pk,
sk,
pk: self.pk,
sk: self.sk,
values: self
.values
.into_iter()

View File

@ -1,20 +0,0 @@
use garage_util::data::*;
use crate::index_counter::*;
pub const ENTRIES: &str = "entries";
pub const CONFLICTS: &str = "conflicts";
pub const VALUES: &str = "values";
pub const BYTES: &str = "bytes";
#[derive(PartialEq, Clone)]
pub struct K2VCounterTable;
impl CounterSchema for K2VCounterTable {
const NAME: &'static str = "k2v_index_counter";
// Partition key = bucket id
type P = Uuid;
// Sort key = K2V item's partition key
type S = String;
}

View File

@ -2,6 +2,7 @@ use serde::{Deserialize, Serialize};
use std::collections::BTreeMap;
use std::sync::Arc;
use garage_db as db;
use garage_util::data::*;
use garage_table::crdt::*;
@ -9,9 +10,13 @@ use garage_table::*;
use crate::index_counter::*;
use crate::k2v::causality::*;
use crate::k2v::counter_table::*;
use crate::k2v::poll::*;
pub const ENTRIES: &str = "entries";
pub const CONFLICTS: &str = "conflicts";
pub const VALUES: &str = "values";
pub const BYTES: &str = "bytes";
#[derive(PartialEq, Clone, Debug, Serialize, Deserialize)]
pub struct K2VItem {
pub partition: K2VItemPartition,
@ -111,27 +116,6 @@ impl K2VItem {
ent.discard();
}
}
// returns counters: (non-deleted entries, conflict entries, non-tombstone values, bytes used)
fn stats(&self) -> (i64, i64, i64, i64) {
let values = self.values();
let n_entries = if self.is_tombstone() { 0 } else { 1 };
let n_conflicts = if values.len() > 1 { 1 } else { 0 };
let n_values = values
.iter()
.filter(|v| matches!(v, DvvsValue::Value(_)))
.count() as i64;
let n_bytes = values
.iter()
.map(|v| match v {
DvvsValue::Deleted => 0,
DvvsValue::Value(v) => v.len() as i64,
})
.sum();
(n_entries, n_conflicts, n_values, n_bytes)
}
}
impl DvvsEntry {
@ -203,7 +187,7 @@ impl Entry<K2VItemPartition, String> for K2VItem {
}
pub struct K2VItemTable {
pub(crate) counter_table: Arc<IndexCounter<K2VCounterTable>>,
pub(crate) counter_table: Arc<IndexCounter<K2VItem>>,
pub(crate) subscriptions: Arc<SubscriptionManager>,
}
@ -221,41 +205,30 @@ impl TableSchema for K2VItemTable {
type E = K2VItem;
type Filter = ItemFilter;
fn updated(&self, old: Option<&Self::E>, new: Option<&Self::E>) {
fn updated(
&self,
tx: &mut db::Transaction,
old: Option<&Self::E>,
new: Option<&Self::E>,
) -> db::TxOpResult<()> {
// 1. Count
let (old_entries, old_conflicts, old_values, old_bytes) = match old {
None => (0, 0, 0, 0),
Some(e) => e.stats(),
};
let (new_entries, new_conflicts, new_values, new_bytes) = match new {
None => (0, 0, 0, 0),
Some(e) => e.stats(),
};
let count_pk = old
.map(|e| e.partition.bucket_id)
.unwrap_or_else(|| new.unwrap().partition.bucket_id);
let count_sk = old
.map(|e| &e.partition.partition_key)
.unwrap_or_else(|| &new.unwrap().partition.partition_key);
if let Err(e) = self.counter_table.count(
&count_pk,
count_sk,
&[
(ENTRIES, new_entries - old_entries),
(CONFLICTS, new_conflicts - old_conflicts),
(VALUES, new_values - old_values),
(BYTES, new_bytes - old_bytes),
],
) {
error!("Could not update K2V counter for bucket {:?} partition {}; counts will now be inconsistent. {}", count_pk, count_sk, e);
let counter_res = self.counter_table.count(tx, old, new);
if let Err(e) = db::unabort(counter_res)? {
// This result can be returned by `counter_table.count()` for instance
// if messagepack serialization or deserialization fails at some step.
// Warn admin but ignore this error for now, that's all we can do.
error!(
"Unable to update K2V item counter: {}. Index values will be wrong!",
e
);
}
// 2. Notify
if let Some(new_ent) = new {
self.subscriptions.notify(new_ent);
}
Ok(())
}
#[allow(clippy::nonminimal_bool)]
@ -266,6 +239,47 @@ impl TableSchema for K2VItemTable {
}
}
impl CountedItem for K2VItem {
const COUNTER_TABLE_NAME: &'static str = "k2v_index_counter_v2";
// Partition key = bucket id
type CP = Uuid;
// Sort key = K2V item's partition key
type CS = String;
fn counter_partition_key(&self) -> &Uuid {
&self.partition.bucket_id
}
fn counter_sort_key(&self) -> &String {
&self.partition.partition_key
}
fn counts(&self) -> Vec<(&'static str, i64)> {
let values = self.values();
let n_entries = if self.is_tombstone() { 0 } else { 1 };
let n_conflicts = if values.len() > 1 { 1 } else { 0 };
let n_values = values
.iter()
.filter(|v| matches!(v, DvvsValue::Value(_)))
.count() as i64;
let n_bytes = values
.iter()
.map(|v| match v {
DvvsValue::Deleted => 0,
DvvsValue::Value(v) => v.len() as i64,
})
.sum();
vec![
(ENTRIES, n_entries),
(CONFLICTS, n_conflicts),
(VALUES, n_values),
(BYTES, n_bytes),
]
}
}
#[cfg(test)]
mod tests {
use super::*;

View File

@ -1,6 +1,5 @@
pub mod causality;
pub mod counter_table;
pub mod item_table;
pub mod poll;

View File

@ -25,11 +25,15 @@ impl Migrate {
.open_tree("bucket:table")
.map_err(GarageError::from)?;
for res in tree.iter() {
let mut old_buckets = vec![];
for res in tree.iter().map_err(GarageError::from)? {
let (_k, v) = res.map_err(GarageError::from)?;
let bucket = rmp_serde::decode::from_read_ref::<_, old_bucket::Bucket>(&v[..])
.map_err(GarageError::from)?;
old_buckets.push(bucket);
}
for bucket in old_buckets {
if let old_bucket::BucketState::Present(p) = bucket.state.get() {
self.migrate_buckets050_do_bucket(&bucket, p).await?;
}
@ -73,6 +77,7 @@ impl Migrate {
local_aliases: LwwMap::new(),
website_config: Lww::new(website),
cors_config: Lww::new(None),
quotas: Lww::new(Default::default()),
}),
})
.await?;

View File

@ -1,6 +1,8 @@
use serde::{Deserialize, Serialize};
use std::sync::Arc;
use garage_db as db;
use garage_util::data::*;
use garage_table::crdt::Crdt;
@ -51,21 +53,22 @@ impl TableSchema for BlockRefTable {
type E = BlockRef;
type Filter = DeletedFilter;
fn updated(&self, old: Option<&Self::E>, new: Option<&Self::E>) {
#[allow(clippy::or_fun_call)]
let block = &old.or(new).unwrap().block;
fn updated(
&self,
tx: &mut db::Transaction,
old: Option<&Self::E>,
new: Option<&Self::E>,
) -> db::TxOpResult<()> {
let block = old.or(new).unwrap().block;
let was_before = old.map(|x| !x.deleted.get()).unwrap_or(false);
let is_after = new.map(|x| !x.deleted.get()).unwrap_or(false);
if is_after && !was_before {
if let Err(e) = self.block_manager.block_incref(block) {
warn!("block_incref failed for block {:?}: {}", block, e);
}
self.block_manager.block_incref(tx, block)?;
}
if was_before && !is_after {
if let Err(e) = self.block_manager.block_decref(block) {
warn!("block_decref failed for block {:?}: {}", block, e);
}
self.block_manager.block_decref(tx, block)?;
}
Ok(())
}
fn matches_filter(entry: &Self::E, filter: &Self::Filter) -> bool {

View File

@ -2,6 +2,8 @@ use serde::{Deserialize, Serialize};
use std::collections::BTreeMap;
use std::sync::Arc;
use garage_db as db;
use garage_util::background::BackgroundRunner;
use garage_util::data::*;
@ -9,10 +11,15 @@ use garage_table::crdt::*;
use garage_table::replication::TableShardedReplication;
use garage_table::*;
use crate::index_counter::*;
use crate::s3::version_table::*;
use garage_model_050::object_table as old;
pub const OBJECTS: &str = "objects";
pub const UNFINISHED_UPLOADS: &str = "unfinished_uploads";
pub const BYTES: &str = "bytes";
/// An object
#[derive(PartialEq, Clone, Debug, Serialize, Deserialize)]
pub struct Object {
@ -216,6 +223,7 @@ impl Crdt for Object {
pub struct ObjectTable {
pub background: Arc<BackgroundRunner>,
pub version_table: Arc<Table<VersionTable, TableShardedReplication>>,
pub object_counter_table: Arc<IndexCounter<Object>>,
}
#[derive(Clone, Copy, Debug, Serialize, Deserialize)]
@ -232,7 +240,22 @@ impl TableSchema for ObjectTable {
type E = Object;
type Filter = ObjectFilter;
fn updated(&self, old: Option<&Self::E>, new: Option<&Self::E>) {
fn updated(
&self,
tx: &mut db::Transaction,
old: Option<&Self::E>,
new: Option<&Self::E>,
) -> db::TxOpResult<()> {
// 1. Count
let counter_res = self.object_counter_table.count(tx, old, new);
if let Err(e) = db::unabort(counter_res)? {
error!(
"Unable to update object counter: {}. Index values will be wrong!",
e
);
}
// 2. Spawn threads that propagates deletions to version table
let version_table = self.version_table.clone();
let old = old.cloned();
let new = new.cloned();
@ -259,7 +282,8 @@ impl TableSchema for ObjectTable {
}
}
Ok(())
})
});
Ok(())
}
fn matches_filter(entry: &Self::E, filter: &Self::Filter) -> bool {
@ -275,6 +299,49 @@ impl TableSchema for ObjectTable {
}
}
impl CountedItem for Object {
const COUNTER_TABLE_NAME: &'static str = "bucket_object_counter";
// Partition key = bucket id
type CP = Uuid;
// Sort key = nothing
type CS = EmptyKey;
fn counter_partition_key(&self) -> &Uuid {
&self.bucket_id
}
fn counter_sort_key(&self) -> &EmptyKey {
&EmptyKey
}
fn counts(&self) -> Vec<(&'static str, i64)> {
let versions = self.versions();
let n_objects = if versions.iter().any(|v| v.is_data()) {
1
} else {
0
};
let n_unfinished_uploads = versions
.iter()
.filter(|v| matches!(v.state, ObjectVersionState::Uploading(_)))
.count();
let n_bytes = versions
.iter()
.map(|v| match &v.state {
ObjectVersionState::Complete(ObjectVersionData::Inline(meta, _))
| ObjectVersionState::Complete(ObjectVersionData::FirstBlock(meta, _)) => meta.size,
_ => 0,
})
.sum::<u64>();
vec![
(OBJECTS, n_objects),
(UNFINISHED_UPLOADS, n_unfinished_uploads as i64),
(BYTES, n_bytes as i64),
]
}
}
// vvvvvvvv migration code, stupid stuff vvvvvvvvvvvv
// (we just want to change bucket into bucket_id by hashing it)

View File

@ -1,6 +1,8 @@
use serde::{Deserialize, Serialize};
use std::sync::Arc;
use garage_db as db;
use garage_util::background::BackgroundRunner;
use garage_util::data::*;
@ -137,7 +139,12 @@ impl TableSchema for VersionTable {
type E = Version;
type Filter = DeletedFilter;
fn updated(&self, old: Option<&Self::E>, new: Option<&Self::E>) {
fn updated(
&self,
_tx: &mut db::Transaction,
old: Option<&Self::E>,
new: Option<&Self::E>,
) -> db::TxOpResult<()> {
let block_ref_table = self.block_ref_table.clone();
let old = old.cloned();
let new = new.cloned();
@ -160,7 +167,9 @@ impl TableSchema for VersionTable {
}
}
Ok(())
})
});
Ok(())
}
fn matches_filter(entry: &Self::E, filter: &Self::Filter) -> bool {

View File

@ -14,6 +14,7 @@ path = "lib.rs"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
garage_db = { version = "0.8.0", path = "../db" }
garage_rpc = { version = "0.7.0", path = "../rpc" }
garage_util = { version = "0.7.0", path = "../util" }
@ -25,8 +26,6 @@ hexdump = "0.1"
tracing = "0.1.30"
rand = "0.8"
sled = "0.34"
rmp-serde = "0.15"
serde = { version = "1.0", default-features = false, features = ["derive", "rc"] }
serde_bytes = "0.11"

View File

@ -3,12 +3,13 @@ use std::convert::TryInto;
use std::sync::Arc;
use serde_bytes::ByteBuf;
use sled::{IVec, Transactional};
use tokio::sync::Notify;
use garage_db as db;
use garage_db::counted_tree_hack::CountedTree;
use garage_util::data::*;
use garage_util::error::*;
use garage_util::sled_counter::SledCountedTree;
use garage_rpc::system::System;
@ -25,12 +26,12 @@ pub struct TableData<F: TableSchema, R: TableReplication> {
pub instance: F,
pub replication: R,
pub store: sled::Tree,
pub store: db::Tree,
pub(crate) merkle_tree: sled::Tree,
pub(crate) merkle_todo: sled::Tree,
pub(crate) merkle_tree: db::Tree,
pub(crate) merkle_todo: db::Tree,
pub(crate) merkle_todo_notify: Notify,
pub(crate) gc_todo: SledCountedTree,
pub(crate) gc_todo: CountedTree,
pub(crate) metrics: TableMetrics,
}
@ -40,7 +41,7 @@ where
F: TableSchema,
R: TableReplication,
{
pub fn new(system: Arc<System>, instance: F, replication: R, db: &sled::Db) -> Arc<Self> {
pub fn new(system: Arc<System>, instance: F, replication: R, db: &db::Db) -> Arc<Self> {
let store = db
.open_tree(&format!("{}:table", F::TABLE_NAME))
.expect("Unable to open DB tree");
@ -55,7 +56,7 @@ where
let gc_todo = db
.open_tree(&format!("{}:gc_todo_v2", F::TABLE_NAME))
.expect("Unable to open DB tree");
let gc_todo = SledCountedTree::new(gc_todo);
let gc_todo = CountedTree::new(gc_todo).expect("Cannot count gc_todo_v2");
let metrics = TableMetrics::new(F::TABLE_NAME, merkle_todo.clone(), gc_todo.clone());
@ -98,30 +99,30 @@ where
None => partition_hash.to_vec(),
Some(sk) => self.tree_key(partition_key, sk),
};
let range = self.store.range(first_key..);
let range = self.store.range(first_key..)?;
self.read_range_aux(partition_hash, range, filter, limit)
}
EnumerationOrder::Reverse => match start {
Some(sk) => {
let last_key = self.tree_key(partition_key, sk);
let range = self.store.range(..=last_key).rev();
let range = self.store.range_rev(..=last_key)?;
self.read_range_aux(partition_hash, range, filter, limit)
}
None => {
let mut last_key = partition_hash.to_vec();
let lower = u128::from_be_bytes(last_key[16..32].try_into().unwrap());
last_key[16..32].copy_from_slice(&u128::to_be_bytes(lower + 1));
let range = self.store.range(..last_key).rev();
let range = self.store.range_rev(..last_key)?;
self.read_range_aux(partition_hash, range, filter, limit)
}
},
}
}
fn read_range_aux(
fn read_range_aux<'a>(
&self,
partition_hash: Hash,
range: impl Iterator<Item = sled::Result<(IVec, IVec)>>,
range: db::ValueIter<'a>,
filter: &Option<F::Filter>,
limit: usize,
) -> Result<Vec<Arc<ByteBuf>>, Error> {
@ -139,7 +140,7 @@ where
}
};
if keep {
ret.push(Arc::new(ByteBuf::from(value.as_ref())));
ret.push(Arc::new(ByteBuf::from(value)));
}
if ret.len() >= limit {
break;
@ -183,12 +184,10 @@ where
tree_key: &[u8],
f: impl Fn(Option<F::E>) -> F::E,
) -> Result<Option<F::E>, Error> {
let changed = (&self.store, &self.merkle_todo).transaction(|(store, mkl_todo)| {
let (old_entry, old_bytes, new_entry) = match store.get(tree_key)? {
let changed = self.store.db().transaction(|mut tx| {
let (old_entry, old_bytes, new_entry) = match tx.get(&self.store, tree_key)? {
Some(old_bytes) => {
let old_entry = self
.decode_entry(&old_bytes)
.map_err(sled::transaction::ConflictableTransactionError::Abort)?;
let old_entry = self.decode_entry(&old_bytes).map_err(db::TxError::Abort)?;
let new_entry = f(Some(old_entry.clone()));
(Some(old_entry), Some(old_bytes), new_entry)
}
@ -204,24 +203,28 @@ where
// the associated Merkle tree entry.
let new_bytes = rmp_to_vec_all_named(&new_entry)
.map_err(Error::RmpEncode)
.map_err(sled::transaction::ConflictableTransactionError::Abort)?;
.map_err(db::TxError::Abort)?;
let encoding_changed = Some(&new_bytes[..]) != old_bytes.as_ref().map(|x| &x[..]);
drop(old_bytes);
if value_changed || encoding_changed {
let new_bytes_hash = blake2sum(&new_bytes[..]);
mkl_todo.insert(tree_key.to_vec(), new_bytes_hash.as_slice())?;
store.insert(tree_key.to_vec(), new_bytes)?;
Ok(Some((old_entry, new_entry, new_bytes_hash)))
tx.insert(&self.merkle_todo, tree_key, new_bytes_hash.as_slice())?;
tx.insert(&self.store, tree_key, new_bytes)?;
self.instance
.updated(&mut tx, old_entry.as_ref(), Some(&new_entry))?;
Ok(Some((new_entry, new_bytes_hash)))
} else {
Ok(None)
}
})?;
if let Some((old_entry, new_entry, new_bytes_hash)) = changed {
if let Some((new_entry, new_bytes_hash)) = changed {
self.metrics.internal_update_counter.add(1);
let is_tombstone = new_entry.is_tombstone();
self.instance.updated(old_entry.as_ref(), Some(&new_entry));
self.merkle_todo_notify.notify_one();
if is_tombstone {
// We are only responsible for GC'ing this item if we are the
@ -244,22 +247,23 @@ where
}
pub(crate) fn delete_if_equal(self: &Arc<Self>, k: &[u8], v: &[u8]) -> Result<bool, Error> {
let removed = (&self.store, &self.merkle_todo).transaction(|(store, mkl_todo)| {
if let Some(cur_v) = store.get(k)? {
if cur_v == v {
store.remove(k)?;
mkl_todo.insert(k, vec![])?;
return Ok(true);
let removed = self
.store
.db()
.transaction(|mut tx| match tx.get(&self.store, k)? {
Some(cur_v) if cur_v == v => {
tx.remove(&self.store, k)?;
tx.insert(&self.merkle_todo, k, vec![])?;
let old_entry = self.decode_entry(v).map_err(db::TxError::Abort)?;
self.instance.updated(&mut tx, Some(&old_entry), None)?;
Ok(true)
}
}
Ok(false)
})?;
_ => Ok(false),
})?;
if removed {
self.metrics.internal_delete_counter.add(1);
let old_entry = self.decode_entry(v)?;
self.instance.updated(Some(&old_entry), None);
self.merkle_todo_notify.notify_one();
}
Ok(removed)
@ -270,25 +274,26 @@ where
k: &[u8],
vhash: Hash,
) -> Result<bool, Error> {
let removed = (&self.store, &self.merkle_todo).transaction(|(store, mkl_todo)| {
if let Some(cur_v) = store.get(k)? {
if blake2sum(&cur_v[..]) == vhash {
store.remove(k)?;
mkl_todo.insert(k, vec![])?;
return Ok(Some(cur_v));
}
}
Ok(None)
})?;
let removed = self
.store
.db()
.transaction(|mut tx| match tx.get(&self.store, k)? {
Some(cur_v) if blake2sum(&cur_v[..]) == vhash => {
tx.remove(&self.store, k)?;
tx.insert(&self.merkle_todo, k, vec![])?;
if let Some(old_v) = removed {
let old_entry = self.decode_entry(&old_v[..])?;
self.instance.updated(Some(&old_entry), None);
let old_entry = self.decode_entry(&cur_v[..]).map_err(db::TxError::Abort)?;
self.instance.updated(&mut tx, Some(&old_entry), None)?;
Ok(true)
}
_ => Ok(false),
})?;
if removed {
self.metrics.internal_delete_counter.add(1);
self.merkle_todo_notify.notify_one();
Ok(true)
} else {
Ok(false)
}
Ok(removed)
}
// ---- Utility functions ----
@ -315,7 +320,7 @@ where
}
}
pub fn gc_todo_len(&self) -> usize {
self.gc_todo.len()
pub fn gc_todo_len(&self) -> Result<usize, Error> {
Ok(self.gc_todo.len())
}
}

View File

@ -12,9 +12,10 @@ use futures::select;
use futures_util::future::*;
use tokio::sync::watch;
use garage_db::counted_tree_hack::CountedTree;
use garage_util::data::*;
use garage_util::error::*;
use garage_util::sled_counter::SledCountedTree;
use garage_util::time::*;
use garage_rpc::system::System;
@ -100,18 +101,16 @@ where
async fn gc_loop_iter(&self) -> Result<Option<Duration>, Error> {
let now = now_msec();
let mut entries = vec![];
let mut excluded = vec![];
// List entries in the GC todo list
// These entries are put there when a tombstone is inserted in the table
// (see update_entry in data.rs)
for entry_kv in self.data.gc_todo.iter() {
let mut candidates = vec![];
for entry_kv in self.data.gc_todo.iter()? {
let (k, vhash) = entry_kv?;
let mut todo_entry = GcTodoEntry::parse(&k, &vhash);
let todo_entry = GcTodoEntry::parse(&k, &vhash);
if todo_entry.deletion_time() > now {
if entries.is_empty() && excluded.is_empty() {
if candidates.is_empty() {
// If the earliest entry in the todo list shouldn't yet be processed,
// return a duration to wait in the loop
return Ok(Some(Duration::from_millis(
@ -123,15 +122,23 @@ where
}
}
let vhash = Hash::try_from(&vhash[..]).unwrap();
candidates.push(todo_entry);
if candidates.len() >= 2 * TABLE_GC_BATCH_SIZE {
break;
}
}
let mut entries = vec![];
let mut excluded = vec![];
for mut todo_entry in candidates {
// Check if the tombstone is still the current value of the entry.
// If not, we don't actually want to GC it, and we will remove it
// from the gc_todo table later (below).
let vhash = todo_entry.value_hash;
todo_entry.value = self
.data
.store
.get(&k[..])?
.get(&todo_entry.key[..])?
.filter(|v| blake2sum(&v[..]) == vhash)
.map(|v| v.to_vec());
@ -353,17 +360,17 @@ impl GcTodoEntry {
}
/// Parses a GcTodoEntry from a (k, v) pair stored in the gc_todo tree
pub(crate) fn parse(sled_k: &[u8], sled_v: &[u8]) -> Self {
pub(crate) fn parse(db_k: &[u8], db_v: &[u8]) -> Self {
Self {
tombstone_timestamp: u64::from_be_bytes(sled_k[0..8].try_into().unwrap()),
key: sled_k[8..].to_vec(),
value_hash: Hash::try_from(sled_v).unwrap(),
tombstone_timestamp: u64::from_be_bytes(db_k[0..8].try_into().unwrap()),
key: db_k[8..].to_vec(),
value_hash: Hash::try_from(db_v).unwrap(),
value: None,
}
}
/// Saves the GcTodoEntry in the gc_todo tree
pub(crate) fn save(&self, gc_todo_tree: &SledCountedTree) -> Result<(), Error> {
pub(crate) fn save(&self, gc_todo_tree: &CountedTree) -> Result<(), Error> {
gc_todo_tree.insert(self.todo_table_key(), self.value_hash.as_slice())?;
Ok(())
}
@ -373,9 +380,9 @@ impl GcTodoEntry {
/// This is usefull to remove a todo entry only under the condition
/// that it has not changed since the time it was read, i.e.
/// what we have to do is still the same
pub(crate) fn remove_if_equal(&self, gc_todo_tree: &SledCountedTree) -> Result<(), Error> {
let _ = gc_todo_tree.compare_and_swap::<_, _, Vec<u8>>(
&self.todo_table_key()[..],
pub(crate) fn remove_if_equal(&self, gc_todo_tree: &CountedTree) -> Result<(), Error> {
gc_todo_tree.compare_and_swap::<_, _, &[u8]>(
&self.todo_table_key(),
Some(self.value_hash),
None,
)?;

View File

@ -4,11 +4,10 @@ use std::time::Duration;
use futures::select;
use futures_util::future::*;
use serde::{Deserialize, Serialize};
use sled::transaction::{
ConflictableTransactionError, ConflictableTransactionResult, TransactionalTree,
};
use tokio::sync::watch;
use garage_db as db;
use garage_util::background::BackgroundRunner;
use garage_util::data::*;
use garage_util::error::Error;
@ -90,35 +89,35 @@ where
async fn updater_loop(self: Arc<Self>, mut must_exit: watch::Receiver<bool>) {
while !*must_exit.borrow() {
if let Some(x) = self.data.merkle_todo.iter().next() {
match x {
Ok((key, valhash)) => {
if let Err(e) = self.update_item(&key[..], &valhash[..]) {
warn!(
"({}) Error while updating Merkle tree item: {}",
F::TABLE_NAME,
e
);
}
}
Err(e) => {
warn!(
"({}) Error while iterating on Merkle todo tree: {}",
F::TABLE_NAME,
e
);
tokio::time::sleep(Duration::from_secs(10)).await;
match self.updater_loop_iter() {
Ok(true) => (),
Ok(false) => {
select! {
_ = self.data.merkle_todo_notify.notified().fuse() => {},
_ = must_exit.changed().fuse() => {},
}
}
} else {
select! {
_ = self.data.merkle_todo_notify.notified().fuse() => {},
_ = must_exit.changed().fuse() => {},
Err(e) => {
warn!(
"({}) Error while updating Merkle tree item: {}",
F::TABLE_NAME,
e
);
tokio::time::sleep(Duration::from_secs(10)).await;
}
}
}
}
fn updater_loop_iter(&self) -> Result<bool, Error> {
if let Some((key, valhash)) = self.data.merkle_todo.first()? {
self.update_item(&key, &valhash)?;
Ok(true)
} else {
Ok(false)
}
}
fn update_item(&self, k: &[u8], vhash_by: &[u8]) -> Result<(), Error> {
let khash = blake2sum(k);
@ -137,13 +136,16 @@ where
};
self.data
.merkle_tree
.transaction(|tx| self.update_item_rec(tx, k, &khash, &key, new_vhash))?;
.db()
.transaction(|mut tx| self.update_item_rec(&mut tx, k, &khash, &key, new_vhash))?;
let deleted = self
.data
.merkle_todo
.compare_and_swap::<_, _, Vec<u8>>(k, Some(vhash_by), None)?
.is_ok();
let deleted = self.data.merkle_todo.db().transaction(|mut tx| {
let remove = matches!(tx.get(&self.data.merkle_todo, k)?, Some(ov) if ov == vhash_by);
if remove {
tx.remove(&self.data.merkle_todo, k)?;
}
Ok(remove)
})?;
if !deleted {
debug!(
@ -157,12 +159,12 @@ where
fn update_item_rec(
&self,
tx: &TransactionalTree,
tx: &mut db::Transaction<'_>,
k: &[u8],
khash: &Hash,
key: &MerkleNodeKey,
new_vhash: Option<Hash>,
) -> ConflictableTransactionResult<Option<Hash>, Error> {
) -> db::TxResult<Option<Hash>, Error> {
let i = key.prefix.len();
// Read node at current position (defined by the prefix stored in key)
@ -203,7 +205,7 @@ where
}
MerkleNode::Intermediate(_) => Some(MerkleNode::Intermediate(children)),
x @ MerkleNode::Leaf(_, _) => {
tx.remove(key_sub.encode())?;
tx.remove(&self.data.merkle_tree, key_sub.encode())?;
Some(x)
}
}
@ -283,28 +285,27 @@ where
fn read_node_txn(
&self,
tx: &TransactionalTree,
tx: &mut db::Transaction<'_>,
k: &MerkleNodeKey,
) -> ConflictableTransactionResult<MerkleNode, Error> {
let ent = tx.get(k.encode())?;
MerkleNode::decode_opt(ent).map_err(ConflictableTransactionError::Abort)
) -> db::TxResult<MerkleNode, Error> {
let ent = tx.get(&self.data.merkle_tree, k.encode())?;
MerkleNode::decode_opt(&ent).map_err(db::TxError::Abort)
}
fn put_node_txn(
&self,
tx: &TransactionalTree,
tx: &mut db::Transaction<'_>,
k: &MerkleNodeKey,
v: &MerkleNode,
) -> ConflictableTransactionResult<Hash, Error> {
) -> db::TxResult<Hash, Error> {
trace!("Put Merkle node: {:?} => {:?}", k, v);
if *v == MerkleNode::Empty {
tx.remove(k.encode())?;
tx.remove(&self.data.merkle_tree, k.encode())?;
Ok(self.empty_node_hash)
} else {
let vby = rmp_to_vec_all_named(v)
.map_err(|e| ConflictableTransactionError::Abort(e.into()))?;
let vby = rmp_to_vec_all_named(v).map_err(|e| db::TxError::Abort(e.into()))?;
let rethash = blake2sum(&vby[..]);
tx.insert(k.encode(), vby)?;
tx.insert(&self.data.merkle_tree, k.encode(), vby)?;
Ok(rethash)
}
}
@ -312,15 +313,15 @@ where
// Access a node in the Merkle tree, used by the sync protocol
pub(crate) fn read_node(&self, k: &MerkleNodeKey) -> Result<MerkleNode, Error> {
let ent = self.data.merkle_tree.get(k.encode())?;
MerkleNode::decode_opt(ent)
MerkleNode::decode_opt(&ent)
}
pub fn merkle_tree_len(&self) -> usize {
self.data.merkle_tree.len()
pub fn merkle_tree_len(&self) -> Result<usize, Error> {
Ok(self.data.merkle_tree.len()?)
}
pub fn todo_len(&self) -> usize {
self.data.merkle_todo.len()
pub fn todo_len(&self) -> Result<usize, Error> {
Ok(self.data.merkle_todo.len()?)
}
}
@ -347,7 +348,7 @@ impl MerkleNodeKey {
}
impl MerkleNode {
fn decode_opt(ent: Option<sled::IVec>) -> Result<Self, Error> {
fn decode_opt(ent: &Option<db::Value>) -> Result<Self, Error> {
match ent {
None => Ok(MerkleNode::Empty),
Some(v) => Ok(rmp_serde::decode::from_read_ref::<_, MerkleNode>(&v[..])?),

View File

@ -1,6 +1,7 @@
use opentelemetry::{global, metrics::*, KeyValue};
use garage_util::sled_counter::SledCountedTree;
use garage_db as db;
use garage_db::counted_tree_hack::CountedTree;
/// TableMetrics reference all counter used for metrics
pub struct TableMetrics {
@ -19,21 +20,19 @@ pub struct TableMetrics {
pub(crate) sync_items_received: Counter<u64>,
}
impl TableMetrics {
pub fn new(
table_name: &'static str,
merkle_todo: sled::Tree,
gc_todo: SledCountedTree,
) -> Self {
pub fn new(table_name: &'static str, merkle_todo: db::Tree, gc_todo: CountedTree) -> Self {
let meter = global::meter(table_name);
TableMetrics {
_merkle_todo_len: meter
.u64_value_observer(
"table.merkle_updater_todo_queue_length",
move |observer| {
observer.observe(
merkle_todo.len() as u64,
&[KeyValue::new("table_name", table_name)],
)
if let Ok(v) = merkle_todo.len() {
observer.observe(
v as u64,
&[KeyValue::new("table_name", table_name)],
);
}
},
)
.with_description("Merkle tree updater TODO queue length")
@ -45,7 +44,7 @@ impl TableMetrics {
observer.observe(
gc_todo.len() as u64,
&[KeyValue::new("table_name", table_name)],
)
);
},
)
.with_description("Table garbage collector TODO queue length")

View File

@ -1,5 +1,6 @@
use serde::{Deserialize, Serialize};
use garage_db as db;
use garage_util::data::*;
use crate::crdt::Crdt;
@ -82,11 +83,19 @@ pub trait TableSchema: Send + Sync {
None
}
// Updated triggers some stuff downstream, but it is not supposed to block or fail,
// as the update itself is an unchangeable fact that will never go back
// due to CRDT logic. Typically errors in propagation of info should be logged
// to stderr.
fn updated(&self, _old: Option<&Self::E>, _new: Option<&Self::E>) {}
/// Actions triggered by data changing in a table. If such actions
/// include updates to the local database that should be applied
/// atomically with the item update itself, a db transaction is
/// provided on which these changes should be done.
/// This function can return a DB error but that's all.
fn updated(
&self,
_tx: &mut db::Transaction,
_old: Option<&Self::E>,
_new: Option<&Self::E>,
) -> db::TxOpResult<()> {
Ok(())
}
fn matches_filter(entry: &Self::E, filter: &Self::Filter) -> bool;
}

View File

@ -258,9 +258,9 @@ where
while !*must_exit.borrow() {
let mut items = Vec::new();
for item in self.data.store.range(begin.to_vec()..end.to_vec()) {
for item in self.data.store.range(begin.to_vec()..end.to_vec())? {
let (key, value) = item?;
items.push((key.to_vec(), Arc::new(ByteBuf::from(value.as_ref()))));
items.push((key.to_vec(), Arc::new(ByteBuf::from(value))));
if items.len() >= 1024 {
break;
@ -603,8 +603,16 @@ impl SyncTodo {
let retain = nodes.contains(&my_id);
if !retain {
// Check if we have some data to send, otherwise skip
if data.store.range(begin..end).next().is_none() {
continue;
match data.store.range(begin..end) {
Ok(mut iter) => {
if iter.next().is_none() {
continue;
}
}
Err(e) => {
warn!("DB error in add_full_sync: {}", e);
continue;
}
}
}

View File

@ -13,6 +13,8 @@ use opentelemetry::{
Context,
};
use garage_db as db;
use garage_util::data::*;
use garage_util::error::Error;
use garage_util::metrics::RecordDuration;
@ -69,7 +71,7 @@ where
{
// =============== PUBLIC INTERFACE FUNCTIONS (new, insert, get, etc) ===============
pub fn new(instance: F, replication: R, system: Arc<System>, db: &sled::Db) -> Arc<Self> {
pub fn new(instance: F, replication: R, system: Arc<System>, db: &db::Db) -> Arc<Self> {
let endpoint = system
.netapp
.endpoint(format!("garage_table/table.rs/Rpc:{}", F::TABLE_NAME));

View File

@ -14,6 +14,8 @@ path = "lib.rs"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
garage_db = { version = "0.8.0", path = "../db" }
blake2 = "0.9"
err-derive = "0.3"
xxhash-rust = { version = "0.8", default-features = false, features = ["xxh3"] }
@ -22,8 +24,6 @@ tracing = "0.1.30"
rand = "0.8"
sha2 = "0.9"
sled = "0.34"
chrono = "0.4"
rmp-serde = "0.15"
serde = { version = "1.0", default-features = false, features = ["derive", "rc"] }

View File

@ -64,14 +64,19 @@ pub struct Config {
#[serde(default)]
pub kubernetes_skip_crd: bool,
// -- DB
/// Database engine to use for metadata (options: sled, sqlite, lmdb)
#[serde(default = "default_db_engine")]
pub db_engine: String,
/// Sled cache size, in bytes
#[serde(default = "default_sled_cache_capacity")]
pub sled_cache_capacity: u64,
/// Sled flush interval in milliseconds
#[serde(default = "default_sled_flush_every_ms")]
pub sled_flush_every_ms: u64,
// -- APIs
/// Configuration for S3 api
pub s3_api: S3ApiConfig,
@ -129,6 +134,10 @@ pub struct AdminConfig {
pub trace_sink: Option<String>,
}
fn default_db_engine() -> String {
"sled".into()
}
fn default_sled_cache_capacity() -> u64 {
128 * 1024 * 1024
}

View File

@ -26,8 +26,8 @@ pub enum Error {
#[error(display = "Netapp error: {}", _0)]
Netapp(#[error(source)] netapp::error::Error),
#[error(display = "Sled error: {}", _0)]
Sled(#[error(source)] sled::Error),
#[error(display = "DB error: {}", _0)]
Db(#[error(source)] garage_db::Error),
#[error(display = "Messagepack encode error: {}", _0)]
RmpEncode(#[error(source)] rmp_serde::encode::Error),
@ -78,11 +78,11 @@ impl Error {
}
}
impl From<sled::transaction::TransactionError<Error>> for Error {
fn from(e: sled::transaction::TransactionError<Error>) -> Error {
impl From<garage_db::TxError<Error>> for Error {
fn from(e: garage_db::TxError<Error>) -> Error {
match e {
sled::transaction::TransactionError::Abort(x) => x,
sled::transaction::TransactionError::Storage(x) => Error::Sled(x),
garage_db::TxError::Abort(x) => x,
garage_db::TxError::Db(x) => Error::Db(x),
}
}
}

View File

@ -11,7 +11,7 @@ pub mod error;
pub mod formater;
pub mod metrics;
pub mod persister;
pub mod sled_counter;
//pub mod sled_counter;
pub mod time;
pub mod token_bucket;
pub mod tranquilizer;

View File

@ -1,100 +0,0 @@
use std::sync::{
atomic::{AtomicUsize, Ordering},
Arc,
};
use sled::{CompareAndSwapError, IVec, Iter, Result, Tree};
#[derive(Clone)]
pub struct SledCountedTree(Arc<SledCountedTreeInternal>);
struct SledCountedTreeInternal {
tree: Tree,
len: AtomicUsize,
}
impl SledCountedTree {
pub fn new(tree: Tree) -> Self {
let len = tree.len();
Self(Arc::new(SledCountedTreeInternal {
tree,
len: AtomicUsize::new(len),
}))
}
pub fn len(&self) -> usize {
self.0.len.load(Ordering::Relaxed)
}
pub fn is_empty(&self) -> bool {
self.0.tree.is_empty()
}
pub fn get<K: AsRef<[u8]>>(&self, key: K) -> Result<Option<IVec>> {
self.0.tree.get(key)
}
pub fn iter(&self) -> Iter {
self.0.tree.iter()
}
// ---- writing functions ----
pub fn insert<K, V>(&self, key: K, value: V) -> Result<Option<IVec>>
where
K: AsRef<[u8]>,
V: Into<IVec>,
{
let res = self.0.tree.insert(key, value);
if res == Ok(None) {
self.0.len.fetch_add(1, Ordering::Relaxed);
}
res
}
pub fn remove<K: AsRef<[u8]>>(&self, key: K) -> Result<Option<IVec>> {
let res = self.0.tree.remove(key);
if matches!(res, Ok(Some(_))) {
self.0.len.fetch_sub(1, Ordering::Relaxed);
}
res
}
pub fn pop_min(&self) -> Result<Option<(IVec, IVec)>> {
let res = self.0.tree.pop_min();
if let Ok(Some(_)) = &res {
self.0.len.fetch_sub(1, Ordering::Relaxed);
};
res
}
pub fn compare_and_swap<K, OV, NV>(
&self,
key: K,
old: Option<OV>,
new: Option<NV>,
) -> Result<std::result::Result<(), CompareAndSwapError>>
where
K: AsRef<[u8]>,
OV: AsRef<[u8]>,
NV: Into<IVec>,
{
let old_some = old.is_some();
let new_some = new.is_some();
let res = self.0.tree.compare_and_swap(key, old, new);
if res == Ok(Ok(())) {
match (old_some, new_some) {
(false, true) => {
self.0.len.fetch_add(1, Ordering::Relaxed);
}
(true, false) => {
self.0.len.fetch_sub(1, Ordering::Relaxed);
}
_ => (),
}
}
res
}
}