use netapp streaming body #343

Merged
lx merged 31 commits from netapp-stream-body into main 2022-09-13 13:26:09 +00:00
Owner

TODO:

  • Test OrderTag works (check trace-level logs in integration test)
  • Publish netapp 0.5.0 and use that as a dependency (revert change to .nix to pull from git repo)
  • In get with range, use streaming also (use StreamExt::scan for slicing)
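The range-slicing idea in the last item can be sketched as follows. This is a minimal illustration, not the code merged in this PR: it uses the standard library's `Iterator::scan` as a synchronous stand-in for `StreamExt::scan`, and the function name `slice_chunks` and the `Vec<u8>` chunk type are assumptions made for the example.

```rust
/// Keep only the bytes in the half-open range `[start, end)` from a
/// sequence of chunks, without concatenating the chunks first.
/// (Hypothetical helper: the name and the chunk type are illustrative.)
fn slice_chunks<I>(chunks: I, start: usize, end: usize) -> Vec<Vec<u8>>
where
    I: IntoIterator<Item = Vec<u8>>,
{
    chunks
        .into_iter()
        // State: absolute offset of the first byte of the next chunk.
        .scan(0usize, move |offset, chunk| {
            let chunk_start = *offset;
            let chunk_end = chunk_start + chunk.len();
            *offset = chunk_end;
            if chunk_start >= end {
                // Past the requested range: returning None stops the scan,
                // so later chunks are never pulled from the source.
                return None;
            }
            // Clamp the requested range to this chunk's boundaries.
            let from = start.max(chunk_start) - chunk_start;
            let to = end.min(chunk_end) - chunk_start;
            if from >= to {
                // Chunk lies entirely before `start`: emit nothing useful.
                Some(Vec::new())
            } else {
                Some(chunk[from..to].to_vec())
            }
        })
        // Drop the empty placeholders produced for skipped chunks.
        .filter(|piece| !piece.is_empty())
        .collect()
}
```

With an async stream of blocks, the closure would keep the same shape, except that `StreamExt::scan` expects it to return a future resolving to the `Option`.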
lx force-pushed netapp-stream-body from 433cbd65d1 to 6a78c0715c 2022-07-22 14:46:02 +00:00 Compare
lx changed target branch from main to lx-perf-improvements 2022-07-22 14:46:11 +00:00
lx force-pushed netapp-stream-body from d888c9c193 to fe5dadb756 2022-07-22 17:03:16 +00:00 Compare
lx force-pushed netapp-stream-body from 9c1889c630 to f728893dae 2022-07-25 10:05:19 +00:00 Compare
lx force-pushed netapp-stream-body from f728893dae to 326d418367 2022-07-25 10:06:52 +00:00 Compare
Author
Owner

Currently stalled as there is an issue I'm unable to fix. test-smoke.sh pretty consistently reproduces the issue: transfers (GetObject requests) get stuck in the middle. I don't understand exactly what is going on, but it looks like entire netapp connections are blocked, since pings start timing out; it's not just an issue with a stream that ends prematurely.

Lines 188 and 229 in block/manager.rs need to be commented out for the bug to happen: when these lines are commented out, nodes won't prioritize reading blocks from local storage and will instead ask remote nodes most of the time. This is the condition under which the issue happens. (When request prioritization is enabled, nodes in test-smoke will always read locally, so the bug doesn't happen.)

I don't want to spend too much time on this; merging it is not a high priority.

Next steps: ??? Maybe try to reproduce the issue with a simpler netapp program, and not an entire Garage (here there are too many connections open at once and we can't really see what is happening)
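For readers unfamiliar with the prioritization mentioned above, here is a minimal sketch of what "prefer the local node" ordering could look like. It is purely illustrative: `NodeId`, `order_nodes` and the `prefer_local` flag are hypothetical names, not the code at the lines cited in block/manager.rs.

```rust
/// Hypothetical node identifier; real code would use the cluster's own type.
type NodeId = u32;

/// Order candidate nodes for a block read.
/// With `prefer_local` set, the local node is tried first; otherwise the
/// candidates are returned unchanged, so remote nodes may be asked first
/// (the condition under which the stall was observed).
fn order_nodes(mut candidates: Vec<NodeId>, local: NodeId, prefer_local: bool) -> Vec<NodeId> {
    if prefer_local {
        // `false` sorts before `true`, so the local node moves to the front.
        // The sort is stable: remote nodes keep their relative order.
        candidates.sort_by_key(|node| *node != local);
    }
    candidates
}
```

Disabling the flag in this sketch corresponds to commenting out the prioritization lines when trying to reproduce the stall.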

lx force-pushed netapp-stream-body from a1c224e2e8 to e935861854 2022-07-29 10:25:11 +00:00 Compare
lx added 5 commits 2022-08-29 14:45:10 +00:00
drone: set TARGET env as needed by "to_s3" func
All checks were successful
continuous-integration/drone Build is passing
continuous-integration/drone/pr Build is passing
continuous-integration/drone/push Build is passing
8cd02639dc
Configure structopt to report the right version
All checks were successful
continuous-integration/drone/pr Build is passing
continuous-integration/drone/tag Build is passing
continuous-integration/drone Build is passing
continuous-integration/drone/push Build is passing
2c7bae935a
By default, structopt reports the value provided by
the env var CARGO_PKG_VERSION, fed by Cargo when reading
Cargo.toml. However for Garage we use a versioning based on git,
so we often report a version that is behind the real version.
In this commit, we create garage_util::version::garage() that
reports the right version and configure all structopt subcommands
to call this function instead of using the env var.
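A minimal sketch of the idea, assuming the git-derived version is injected at build time through a `GIT_VERSION` environment variable (that name appears in the later commits of this PR). The function below is hypothetical and is not the actual body of `garage_util::version::garage()`.

```rust
/// Hypothetical version helper: prefer a git-derived version injected at
/// build time, fall back to the Cargo package version, then to a fixed string.
fn version() -> &'static str {
    // `option_env!` is evaluated at compile time and yields `None`
    // when the variable is not set in the build environment.
    option_env!("GIT_VERSION")
        .or(option_env!("CARGO_PKG_VERSION"))
        .unwrap_or("unknown")
}
```

Each subcommand's reported version would then come from this function rather than directly from `CARGO_PKG_VERSION`.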
Add some documentation for Caddy
Some checks are pending
continuous-integration/drone/pr Build is passing
continuous-integration/drone Build is passing
continuous-integration/drone/push Build is pending
532eca7ff9
Merge branch 'main' into lx-perf-improvements
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing
ebc20a8798
Merge branch 'lx-perf-improvements' into netapp-stream-body
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone/pr Build is failing
1921f4f7e6
lx added 2 commits 2022-08-29 14:48:48 +00:00
Update drone signature
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing
continuous-integration/drone Build is passing
4da67b0035
Merge branch 'lx-perf-improvements' into netapp-stream-body
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone/pr Build is failing
continuous-integration/drone Build is failing
52749e28f7
lx added 1 commit 2022-08-29 15:25:05 +00:00
cargo2nix fix to fetchCrateGit
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone/pr Build is failing
5d065b8a0f
lx added 1 commit 2022-08-29 15:32:54 +00:00
Try to fix clippy
Some checks reported errors
continuous-integration/drone/pr Build is failing
continuous-integration/drone/push Build was killed
continuous-integration/drone Build was killed
322dafc761
lx added 5 commits 2022-08-31 15:42:37 +00:00
Replace logging crate pretty_env_logger by tracing_subscriber::fmt
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing
dd5304f6fc
Tracing-subscriber: write to stderr
All checks were successful
continuous-integration/drone/pr Build is passing
continuous-integration/drone/push Build is passing
44cd98d2e4
Add env filter to tracing subscriber
Some checks failed
continuous-integration/drone/pr Build is failing
continuous-integration/drone/push Build is failing
efbca67ce4
update cargo.nix
All checks were successful
continuous-integration/drone/pr Build is passing
continuous-integration/drone/push Build is passing
eb97e13a6a
Merge branch 'lx-perf-improvements' into netapp-stream-body
Some checks failed
continuous-integration/drone/pr Build is failing
continuous-integration/drone/push Build is failing
c9bc9d89de
lx added 1 commit 2022-08-31 17:27:36 +00:00
update netapp git commit
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone/pr Build is failing
e598231ca4
lx added 1 commit 2022-08-31 17:44:38 +00:00
Fix bytes_read counter
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone/pr Build is failing
70231d68b2
lx added 1 commit 2022-09-01 07:47:45 +00:00
netapp recv with unbounded channel removes deadlock
All checks were successful
continuous-integration/drone/pr Build is passing
continuous-integration/drone/push Build is passing
4b726b0941
lx added 1 commit 2022-09-01 10:58:46 +00:00
Update to Netapp with OrderTag support and exploit OrderTags
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone/pr Build is failing
bc977f9a7a
lx added 1 commit 2022-09-01 12:24:05 +00:00
update netapp: straming + fix-ping
All checks were successful
continuous-integration/drone/pr Build is passing
continuous-integration/drone/push Build is passing
f3bf34b6a1
lx added 2 commits 2022-09-01 14:31:13 +00:00
update cargo.nix
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing
e648bf7b69
lx added 1 commit 2022-09-01 14:35:53 +00:00
Apply PRIO_SECONDARY to block data transfers
Some checks failed
continuous-integration/drone/pr Build is failing
continuous-integration/drone/push Build is failing
99b532b85b
lx added 1 commit 2022-09-02 11:38:45 +00:00
cargo fmt
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing
1ef87ac4cb
lx added 1 commit 2022-09-02 11:46:55 +00:00
Make use of BytesBuf from new Netapp
All checks were successful
continuous-integration/drone/pr Build is passing
continuous-integration/drone/push Build is passing
13b5f28c7e
lx added 1 commit 2022-09-06 17:31:54 +00:00
Reenable node ordering
Some checks failed
continuous-integration/drone/pr Build is failing
continuous-integration/drone/push Build is passing
c2cc08852b
lx added 1 commit 2022-09-06 17:45:16 +00:00
Update netapp to lastest git version with LAS scheduling
Some checks failed
continuous-integration/drone/pr Build is failing
continuous-integration/drone/push Build is passing
4024822585
lx added 12 commits 2022-09-06 20:13:21 +00:00
Update to netapp 0.4.5 - fixed ping
All checks were successful
continuous-integration/drone/pr Build is passing
continuous-integration/drone/push Build is passing
6226f5ceca
Ability to dynamically set resync tranquility
All checks were successful
continuous-integration/drone/push Build is passing
943d76c583
block manager: refactor: split resync into separate file
All checks were successful
continuous-integration/drone/push Build is passing
47be652a1f
Make BlockManagerLocked fully private again
All checks were successful
continuous-integration/drone/push Build is passing
5e8baa433d
Ability to have up to 4 concurrently working resync workers
Some checks failed
continuous-integration/drone/pr Build is failing
continuous-integration/drone/push Build is failing
5d4b937a00
fix clippy
All checks were successful
continuous-integration/drone/pr Build is passing
continuous-integration/drone/push Build is passing
e1751c8a9c
Included in this PR:

- [x] Small refactor, resync code is moved to a separate `block/resync.rs` file
- [x] Block resync tranquility is no longer in config file, it is set dynamically using `garage worker set resync-tranquility` (this parameter is persisted over Garage restarts)
- [x] Up to 4 block resync workers can be activated to run simultaneously to speed up big resyncs, this parameter is set dynamically using `garage worker set resync-n-workers`

Reviewed-on: #369
Merge branch 'main' into lx-perf-improvements
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing
07e6bcde85
Update .drone.yml signature
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing
fd8074ad9b
Merge pull request 'Update .drone.yml signature' (#374) from fix-drone-signature into main
All checks were successful
continuous-integration/drone/push Build is passing
9f5433db82
Reviewed-on: #374
Merge branch 'main' into lx-perf-improvements
All checks were successful
continuous-integration/drone/pr Build is passing
continuous-integration/drone/push Build is passing
d23b3a14fc
Merge branch 'lx-perf-improvements' into netapp-stream-body
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing
6b958979bd
lx added 1 commit 2022-09-06 20:25:41 +00:00
Faster copy, better get error message
All checks were successful
continuous-integration/drone/pr Build is passing
continuous-integration/drone/push Build is passing
907054775d
lx added 31 commits 2022-09-08 13:51:01 +00:00
If this feature is enabled, libsodium-sys and zstd-sys will link
dynamically against system-provided libraries instead of building
and linking statically the bundled (possibly outdated and vulnerable)
copies of them. This feature is intended mainly for linux package
maintainers.
Allow linking against system-provided libsqlite
Some checks are pending
continuous-integration/drone/push Build is pending
continuous-integration/drone/pr Build is pending
7511ba5530
Unfortunately, rusqlite uses the opposite logic for enabling/disabling
bundled libraries to others (libsodium-sys, zstd-sys). Cargo features
are very limited and don't allow enabling feature A in a dependency
iff feature B is disabled.

Note, lmdb-rkv-sys doesn't need any special treatment because it
automatically links against system liblmdb if found via pkgconf.

Linux distros should build garage with
`--no-default-features --features system-libs` to disable bundled-libs
and enable system-libs.
Remove Heed default features
All checks were successful
continuous-integration/drone/pr Build is passing
continuous-integration/drone/push Build is passing
729a910e14
Garage currently uses the legacy resolver "1". The new one is used
by default if the root package specifies 'edition = 2021', which
Garage does not (yet).

The problem with the legacy resolver is, among others, that features
enabled by dev-dependencies are propagated to normal dependencies.
This affects e.g. hyper - one of the dev-dependencies enables "http2"
feature that adds many extra dependencies. If we build garage without
opentelemetry-otlp (this is enabled in the following commit), there's
no normal dependency enabling "http2" feature.

See https://doc.rust-lang.org/cargo/reference/resolver.html#feature-resolver-version-2
opentelemetry-otlp adds 48 (!) extra dependencies and increases the
size of the garage binary by ~11 % (with fat LTO).
Allow building without Prometheus exporter (/metrics endpoint)
Some checks failed
continuous-integration/drone/pr Build is failing
ea36b9ff90
prometheus and opentelemetry-prometheus add 7 extra dependencies in
total and increase the size of the garage binary by ~7 % (with
fat LTO).
Reviewed-on: #372
Reviewed-by: Alex <alex@adnab.me>
Update .nix files
Some checks failed
continuous-integration/drone/pr Build is failing
continuous-integration/drone/push Build is failing
8d77a76df1
Force disable pkg-config for libsodum-sys and libzstd-sys
Some checks failed
continuous-integration/drone/pr Build is failing
continuous-integration/drone/push Build is failing
7de53a4d66
Bump versions to 0.8.0 (compatibility is broken already)
Some checks failed
continuous-integration/drone/pr Build is failing
continuous-integration/drone/push Build is failing
48ffaaadfc
Document available build features
Some checks are pending
continuous-integration/drone/push Build is pending
continuous-integration/drone/pr Build is pending
bbb970965c
Update Nix files with optional db engines
Some checks are pending
continuous-integration/drone/push Build is pending
continuous-integration/drone/pr Build is pending
2c2b93acdf
Remove opentelemetry-otlp dep in api/
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone/pr Build is failing
431dee050f
Disable k2v tests when feature is disabled
All checks were successful
continuous-integration/drone/pr Build is passing
continuous-integration/drone/push Build is passing
1e92e9f782
Include code from v0.5.1 directly to remove dependencies
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone/pr Build is failing
0f5689c169
cargo fmt
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing
6f02c36a89
Move GIT_VERSION injection later in build chain to reduce build times
Some checks failed
continuous-integration/drone/pr Build is failing
continuous-integration/drone/push Build is passing
db61f41030
Report build features in garage --help
Some checks failed
continuous-integration/drone/pr Build is failing
continuous-integration/drone/push Build is passing
28d86e7602
Make all HTTP services optionnal
Some checks are pending
continuous-integration/drone/push Build is pending
continuous-integration/drone/pr Build is pending
2559f63e9b
Error messages when system-libs XOR bundled-libs != 1
Some checks are pending
continuous-integration/drone/push Build is pending
continuous-integration/drone/pr Build is pending
2e00809af5
Add warnings when features are not included in build
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone/pr Build is failing
1449204439
Fix build error
Some checks failed
continuous-integration/drone/pr Build is failing
continuous-integration/drone/push Build is passing
107853334b
Fix merge
All checks were successful
continuous-integration/drone/pr Build is passing
continuous-integration/drone/push Build is passing
06df301de5
Inject GIT_VERSION even later
All checks were successful
continuous-integration/drone/pr Build is passing
continuous-integration/drone/push Build is passing
f310fce34b
Move version back into util
All checks were successful
continuous-integration/drone/pr Build is passing
continuous-integration/drone/push Build is passing
ceb1f0229a
Merge pull request 'Reorganize dependencies' (#373) from improve-deps into main
Some checks reported errors
continuous-integration/drone/push Build was killed
03c40a0b24
This PR includes work from @jirutka :

- [x] Allow linking against system-provided libraries (libsodium, libsqlite, libzstd) #370
- [x] Make OTLP exporter optional and allow building without Prometheus exporter (/metrics) #372

And also:

- [x] Update `.nix` files
- [x] Remove heed default-features
- [x] Bump versions of all Garage crates to 0.8.0
- [x] Make db engines (lmdb, sled, sqlite) optional
- [x] Add documentation for available features
- [x] Directly include code of previous versions used for migration in order to reduce dependencies
- [x] Read variable `GIT_VERSION` from garage main instead of in crate garage_util to make builds faster
- [x] Report features used in the build somewhere? (in `garage --version` or something)
- [x] Check we `warn!` correctly if we try to use deactivated feature
- [x] Allow not to launch S3 endpoint if not in config

Reviewed-on: #373
Merge branch 'main' into lx-perf-improvements
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing
d9d199a6c9
Merge branch 'lx-perf-improvements' into netapp-stream-body
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing
7f54706b95
lx changed target branch from lx-perf-improvements to main 2022-09-12 14:38:57 +00:00
lx added 1 commit 2022-09-12 14:57:55 +00:00
improvements in block manager
All checks were successful
continuous-integration/drone/push Build is passing
b823151a0b
lx added 1 commit 2022-09-13 11:12:03 +00:00
Use netapp 0.5 published from crates.io
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing
28a4af73ca
lx changed title from WIP: use netapp streaming body to use netapp streaming body 2022-09-13 12:40:56 +00:00
lx added 1 commit 2022-09-13 13:13:22 +00:00
Use streaming block API for get with Range requests
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing
ff30891999
lx merged commit 11bdc971e2 into main 2022-09-13 13:26:09 +00:00
Reference: Deuxfleurs/garage#343