Not all workers quit on time — how can I troubleshoot? #676

Closed
opened 2023-12-03 04:19:35 +00:00 by wilson · 3 comments
Contributor

I'm running garage on ZFS in NixOS via systemd and noticing several things:

  • logs on shutdown indicate that process tree did not quit completely
  • one of my CPU threads is stuck at 100%, consumed by garage

I'm not sure when it started, but this initially happened on v0.8.2, despite it running for months without any problems. I thought it was a problem that would potentially be fixed with upgrade, so I upgraded to v0.9.0 and the problem persists.

Attached at the end of this post is the complete log, but I thought the interesting bit is here:

ERROR garage_util::background::worker: Some workers could not exit in time, we are cancelling some things in the middle

I'm not sure how to troubleshoot this since I can't tell which worker didn't manage to exit. How do I proceed on chasing this bug? Any pointers would be appreciated!

System information
❯ uname -a
Linux awan 6.6.3 #1-NixOS SMP PREEMPT_DYNAMIC Tue Nov 28 17:20:18 UTC 2023 x86_64 GNU/Linux

❯ zfs --version
zfs-2.2.1-1
zfs-kmod-2.2.1-1

❯ garage --version
garage cargo:0.9.0 [features: k2v, sled, lmdb, sqlite, consul-discovery, kubernetes-discovery, metrics, telemetry-otlp, bundled-libs]
Garage configuration
bootstrap_peers = []
data_dir = "/var/lib/garage/data"
data_fsync = true
db_engine = "lmdb"
lmdb_map_size = "64GiB"
metadata_dir = "/var/lib/garage/config"
metadata_fsync = true
replication_mode = "1"
rpc_bind_addr = "[::]:9900"
rpc_public_addr = "garage.xargs.dev:9900"
rpc_secret = "s3cr3t"
[admin]
admin_secret = "s3cr3t"
api_bind_addr = "[::]:9901"

[s3_api]
api_bind_addr = "[::]:9902"
root_domain = ".s3.xargs.dev"
s3_region = "garage"

[s3_web]
bind_addr = "[::]:9903"
root_domain = ".garage.xargs.dev"
Garage logs
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.048442Z  INFO garage::server: Loading configuration...
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.048603Z  INFO garage::server: Initializing Garage main data store...
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.048635Z  INFO garage_model::garage: Opening database...
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.048639Z  INFO garage_model::garage: Opening LMDB database at: /var/lib/garage/config/db.lmdb
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.048782Z  INFO garage_model::garage: Initialize background variable system...
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.048787Z  INFO garage_model::garage: Initialize membership management system...
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.048812Z  INFO garage_rpc::system: Node ID of this node: f0d1ea8d37bd5fc5
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.070447Z  INFO garage_model::garage: Initialize block manager...
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.070845Z  INFO garage_model::garage: Initialize bucket_table...
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.070913Z  INFO garage_model::garage: Initialize bucket_alias_table...
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.070939Z  INFO garage_model::garage: Initialize key_table_table...
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.070970Z  INFO garage_model::garage: Initialize block_ref_table...
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.070996Z  INFO garage_model::garage: Initialize version_table...
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071037Z  INFO garage_model::garage: Initialize multipart upload counter table...
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071079Z  INFO garage_model::garage: Initialize multipart upload table...
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071108Z  INFO garage_model::garage: Initialize object counter table...
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071136Z  INFO garage_model::garage: Initialize object_table...
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071164Z  INFO garage_model::garage: Load lifecycle worker state...
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071192Z  INFO garage_model::garage: Initialize K2V counter table...
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071222Z  INFO garage_model::garage: Initialize K2V subscription manager...
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071226Z  INFO garage_model::garage: Initialize K2V item table...
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071257Z  INFO garage_model::garage: Initialize K2V RPC handler...
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071266Z  INFO garage::server: Initializing background runner...
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071290Z  INFO garage::server: Spawning Garage workers...
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071335Z  INFO garage::server: Initialize Admin API server and metrics collector...
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071342Z  INFO garage::server: Launching internal Garage cluster communications...
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071363Z  INFO garage::server: Create admin RPC handler...
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071377Z  INFO garage::server: Initializing S3 API server...
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071385Z  INFO garage::server: Initializing web server...
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071391Z  INFO garage::server: Launching Admin API server...
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071412Z  INFO garage_api::generic_server: S3 API server listening on http://[::]:9902
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071445Z  INFO garage_web::web_server: Web server listening on http://[::]:9903
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071524Z  INFO garage_api::generic_server: Admin API server listening on http://[::]:9901
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071960Z  INFO garage_table::gc: (version) GC: 7 items successfully pushed, will try to delete.
Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.087584Z  INFO garage_table::gc: (block_ref) GC: 7 items successfully pushed, will try to delete.
Dec 02 17:52:23 awan garage[23301]: 2023-12-03T01:52:23.351008Z  INFO garage_api::generic_server: [::ffff:172.16.0.91]:54658 GET /metrics
Dec 02 17:52:23 awan garage[23301]: 2023-12-03T01:52:23.435630Z ERROR garage_util::background::worker: Error in worker Block scrub worker (TID 9): IO error: Input/output error (os error 5)
Dec 02 17:52:28 awan garage[23301]: 2023-12-03T01:52:28.350632Z  INFO garage_api::generic_server: [::ffff:172.16.0.91]:54658 GET /metrics
Dec 02 17:52:33 awan garage[23301]: 2023-12-03T01:52:33.350773Z  INFO garage_api::generic_server: [::ffff:172.16.0.91]:54658 GET /metrics

# cut multiple lines of GET /metrics

Dec 02 17:55:43 awan garage[23301]: 2023-12-03T01:55:43.350207Z  INFO garage_api::generic_server: [::ffff:172.16.0.91]:54658 GET /metrics
Dec 02 17:55:48 awan garage[23301]: 2023-12-03T01:55:48.350258Z  INFO garage_api::generic_server: [::ffff:172.16.0.91]:54658 GET /metrics
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688586Z  INFO garage::server: Received SIGTERM, shutting down.
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688644Z  INFO garage_util::background::worker: Worker k2v_item sync (TID 48) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688661Z  INFO garage_util::background::worker: Worker object lifecycle worker (TID 46) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688668Z  INFO garage_util::background::worker: Worker object GC (TID 24) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688668Z  INFO garage::server: S3 API server exited without error.
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688674Z  INFO garage_util::background::worker: Worker key Merkle (TID 18) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688691Z  INFO garage_util::background::worker: Worker multipart_upload GC (TID 32) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688685Z  INFO garage::server: Web server exited without error.
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688696Z  INFO garage_util::background::worker: Worker object Merkle (TID 22) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688702Z  INFO garage_util::background::worker: Worker bucket_mpu_counter queue (TID 37) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688706Z  INFO garage_util::background::worker: Worker multipart_upload queue (TID 33) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688710Z  INFO garage_util::background::worker: Worker bucket_v2 queue (TID 13) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688716Z  INFO garage_util::background::worker: Worker multipart_upload Merkle (TID 30) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688721Z  INFO garage_util::background::worker: Worker bucket_object_counter Merkle (TID 26) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688725Z  INFO garage_util::background::worker: Worker k2v_index_counter_v2 GC (TID 53) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688730Z  INFO garage_util::background::worker: Worker bucket_v2 GC (TID 12) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688734Z  INFO garage_util::background::worker: Worker block_ref GC (TID 44) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688737Z  INFO garage::server: Admin server exited without error.
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688740Z  INFO garage_util::background::worker: Worker key sync (TID 19) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688756Z  INFO garage_util::background::worker: Worker bucket_alias queue (TID 17) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688751Z  INFO garage::server: Deregistering RPC handlers for shutdown...
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688762Z  INFO garage_util::background::worker: Worker Block resync worker #7 (TID 7) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688767Z  INFO garage_util::background::worker: Worker k2v_index_counter_v2 Merkle (TID 51) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688773Z  INFO garage_util::background::worker: Worker k2v_index_counter_v2 sync (TID 52) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688777Z  INFO garage_util::background::worker: Worker block_ref queue (TID 45) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688781Z  INFO garage_util::background::worker: Worker Block resync worker #4 (TID 4) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688785Z  INFO garage_util::background::worker: Worker bucket_mpu_counter Merkle (TID 34) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688790Z  INFO garage_util::background::worker: Worker k2v_item Merkle (TID 47) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688794Z  INFO garage_util::background::worker: Worker bucket_v2 Merkle (TID 10) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688799Z  INFO garage_util::background::worker: Worker bucket_object_counter GC (TID 28) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688802Z  INFO garage_util::background::worker: Worker key GC (TID 20) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688808Z  INFO garage_util::background::worker: Worker bucket_object_counter queue (TID 29) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688813Z  INFO garage_util::background::worker: Worker block_ref sync (TID 43) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688818Z  INFO garage_util::background::worker: Worker bucket_mpu_counter sync (TID 35) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688823Z  INFO garage_util::background::worker: Worker object sync (TID 23) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688828Z  INFO garage_util::background::worker: Worker block_ref Merkle (TID 42) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688833Z  INFO garage_util::background::worker: Worker bucket_alias Merkle (TID 14) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688837Z  INFO garage_util::background::worker: Worker bucket_mpu_counter GC (TID 36) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688841Z  INFO garage_util::background::worker: Worker bucket_alias GC (TID 16) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688845Z  INFO garage_util::background::worker: Worker bucket_alias sync (TID 15) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688865Z  INFO garage_util::background::worker: Worker k2v_index_counter_v2 queue (TID 54) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688869Z  INFO garage_util::background::worker: Worker k2v_item queue (TID 50) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688873Z  INFO garage_util::background::worker: Worker Block resync worker #8 (TID 8) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688877Z  INFO garage_util::background::worker: Worker Block resync worker #2 (TID 2) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688882Z  INFO garage_util::background::worker: Worker version Merkle (TID 38) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688885Z  INFO garage_util::background::worker: Worker k2v_item GC (TID 49) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688890Z  INFO garage_util::background::worker: Worker multipart_upload sync (TID 31) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688895Z  INFO garage_util::background::worker: Worker bucket_object_counter sync (TID 27) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688900Z  INFO garage_util::background::worker: Worker bucket_v2 sync (TID 11) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688904Z  INFO garage_util::background::worker: Worker key queue (TID 21) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688908Z  INFO garage_util::background::worker: Worker Block resync worker #6 (TID 6) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688912Z  INFO garage_util::background::worker: Worker Block resync worker #5 (TID 5) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688916Z  INFO garage_util::background::worker: Worker version queue (TID 41) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688921Z  INFO garage_util::background::worker: Worker version sync (TID 39) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688924Z  INFO garage_util::background::worker: Worker object queue (TID 25) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688928Z  INFO garage_util::background::worker: Worker Block resync worker #3 (TID 3) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688933Z  INFO garage_util::background::worker: Worker version GC (TID 40) exited (last state: Idle)
Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.690529Z  INFO garage_util::background::worker: Worker Block scrub worker (TID 9) exited (last state: Throttled(0.14840554))
Dec 02 17:55:51 awan garage[23301]: 2023-12-03T01:55:51.294332Z  INFO garage::server: Netapp exited
Dec 02 17:55:58 awan garage[23301]: 2023-12-03T01:55:58.690778Z ERROR garage_util::background::worker: Some workers could not exit in time, we are cancelling some things in the middle
Dec 02 17:55:58 awan garage[23301]: 2023-12-03T01:55:58.690831Z  INFO garage::server: Cleaning up...
Dec 02 17:57:20 awan systemd[1]: garage.service: State 'stop-sigterm' timed out. Killing.
Dec 02 17:57:20 awan systemd[1]: garage.service: Killing process 23301 (garage) with signal SIGKILL.
Dec 02 17:57:20 awan systemd[1]: garage.service: Killing process 15262 (garage) with signal SIGKILL.
Dec 02 17:57:20 awan systemd[1]: garage.service: Killing process 15350 (tokio-runtime-w) with signal SIGKILL.
Dec 02 17:57:20 awan systemd[1]: garage.service: Killing process 23385 (tokio-runtime-w) with signal SIGKILL.
Dec 02 17:58:51 awan systemd[1]: garage.service: Processes still around after SIGKILL. Ignoring.
Dec 02 18:00:21 awan systemd[1]: garage.service: State 'final-sigterm' timed out. Killing.
Dec 02 18:00:21 awan systemd[1]: garage.service: Killing process 23301 (garage) with signal SIGKILL.
Dec 02 18:00:21 awan systemd[1]: garage.service: Killing process 15262 (garage) with signal SIGKILL.
Dec 02 18:00:21 awan systemd[1]: garage.service: Killing process 15350 (tokio-runtime-w) with signal SIGKILL.
Dec 02 18:00:21 awan systemd[1]: garage.service: Killing process 23385 (tokio-runtime-w) with signal SIGKILL.
Dec 02 18:01:51 awan systemd[1]: garage.service: Processes still around after final SIGKILL. Entering failed mode.
Dec 02 18:01:51 awan systemd[1]: garage.service: Failed with result 'timeout'.
Dec 02 18:01:51 awan systemd[1]: garage.service: Unit process 15262 (garage) remains running after unit stopped.
Dec 02 18:01:51 awan systemd[1]: garage.service: Unit process 23301 (garage) remains running after unit stopped.
Dec 02 18:01:51 awan systemd[1]: Stopped Garage Object Storage (S3 compatible).
Dec 02 18:01:51 awan systemd[1]: garage.service: Consumed 9min 30.893s CPU time, read 390.4M from disk, written 88.0K to disk, received 43.4K IP traffic, sent 759.8K IP traffic.
I'm running garage on ZFS in NixOS via systemd and noticing several things: - logs on shutdown indicate that process tree did not quit completely - one of my CPU threads is stuck at 100%, consumed by garage I'm not sure when it started, but this initially happened on v0.8.2, despite it running for months without any problems. I thought it was a problem that would potentially be fixed with upgrade, so I upgraded to v0.9.0 and the problem persists. Attached at the end of this post is the complete log, but I thought the interesting bit is here: ``` ERROR garage_util::background::worker: Some workers could not exit in time, we are cancelling some things in the middle ``` I'm not sure how to troubleshoot this since I can't tell which worker didn't manage to exit. How do I proceed on chasing this bug? Any pointers would be appreciated! <details> <summary>System information</summary> ```shellsession ❯ uname -a Linux awan 6.6.3 #1-NixOS SMP PREEMPT_DYNAMIC Tue Nov 28 17:20:18 UTC 2023 x86_64 GNU/Linux ❯ zfs --version zfs-2.2.1-1 zfs-kmod-2.2.1-1 ❯ garage --version garage cargo:0.9.0 [features: k2v, sled, lmdb, sqlite, consul-discovery, kubernetes-discovery, metrics, telemetry-otlp, bundled-libs] ``` </details> <details> <summary>Garage configuration</summary> ```toml bootstrap_peers = [] data_dir = "/var/lib/garage/data" data_fsync = true db_engine = "lmdb" lmdb_map_size = "64GiB" metadata_dir = "/var/lib/garage/config" metadata_fsync = true replication_mode = "1" rpc_bind_addr = "[::]:9900" rpc_public_addr = "garage.xargs.dev:9900" rpc_secret = "s3cr3t" [admin] admin_secret = "s3cr3t" api_bind_addr = "[::]:9901" [s3_api] api_bind_addr = "[::]:9902" root_domain = ".s3.xargs.dev" s3_region = "garage" [s3_web] bind_addr = "[::]:9903" root_domain = ".garage.xargs.dev" ``` </details> <details> <summary>Garage logs</summary> ```log Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.048442Z INFO garage::server: Loading configuration... Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.048603Z INFO garage::server: Initializing Garage main data store... Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.048635Z INFO garage_model::garage: Opening database... Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.048639Z INFO garage_model::garage: Opening LMDB database at: /var/lib/garage/config/db.lmdb Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.048782Z INFO garage_model::garage: Initialize background variable system... Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.048787Z INFO garage_model::garage: Initialize membership management system... Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.048812Z INFO garage_rpc::system: Node ID of this node: f0d1ea8d37bd5fc5 Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.070447Z INFO garage_model::garage: Initialize block manager... Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.070845Z INFO garage_model::garage: Initialize bucket_table... Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.070913Z INFO garage_model::garage: Initialize bucket_alias_table... Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.070939Z INFO garage_model::garage: Initialize key_table_table... Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.070970Z INFO garage_model::garage: Initialize block_ref_table... Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.070996Z INFO garage_model::garage: Initialize version_table... Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071037Z INFO garage_model::garage: Initialize multipart upload counter table... Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071079Z INFO garage_model::garage: Initialize multipart upload table... Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071108Z INFO garage_model::garage: Initialize object counter table... Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071136Z INFO garage_model::garage: Initialize object_table... Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071164Z INFO garage_model::garage: Load lifecycle worker state... Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071192Z INFO garage_model::garage: Initialize K2V counter table... Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071222Z INFO garage_model::garage: Initialize K2V subscription manager... Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071226Z INFO garage_model::garage: Initialize K2V item table... Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071257Z INFO garage_model::garage: Initialize K2V RPC handler... Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071266Z INFO garage::server: Initializing background runner... Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071290Z INFO garage::server: Spawning Garage workers... Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071335Z INFO garage::server: Initialize Admin API server and metrics collector... Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071342Z INFO garage::server: Launching internal Garage cluster communications... Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071363Z INFO garage::server: Create admin RPC handler... Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071377Z INFO garage::server: Initializing S3 API server... Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071385Z INFO garage::server: Initializing web server... Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071391Z INFO garage::server: Launching Admin API server... Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071412Z INFO garage_api::generic_server: S3 API server listening on http://[::]:9902 Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071445Z INFO garage_web::web_server: Web server listening on http://[::]:9903 Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071524Z INFO garage_api::generic_server: Admin API server listening on http://[::]:9901 Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.071960Z INFO garage_table::gc: (version) GC: 7 items successfully pushed, will try to delete. Dec 02 17:52:22 awan garage[23301]: 2023-12-03T01:52:22.087584Z INFO garage_table::gc: (block_ref) GC: 7 items successfully pushed, will try to delete. Dec 02 17:52:23 awan garage[23301]: 2023-12-03T01:52:23.351008Z INFO garage_api::generic_server: [::ffff:172.16.0.91]:54658 GET /metrics Dec 02 17:52:23 awan garage[23301]: 2023-12-03T01:52:23.435630Z ERROR garage_util::background::worker: Error in worker Block scrub worker (TID 9): IO error: Input/output error (os error 5) Dec 02 17:52:28 awan garage[23301]: 2023-12-03T01:52:28.350632Z INFO garage_api::generic_server: [::ffff:172.16.0.91]:54658 GET /metrics Dec 02 17:52:33 awan garage[23301]: 2023-12-03T01:52:33.350773Z INFO garage_api::generic_server: [::ffff:172.16.0.91]:54658 GET /metrics # cut multiple lines of GET /metrics Dec 02 17:55:43 awan garage[23301]: 2023-12-03T01:55:43.350207Z INFO garage_api::generic_server: [::ffff:172.16.0.91]:54658 GET /metrics Dec 02 17:55:48 awan garage[23301]: 2023-12-03T01:55:48.350258Z INFO garage_api::generic_server: [::ffff:172.16.0.91]:54658 GET /metrics Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688586Z INFO garage::server: Received SIGTERM, shutting down. Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688644Z INFO garage_util::background::worker: Worker k2v_item sync (TID 48) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688661Z INFO garage_util::background::worker: Worker object lifecycle worker (TID 46) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688668Z INFO garage_util::background::worker: Worker object GC (TID 24) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688668Z INFO garage::server: S3 API server exited without error. Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688674Z INFO garage_util::background::worker: Worker key Merkle (TID 18) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688691Z INFO garage_util::background::worker: Worker multipart_upload GC (TID 32) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688685Z INFO garage::server: Web server exited without error. Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688696Z INFO garage_util::background::worker: Worker object Merkle (TID 22) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688702Z INFO garage_util::background::worker: Worker bucket_mpu_counter queue (TID 37) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688706Z INFO garage_util::background::worker: Worker multipart_upload queue (TID 33) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688710Z INFO garage_util::background::worker: Worker bucket_v2 queue (TID 13) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688716Z INFO garage_util::background::worker: Worker multipart_upload Merkle (TID 30) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688721Z INFO garage_util::background::worker: Worker bucket_object_counter Merkle (TID 26) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688725Z INFO garage_util::background::worker: Worker k2v_index_counter_v2 GC (TID 53) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688730Z INFO garage_util::background::worker: Worker bucket_v2 GC (TID 12) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688734Z INFO garage_util::background::worker: Worker block_ref GC (TID 44) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688737Z INFO garage::server: Admin server exited without error. Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688740Z INFO garage_util::background::worker: Worker key sync (TID 19) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688756Z INFO garage_util::background::worker: Worker bucket_alias queue (TID 17) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688751Z INFO garage::server: Deregistering RPC handlers for shutdown... Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688762Z INFO garage_util::background::worker: Worker Block resync worker #7 (TID 7) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688767Z INFO garage_util::background::worker: Worker k2v_index_counter_v2 Merkle (TID 51) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688773Z INFO garage_util::background::worker: Worker k2v_index_counter_v2 sync (TID 52) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688777Z INFO garage_util::background::worker: Worker block_ref queue (TID 45) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688781Z INFO garage_util::background::worker: Worker Block resync worker #4 (TID 4) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688785Z INFO garage_util::background::worker: Worker bucket_mpu_counter Merkle (TID 34) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688790Z INFO garage_util::background::worker: Worker k2v_item Merkle (TID 47) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688794Z INFO garage_util::background::worker: Worker bucket_v2 Merkle (TID 10) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688799Z INFO garage_util::background::worker: Worker bucket_object_counter GC (TID 28) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688802Z INFO garage_util::background::worker: Worker key GC (TID 20) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688808Z INFO garage_util::background::worker: Worker bucket_object_counter queue (TID 29) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688813Z INFO garage_util::background::worker: Worker block_ref sync (TID 43) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688818Z INFO garage_util::background::worker: Worker bucket_mpu_counter sync (TID 35) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688823Z INFO garage_util::background::worker: Worker object sync (TID 23) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688828Z INFO garage_util::background::worker: Worker block_ref Merkle (TID 42) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688833Z INFO garage_util::background::worker: Worker bucket_alias Merkle (TID 14) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688837Z INFO garage_util::background::worker: Worker bucket_mpu_counter GC (TID 36) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688841Z INFO garage_util::background::worker: Worker bucket_alias GC (TID 16) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688845Z INFO garage_util::background::worker: Worker bucket_alias sync (TID 15) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688865Z INFO garage_util::background::worker: Worker k2v_index_counter_v2 queue (TID 54) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688869Z INFO garage_util::background::worker: Worker k2v_item queue (TID 50) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688873Z INFO garage_util::background::worker: Worker Block resync worker #8 (TID 8) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688877Z INFO garage_util::background::worker: Worker Block resync worker #2 (TID 2) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688882Z INFO garage_util::background::worker: Worker version Merkle (TID 38) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688885Z INFO garage_util::background::worker: Worker k2v_item GC (TID 49) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688890Z INFO garage_util::background::worker: Worker multipart_upload sync (TID 31) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688895Z INFO garage_util::background::worker: Worker bucket_object_counter sync (TID 27) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688900Z INFO garage_util::background::worker: Worker bucket_v2 sync (TID 11) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688904Z INFO garage_util::background::worker: Worker key queue (TID 21) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688908Z INFO garage_util::background::worker: Worker Block resync worker #6 (TID 6) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688912Z INFO garage_util::background::worker: Worker Block resync worker #5 (TID 5) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688916Z INFO garage_util::background::worker: Worker version queue (TID 41) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688921Z INFO garage_util::background::worker: Worker version sync (TID 39) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688924Z INFO garage_util::background::worker: Worker object queue (TID 25) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688928Z INFO garage_util::background::worker: Worker Block resync worker #3 (TID 3) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.688933Z INFO garage_util::background::worker: Worker version GC (TID 40) exited (last state: Idle) Dec 02 17:55:50 awan garage[23301]: 2023-12-03T01:55:50.690529Z INFO garage_util::background::worker: Worker Block scrub worker (TID 9) exited (last state: Throttled(0.14840554)) Dec 02 17:55:51 awan garage[23301]: 2023-12-03T01:55:51.294332Z INFO garage::server: Netapp exited Dec 02 17:55:58 awan garage[23301]: 2023-12-03T01:55:58.690778Z ERROR garage_util::background::worker: Some workers could not exit in time, we are cancelling some things in the middle Dec 02 17:55:58 awan garage[23301]: 2023-12-03T01:55:58.690831Z INFO garage::server: Cleaning up... Dec 02 17:57:20 awan systemd[1]: garage.service: State 'stop-sigterm' timed out. Killing. Dec 02 17:57:20 awan systemd[1]: garage.service: Killing process 23301 (garage) with signal SIGKILL. Dec 02 17:57:20 awan systemd[1]: garage.service: Killing process 15262 (garage) with signal SIGKILL. Dec 02 17:57:20 awan systemd[1]: garage.service: Killing process 15350 (tokio-runtime-w) with signal SIGKILL. Dec 02 17:57:20 awan systemd[1]: garage.service: Killing process 23385 (tokio-runtime-w) with signal SIGKILL. Dec 02 17:58:51 awan systemd[1]: garage.service: Processes still around after SIGKILL. Ignoring. Dec 02 18:00:21 awan systemd[1]: garage.service: State 'final-sigterm' timed out. Killing. Dec 02 18:00:21 awan systemd[1]: garage.service: Killing process 23301 (garage) with signal SIGKILL. Dec 02 18:00:21 awan systemd[1]: garage.service: Killing process 15262 (garage) with signal SIGKILL. Dec 02 18:00:21 awan systemd[1]: garage.service: Killing process 15350 (tokio-runtime-w) with signal SIGKILL. Dec 02 18:00:21 awan systemd[1]: garage.service: Killing process 23385 (tokio-runtime-w) with signal SIGKILL. Dec 02 18:01:51 awan systemd[1]: garage.service: Processes still around after final SIGKILL. Entering failed mode. Dec 02 18:01:51 awan systemd[1]: garage.service: Failed with result 'timeout'. Dec 02 18:01:51 awan systemd[1]: garage.service: Unit process 15262 (garage) remains running after unit stopped. Dec 02 18:01:51 awan systemd[1]: garage.service: Unit process 23301 (garage) remains running after unit stopped. Dec 02 18:01:51 awan systemd[1]: Stopped Garage Object Storage (S3 compatible). Dec 02 18:01:51 awan systemd[1]: garage.service: Consumed 9min 30.893s CPU time, read 390.4M from disk, written 88.0K to disk, received 43.4K IP traffic, sent 759.8K IP traffic. ``` </details>
Author
Contributor

Possibly more hints! After the systemd service was terminated with failed status, I can see that the process is defunct.

❯ ps aux | grep garage
64665      34776  100  0.0      0     0 ?        Zsl  20:29  46:25 [garage] <defunct>

However, lsof is still showing the files are open.

❯ sudo lsof | grep 34776
garage    34776 34822 tokio-run            64665  cwd       DIR               8,34      4096          2 /
garage    34776 34822 tokio-run            64665  rtd       DIR               8,34      4096          2 /
garage    34776 34822 tokio-run            64665  txt       REG               8,34  56418648    9447649 /nix/store/3jiz272674hmkcacsql1hx90z0y2z0hs-garage-0.8.4/bin/garage
garage    34776 34822 tokio-run            64665  mem       REG               8,34              2228740 /var/lib/private/garage/config/db.lmdb/data.mdb (stat: No such file or directory)
garage    34776 34822 tokio-run            64665  mem-w     REG               8,34              2228738 /var/lib/private/garage/config/db.lmdb/lock.mdb (stat: No such file or directory)
garage    34776 34822 tokio-run            64665  mem       REG               8,34              6471306 /nix/store/qn3ggz5sf3hkjs2c797xf7nan3amdxmp-glibc-2.38-27/lib/libm.so.6 (stat: No such file or directory)
garage    34776 34822 tokio-run            64665  mem       REG               8,34              6339899 /nix/store/7vq9zymw7zk2i0k1i4nfkl6hmlx3gxji-gcc-12.3.0-libgcc/lib/libgcc_s.so.1 (stat: No such file or directory)
garage    34776 34822 tokio-run            64665  mem       REG               8,34              6471299 /nix/store/qn3ggz5sf3hkjs2c797xf7nan3amdxmp-glibc-2.38-27/lib/libc.so.6 (stat: No such file or directory)
garage    34776 34822 tokio-run            64665  mem       REG               8,34              6471293 /nix/store/qn3ggz5sf3hkjs2c797xf7nan3amdxmp-glibc-2.38-27/lib/ld-linux-x86-64.so.2 (stat: No such file or directory)
garage    34776 34822 tokio-run            64665    0r      CHR                1,3       0t0          4 /dev/null
garage    34776 34822 tokio-run            64665    1u     unix 0xffff8881b9c08000       0t0     123834 type=STREAM (CONNECTED)
garage    34776 34822 tokio-run            64665    2u     unix 0xffff8881b9c08000       0t0     123834 type=STREAM (CONNECTED)
garage    34776 34822 tokio-run            64665    3u  a_inode               0,14         0       4105 [eventpoll:4,8]
garage    34776 34822 tokio-run            64665    4u  a_inode               0,14         0       4105 [eventfd:2]
garage    34776 34822 tokio-run            64665    5u  a_inode               0,14         0       4105 [eventpoll:4,8]
garage    34776 34822 tokio-run            64665    6u     unix 0xffff8881c6f68cc0       0t0     133238 type=STREAM (CONNECTED)
garage    34776 34822 tokio-run            64665    7u     unix 0xffff8881c6f68880       0t0     133239 type=STREAM (CONNECTED)
garage    34776 34822 tokio-run            64665    8u     unix 0xffff8881c6f68cc0       0t0     133238 type=STREAM (CONNECTED)
garage    34776 34822 tokio-run            64665    9uw     REG               8,34     32128    2228738 /var/lib/private/garage/config/db.lmdb/lock.mdb
lsof: no pwd entry for UID 64665
garage    34776 34822 tokio-run            64665   10u      REG               8,34 324001792    2228740 /var/lib/private/garage/config/db.lmdb/data.mdb
garage    34776 34822 tokio-run            64665   11w      REG               8,34 324001792    2228740 /var/lib/private/garage/config/db.lmdb/data.mdb
Possibly more hints! After the systemd service was terminated with `failed` status, I can see that the process is defunct. ```shellsession ❯ ps aux | grep garage 64665 34776 100 0.0 0 0 ? Zsl 20:29 46:25 [garage] <defunct> ``` However, `lsof` is still showing the files are open. ```shellsession ❯ sudo lsof | grep 34776 garage 34776 34822 tokio-run 64665 cwd DIR 8,34 4096 2 / garage 34776 34822 tokio-run 64665 rtd DIR 8,34 4096 2 / garage 34776 34822 tokio-run 64665 txt REG 8,34 56418648 9447649 /nix/store/3jiz272674hmkcacsql1hx90z0y2z0hs-garage-0.8.4/bin/garage garage 34776 34822 tokio-run 64665 mem REG 8,34 2228740 /var/lib/private/garage/config/db.lmdb/data.mdb (stat: No such file or directory) garage 34776 34822 tokio-run 64665 mem-w REG 8,34 2228738 /var/lib/private/garage/config/db.lmdb/lock.mdb (stat: No such file or directory) garage 34776 34822 tokio-run 64665 mem REG 8,34 6471306 /nix/store/qn3ggz5sf3hkjs2c797xf7nan3amdxmp-glibc-2.38-27/lib/libm.so.6 (stat: No such file or directory) garage 34776 34822 tokio-run 64665 mem REG 8,34 6339899 /nix/store/7vq9zymw7zk2i0k1i4nfkl6hmlx3gxji-gcc-12.3.0-libgcc/lib/libgcc_s.so.1 (stat: No such file or directory) garage 34776 34822 tokio-run 64665 mem REG 8,34 6471299 /nix/store/qn3ggz5sf3hkjs2c797xf7nan3amdxmp-glibc-2.38-27/lib/libc.so.6 (stat: No such file or directory) garage 34776 34822 tokio-run 64665 mem REG 8,34 6471293 /nix/store/qn3ggz5sf3hkjs2c797xf7nan3amdxmp-glibc-2.38-27/lib/ld-linux-x86-64.so.2 (stat: No such file or directory) garage 34776 34822 tokio-run 64665 0r CHR 1,3 0t0 4 /dev/null garage 34776 34822 tokio-run 64665 1u unix 0xffff8881b9c08000 0t0 123834 type=STREAM (CONNECTED) garage 34776 34822 tokio-run 64665 2u unix 0xffff8881b9c08000 0t0 123834 type=STREAM (CONNECTED) garage 34776 34822 tokio-run 64665 3u a_inode 0,14 0 4105 [eventpoll:4,8] garage 34776 34822 tokio-run 64665 4u a_inode 0,14 0 4105 [eventfd:2] garage 34776 34822 tokio-run 64665 5u a_inode 0,14 0 4105 [eventpoll:4,8] garage 34776 34822 tokio-run 64665 6u unix 0xffff8881c6f68cc0 0t0 133238 type=STREAM (CONNECTED) garage 34776 34822 tokio-run 64665 7u unix 0xffff8881c6f68880 0t0 133239 type=STREAM (CONNECTED) garage 34776 34822 tokio-run 64665 8u unix 0xffff8881c6f68cc0 0t0 133238 type=STREAM (CONNECTED) garage 34776 34822 tokio-run 64665 9uw REG 8,34 32128 2228738 /var/lib/private/garage/config/db.lmdb/lock.mdb lsof: no pwd entry for UID 64665 garage 34776 34822 tokio-run 64665 10u REG 8,34 324001792 2228740 /var/lib/private/garage/config/db.lmdb/data.mdb garage 34776 34822 tokio-run 64665 11w REG 8,34 324001792 2228740 /var/lib/private/garage/config/db.lmdb/data.mdb ```
Owner

logs on shutdown indicate that process tree did not quit completely

To know which tasks did not complete successfully, we would need to look at the output of garage worker list and find the ones that were not mentionned in the logs when exiting. It's a bit tedious but can be done

one of my CPU threads is stuck at 100%, consumed by garage

Does the 100% CPU start when Garage is started or only when you initiate the shutdown?

I think it woud be nice to be able to debug the process using gdb during the time where it is at 100% cpu, so that we can obtain a backtrace of the thread that keeps doing things. It could be that some logic in Garage is broken and it is just running in a loop, or it could be some issue related to some dependency like LMDB. I can't really tell for now.

> logs on shutdown indicate that process tree did not quit completely To know which tasks did not complete successfully, we would need to look at the output of `garage worker list` and find the ones that were not mentionned in the logs when exiting. It's a bit tedious but can be done > one of my CPU threads is stuck at 100%, consumed by garage Does the 100% CPU start when Garage is started or only when you initiate the shutdown? I think it woud be nice to be able to debug the process using gdb during the time where it is at 100% cpu, so that we can obtain a backtrace of the thread that keeps doing things. It could be that some logic in Garage is broken and it is just running in a loop, or it could be some issue related to some dependency like LMDB. I can't really tell for now.
Author
Contributor

To know which tasks did not complete successfully, we would need to look at the output of garage worker list and find the ones that were not mentionned in the logs when exiting. It's a bit tedious but can be done

Will try it next time I get the chance, thanks!

Does the 100% CPU start when Garage is started or only when you initiate the shutdown?

On start. Essentially it was impossible for me to shut down the computer after starting garage without manually using the physical power button. Tried running the server with and without systemd, both ends up with the same 100% CPU.

I think it woud be nice to be able to debug the process using gdb during the time where it is at 100% cpu, so that we can obtain a backtrace of the thread that keeps doing things. It could be that some logic in Garage is broken and it is just running in a loop, or it could be some issue related to some dependency like LMDB. I can't really tell for now.

Makes sense. I had uninstalled garage for the time being but will try to come back with more details next time. Thanks!

> To know which tasks did not complete successfully, we would need to look at the output of garage worker list and find the ones that were not mentionned in the logs when exiting. It's a bit tedious but can be done Will try it next time I get the chance, thanks! > Does the 100% CPU start when Garage is started or only when you initiate the shutdown? On start. Essentially it was impossible for me to shut down the computer after starting garage without manually using the physical power button. Tried running the server with and without systemd, both ends up with the same 100% CPU. > I think it woud be nice to be able to debug the process using gdb during the time where it is at 100% cpu, so that we can obtain a backtrace of the thread that keeps doing things. It could be that some logic in Garage is broken and it is just running in a loop, or it could be some issue related to some dependency like LMDB. I can't really tell for now. Makes sense. I had uninstalled garage for the time being but will try to come back with more details next time. Thanks!
lx closed this issue 2024-02-16 10:11:31 +00:00
Sign in to join this conversation.
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: Deuxfleurs/garage#676
No description provided.