Not all workers quit on time — how can I troubleshoot? #676
Reference: Deuxfleurs/garage#676
I'm running garage on ZFS in NixOS via systemd and noticing several things:
I'm not sure when it started, but I first noticed this on v0.8.2, after it had been running for months without any problems. I thought an upgrade might fix it, so I upgraded to v0.9.0, but the problem persists.
Attached at the end of this post is the complete log, but I thought the interesting bit is here:
I'm not sure how to troubleshoot this since I can't tell which worker didn't manage to exit. How do I proceed on chasing this bug? Any pointers would be appreciated!
System information
Garage configuration
Garage logs
Possibly more hints! After the systemd service was terminated with `failed` status, I can see that the process is defunct. However, `lsof` is still showing that the files are open.
To know which tasks did not complete successfully, we would need to look at the output of `garage worker list` and find the ones that were not mentioned in the logs when exiting. It's a bit tedious but can be done.
Does the 100% CPU start when Garage is started or only when you initiate the shutdown?
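The comparison of worker names against the exit logs could be partly automated. Below is a minimal sketch in Python. It assumes you have copied the worker names from `garage worker list` into a list by hand and saved the shutdown log as text. The helper name `workers_missing_from_log` and the sample names and log lines are hypothetical and do not reflect Garage's actual output format.

```python
def workers_missing_from_log(worker_names, log_text):
    """Return the worker names that never appear in the shutdown log.

    worker_names: names copied by hand from `garage worker list`
                  (assumption: one string per worker).
    log_text:     the shutdown log as a single string.
    Matching is a plain substring test, so it makes no assumption
    about the exact wording of the log lines.
    """
    return [name for name in worker_names if name not in log_text]


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    names = ["alpha worker", "beta worker", "gamma worker"]
    log = "INFO exited: alpha worker\nINFO exited: gamma worker\n"
    print(workers_missing_from_log(names, log))
```

Any name this returns is a candidate for the worker that did not exit in time. Because the match is a plain substring test, a name that is a prefix of another name can be hidden by it, so the result is a hint and should be checked against the log by hand.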
I think it would be nice to be able to debug the process using gdb while it is at 100% CPU, so that we can obtain a backtrace of the thread that keeps doing things. It could be that some logic in Garage is broken and it is just running in a loop, or it could be an issue related to a dependency like LMDB. I can't really tell for now.
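One possible way to capture such a backtrace non-interactively is sketched below. It assumes gdb is installed and that you have permission to attach to the process (typically root). The function names are hypothetical. The gdb options used are `-p` to attach to a PID, `-batch` to exit once the commands have run, and `-ex` to run a command, here `thread apply all bt`.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb command line that attaches to `pid`, prints a
    backtrace of every thread, then detaches and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_backtraces(pid):
    """Run gdb against a live process and return its output as text.

    Assumption: gdb is on PATH and attaching to the process is allowed.
    """
    result = subprocess.run(
        gdb_backtrace_command(pid),
        capture_output=True,
        text=True,
        check=False,  # gdb may exit non-zero; still return what it printed
    )
    return result.stdout + result.stderr
```

Calling `dump_backtraces(pid)` with the PID of the Garage process while it is spinning, a few times in a row, should show whether the same thread is stuck in the same place each time.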
Will try it next time I get the chance, thanks!
On start. Essentially, it was impossible for me to shut down the computer after starting garage without manually using the physical power button. I tried running the server with and without systemd, and both end up at the same 100% CPU.
Makes sense. I had uninstalled garage for the time being but will try to come back with more details next time. Thanks!