merkle tree panic #797
Reference: Deuxfleurs/garage#797
Garage started giving this error with seemingly no warning or reason, and there's no way to start the server/access any data.
@tezlm can you specify the hardware you use? I see in the stack trace that you are using the SQLite backend for metadata. Is the storage an SD card? Was there any interruption in power? What's the underlying filesystem for metadata?
I'm using an old laptop; the data (both metadata/sqlite and blobs) is on an NVMe SSD with ext4. As far as I know, the SSD is healthy. I do remember rebooting before the database failure (it failed on startup), but that seems like it would've been a clean shutdown.
Are there any other nodes in that cluster, or are you running a single node?
This is with a single node
Unfortunately, there is not much to be done at this point. This does look like some corruption in your database file that Garage is not able to recover from. If it's only in the Merkle tree, we could probably just rebuild that tree from scratch, as the tree is basically just an index structure, but there is currently no code in Garage to do this.
Is there any way to manually recover the data?
@tezlm can you be more specific about the kind of data that was stored on this node? Do you have any backups of the metadata? @lx could the sled migration code be used as a base to rebuild the index tree?
No, unfortunately, it does not rebuild the Merkle tree, it copies it as-is.
You could try clearing the Merkle tree tables in the sqlite db file. This will not put your garage instance in a proper state for continued use, but it should allow you at least to start Garage to copy your data out of it.
First, make a backup of your db.sqlite file.
Then, launch the sqlite command line with
sqlite3 db.sqlite
Then, execute the following commands in your sqlite session:
Then, try starting Garage.
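For the backup step above, here is a minimal sketch of one way to copy the file while Garage is stopped. It assumes a POSIX shell; `backup_sqlite_db` is a hypothetical helper name, not something shipped with Garage. It also copies the `-wal` and `-shm` sidecar files that SQLite creates next to the database in WAL mode, if they happen to exist.

```shell
#!/bin/sh
# Sketch only: copy a SQLite database file and any sidecar files
# to a backup directory. backup_sqlite_db is a hypothetical helper,
# not part of Garage or sqlite3.
backup_sqlite_db() {
    db="$1"      # path to the database file, e.g. db.sqlite
    dest="$2"    # directory that will receive the copies

    # Refuse to continue if the database file is not there.
    [ -f "$db" ] || { echo "no such file: $db" >&2; return 1; }
    mkdir -p "$dest" || return 1

    for f in "$db" "$db-wal" "$db-shm"; do
        # The -wal and -shm files only exist in WAL mode; skip them if absent.
        if [ -f "$f" ]; then
            cp -p "$f" "$dest/" || return 1
        fi
    done
}
```

You would call it as `backup_sqlite_db db.sqlite ./db-backup` from the directory that holds `db.sqlite` (the backup directory name here is just an example). Run it only while Garage is not running, so the files are not modified mid-copy.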
@maximilien Media and tarballs, most of which are 1-100 MiB. I don't have any backup of the metadata, but I definitely will set one up now!
@lx That works and gave these logs, not sure if it's useful or not:
@tezlm These errors might indicate more corruption in your metadata file, but the most important thing is that you're able to retrieve all your files from Garage. At this point, you should copy everything out of Garage to your local disk and rebuild your Garage cluster from scratch.
I'm closing this for now, as I think it's an error on my end rather than with garage.