merkle tree panic #797
3 participants
Reference: Deuxfleurs/garage#797
Garage started giving this error with seemingly no warning or reason, and there's no way to start the server/access any data.
@tezlm can you specify the hardware you use? I see in the stack trace that you are using the SQLite backend for metadata. Is the storage an SD card? Was there any interruption in power? What's the underlying filesystem for metadata?
I'm using an old laptop. The data (both the SQLite metadata and the blobs) is on an NVMe SSD with ext4. As far as I know, the SSD is healthy. I do remember rebooting before the database failure (it failed on startup), but that seems like it would've been a clean shutdown.
Are there any other nodes in that cluster, or are you running a single node?
This is with a single node
Unfortunately, there is not much to be done at this point. This does look like some corruption in your database file that Garage is not able to recover from. If it's only in the Merkle tree, we could probably just rebuild that tree from scratch, as the tree is basically just an index structure, but there is currently no code in Garage to do this.
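To illustrate why a Merkle tree counts as "just an index structure", here is a minimal sketch of a generic binary Merkle tree whose root is recomputed purely from the stored items. This is not Garage's actual Merkle tree layout or code; the function name `merkle_root`, the SHA-256 hashing, and the leaf encoding are all assumptions made for illustration.

```python
import hashlib


def merkle_root(items):
    """Recompute a Merkle root from a dict of {key bytes: value bytes}.

    Hypothetical illustration only: a plain binary Merkle tree over the
    items in sorted key order, not the structure Garage actually uses.
    """
    # Leaf hashes depend only on the data, so a lost or corrupted tree
    # can in principle be regenerated from the items themselves.
    level = [
        hashlib.sha256(key + b"\x00" + value).digest()
        for key, value in sorted(items.items())
    ]
    if not level:
        return hashlib.sha256(b"").digest()
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Duplicate the last node so every node has a sibling.
            level.append(level[-1])
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

Because the root is a pure function of the items, two stores holding the same data produce the same root regardless of insertion order, which is what makes a rebuild from scratch conceivable.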
Is there any way to manually recover the data?
@tezlm can you be more specific about the kind of data that was stored on this node? Do you have any backups of the metadata? @lx could the sled migration code be used as a base to rebuild the index tree?
No, unfortunately, it does not rebuild the Merkle tree, it copies it as-is.
You could try clearing the Merkle tree tables in the sqlite db file. This will not put your garage instance in a proper state for continued use, but it should allow you at least to start Garage to copy your data out of it.
First, make a backup of your db.sqlite file.
Then, launch the sqlite command line with
sqlite3 db.sqlite
Then, execute the following commands in your sqlite session:
Then, try starting Garage.
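The steps above (back up the file, then empty selected tables) can be sketched with Python's standard `sqlite3` module. This is a rough sketch, not the commands from this thread: the helper names `backup_file`, `list_tables`, and `clear_tables` are hypothetical, and the script deliberately does not guess which tables hold the Merkle tree. List the tables first and pass the names explicitly. It assumes Garage is stopped while it runs.

```python
import shutil
import sqlite3


def backup_file(src_path, dst_path):
    """Make a plain file copy of the database before touching it."""
    shutil.copy2(src_path, dst_path)
    return dst_path


def list_tables(db_path):
    """Return the names of all tables in the SQLite file, sorted."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()


def clear_tables(db_path, table_names):
    """Delete every row from the named tables (the tables themselves stay).

    The caller chooses the table names; nothing here knows Garage's schema.
    """
    existing = set(list_tables(db_path))
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            for name in table_names:
                if name not in existing:
                    raise ValueError("no such table: " + name)
                # Quote the identifier; table names cannot be bound as parameters.
                quoted = '"' + name.replace('"', '""') + '"'
                conn.execute("DELETE FROM " + quoted)
    finally:
        conn.close()
```

A typical run would call `backup_file("db.sqlite", "db.sqlite.bak")`, inspect the output of `list_tables("db.sqlite")`, and only then call `clear_tables` with the table names you have identified.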
@maximilien Media and tarballs, most of which are 1-100 MiB. I don't have any backup of the metadata, but I definitely will set one up now!
@lx That works and gave these logs, not sure if it's useful or not:
@tezlm These errors might indicate more corruption in your metadata file, but the most important thing is that you're able to retrieve all your files from Garage. At this point, you should copy everything out of Garage to your local disk and rebuild your Garage cluster from scratch.
I'm closing this for now, as I think it's an error on my end rather than with garage.