# seafile_recovery
**Quick description:** `seafile_recovery` is a low-level tool that parses Seafile's on-disk file storage.
Unlike other tools, it works without the associated SQLite or MySQL database.

**Some use cases:** I developed this tool because I lost my database and wanted to get my files back.
It can also help you diagnose problems with Seafile's on-disk file storage and possibly repair them.
Finally, it can help you better understand how Seafile works internally and gather some statistics about the health of your Seafile repositories.

**Features and limitations:** The tool can parse all commits of a repository. Currently, the `head` subcommand selects the "last" one according to the commit graph and a time-based heuristic. Each commit contains a `RootId`. All files and folders have an Id in Seafile; the `RootId` is simply the Id of the repository's root folder at the time of the commit. You can inspect these Ids with `ls` and copy a file or a folder hierarchy to your disk with `cp`. Finally, the `s3` subcommand transfers the file or folder hierarchy directly to an S3-compatible storage. The tool does not currently work with encrypted repositories, and advanced Seafile features are not tested. The tool itself has not been extensively tested and may crash on unusual edge cases.

**Disclaimer:** This tool is community-made and thus not affiliated with Seafile Ltd., Seafile GmbH or any other company.
I developed it for my own needs and cannot be held responsible for any issue or damage it may cause.
Use it carefully, or not at all if you are not sure of what you are doing: data is often more precious than we imagine.
Always shut down your Seafile daemons (both Seafile and Seahub) before using it.
Create a backup before running any command and double-check all your operations.

## Installation
As a prerequisite, you need a recent version of [Go](https://golang.org/).
```
go get git.deuxfleurs.fr/quentin/seafile_recovery
export PATH="$HOME/go/bin:$PATH"
seafile_recovery --help
```
## Tutorial
Suppose you start out knowing nothing about your storage folder and its repositories.
Begin by picking a repository ID from the `storage/commits` folder and running the `head` subcommand on it.
In this example, we will use `0011d396-4890-463a-8266-bcbd978d8d1c`.
```
$ seafile_recovery head 0011d396-4890-463a-8266-bcbd978d8d1c
2021/04/28 15:10:34 Repo contains 6 commits
2021/04/28 15:10:34 Repo has 1 sources
2021/04/28 15:10:34 Repo has 1 sinks
2021/04/28 15:10:34 Proposing following HEAD:
RootId: 5911dd2d363f591e43df4e80591d0a54975f2aaf
CreatorName: quentin@example.com
Creator: 0000000000000000000000000000000000000000
Description: Added "telecom-reclaimed-web-single-page.pdf".
Ctime: 2021-04-26 12:22:59 +0200 CEST
RepoName: Ma bibliothèque
RepoDesc: Ma bibliothèque
```
We have now learnt some information about the repository, in particular its name ("Ma bibliothèque"), who made the last change ("quentin@example.com") and the RootId ("5911dd2d363f591e43df4e80591d0a54975f2aaf").
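
As an aside, the `sources` and `sinks` counts in the log above refer to the commit graph. The sketch below shows one way such a HEAD proposal could be computed, assuming a source is a commit without parent, a sink is a commit that no other commit names as a parent, and the time heuristic simply prefers the most recent sink. The `Commit` type and `proposeHead` function are hypothetical names for this illustration, not the tool's actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// Commit holds only the fields needed for this sketch; the type is
// hypothetical and does not mirror the tool's internal structures.
type Commit struct {
	ID      string
	Parents []string // zero, one or two parent commit IDs
	Ctime   int64    // creation time, in seconds since the Unix epoch
}

// proposeHead returns the sources (commits without a parent), the sinks
// (commits that no other commit names as a parent) and the newest sink.
func proposeHead(commits []Commit) (sources, sinks []Commit, head Commit) {
	hasChild := make(map[string]bool)
	for _, c := range commits {
		for _, p := range c.Parents {
			hasChild[p] = true
		}
	}
	for _, c := range commits {
		if len(c.Parents) == 0 {
			sources = append(sources, c)
		}
		if !hasChild[c.ID] {
			sinks = append(sinks, c)
		}
	}
	// Assumed time heuristic: when several sinks exist, prefer the newest.
	sort.SliceStable(sinks, func(i, j int) bool { return sinks[i].Ctime > sinks[j].Ctime })
	if len(sinks) > 0 {
		head = sinks[0]
	}
	return sources, sinks, head
}

func main() {
	// A small made-up history: one initial commit, a fork and a merge.
	commits := []Commit{
		{ID: "initial", Ctime: 100},
		{ID: "second", Parents: []string{"initial"}, Ctime: 200},
		{ID: "left", Parents: []string{"second"}, Ctime: 300},
		{ID: "right", Parents: []string{"second"}, Ctime: 310},
		{ID: "merge", Parents: []string{"left", "right"}, Ctime: 400},
	}
	sources, sinks, head := proposeHead(commits)
	fmt.Printf("sources=%d sinks=%d head=%s\n", len(sources), len(sinks), head.ID)
}
```

A healthy repository should have a single sink; several sinks suggest a diverged or partially written history, which is where the time-based choice matters.
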
We can now explore its latest file hierarchy using the RootId (a prefix of the Id is enough, which keeps the command readable):
```
$ seafile_recovery ls 0011d396-4890-463a-8266-bcbd978d8d1c --dir=5911dd2
2021/04/28 15:15:40 5911dd /
2021/04/28 15:15:40 b88ab9 /seafile-tutorial.doc
2021/04/28 15:15:40 d24616 /Capture décran de 2021-04-11 23-07-31.png
2021/04/28 15:15:40 f123de /My Folder/
2021/04/28 15:15:40 15be4d /My Folder/telecom-reclaimed-web-single-page.pdf
2021/04/28 15:15:40 380a0e /My Folder/Capture décran vidéo de 19-12-2020 10:30:15.webm
2021/04/28 15:15:40 Total size: 25.6M
```
Now, suppose you want to extract the folder "My Folder" and its contents into a folder named `out`:
```
$ seafile_recovery cp 0011d396-4890-463a-8266-bcbd978d8d1c --dir=f123de ./out
2021/04/28 15:17:28 f123de /
2021/04/28 15:17:28 15be4d /telecom-reclaimed-web-single-page.pdf
2021/04/28 15:17:28 380a0e /Capture décran vidéo de 19-12-2020 10:30:15.webm
$ ls out/
'Capture décran vidéo de 19-12-2020 10:30:15.webm' telecom-reclaimed-web-single-page.pdf
```
Finally, if you prefer to upload this content directly to an S3 bucket, you can run:
```
$ seafile_recovery s3 0011d396-4890-463a-8266-bcbd978d8d1c --dir=f123de s3://ACCESS_KEY:SECRET_KEY@ENDPOINT/REGION/BUCKET[/PREFIX]
2021/04/28 15:17:28 f123de /
2021/04/28 15:17:28 15be4d /telecom-reclaimed-web-single-page.pdf
2021/04/28 15:17:28 380a0e /Capture décran vidéo de 19-12-2020 10:30:15.webm
```
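
To make the destination format more concrete, here is a minimal sketch of how a string of the form `s3://ACCESS_KEY:SECRET_KEY@ENDPOINT/REGION/BUCKET[/PREFIX]` can be split into its components with Go's standard `net/url` package. The `S3Dest` type and `parseS3Dest` function are hypothetical names chosen for this example, and the tool's own parsing may differ.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// S3Dest is a hypothetical container for the parts of the destination string.
type S3Dest struct {
	AccessKey, SecretKey, Endpoint, Region, Bucket, Prefix string
}

// parseS3Dest splits s3://ACCESS_KEY:SECRET_KEY@ENDPOINT/REGION/BUCKET[/PREFIX].
func parseS3Dest(dest string) (S3Dest, error) {
	u, err := url.Parse(dest)
	if err != nil {
		return S3Dest{}, err
	}
	if u.Scheme != "s3" || u.User == nil {
		return S3Dest{}, fmt.Errorf("expected an s3:// URL with credentials, got %q", dest)
	}
	secret, _ := u.User.Password()
	// The path is /REGION/BUCKET[/PREFIX]; extra slashes stay in the prefix.
	parts := strings.SplitN(strings.TrimPrefix(u.Path, "/"), "/", 3)
	if len(parts) < 2 || parts[0] == "" || parts[1] == "" {
		return S3Dest{}, fmt.Errorf("missing region or bucket in %q", dest)
	}
	d := S3Dest{
		AccessKey: u.User.Username(),
		SecretKey: secret,
		Endpoint:  u.Host,
		Region:    parts[0],
		Bucket:    parts[1],
	}
	if len(parts) == 3 {
		d.Prefix = parts[2]
	}
	return d, nil
}

func main() {
	// Placeholder credentials and endpoint, for illustration only.
	d, err := parseS3Dest("s3://AKEXAMPLE:SKEXAMPLE@s3.example.com/us-east-1/backups/my-folder")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", d)
}
```

With this approach, a secret key that contains characters such as `/` has to be percent-encoded, otherwise the URL parser cuts the credentials at the first slash.
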
**Be careful!** This tool is not intended to migrate your Seafile backend from the local filesystem to the S3 backend. Migrating to the S3 backend requires keeping Seafile's objects as they are, which is a totally different job. Appropriate scripts are available in Seafile's official distribution.
## Usage
```
Seafile Recovery.
Usage:
seafile_recovery [--storage=<sto>] head <repoid>
seafile_recovery [--storage=<sto>] ls <repoid> (--dir=<dirid> | --file=<fileid>)
seafile_recovery [--storage=<sto>] cp <repoid> (--dir=<dirid> | --file=<fileid>) <dest>
seafile_recovery [--storage=<sto>] s3 <repoid> (--dir=<dirid> | --file=<pathid>) <dest>
seafile_recovery s3del <dest>
seafile_recovery (-h | --help)
Options:
-h --help Show this screen
--storage=<sto> Set Seafile storage path [default: ./storage]
--dir=<dirid> Seafile Directory ID, can be obtained from commits as RootID
--file=<fileid> Seafile File ID, can be obtained through ls
```
## Seafile on-disk storage
Seafile treats your filesystem as a store of objects identified by IDs.
As a result, all files in Seafile's storage follow this pattern:
```
.../storage/{commits,fs,blocks}/$repo_id/$obj_id[:2]/$obj_id[2:]
```
The following diagram shows how these objects are linked to each other and how to read them:
```
storage/commits/(repoid)          storage/fs/(repoid)     storage/blocks/(repoid)
   (plain text json)                 (json + zlib)          (chunk of raw data)
                                           Dir               (1)  ┌──────────┐
  HEAD ┌──────────┐ root_id            ┌──────────┐        ┌─────►│5b/4c09c..│
 (sink)│4f/2fcf9..├───────────────────►│98/ff6e3..│        │      └──────────┘
       └─┬─────┬──┘                    └─┬──────┬─┘        │
  parent │     │ 2nd parent              │      │DirEnt    │ (2)  ┌──────────┐
         ▼     ▼                         │      │          ├─────►│eb/a557a..│
┌──────────┐  ┌──────────┐           Dir ▼      ▼ File     │      └──────────┘
│21/22f45..├──┤a5/c7325..├───?  ┌──────────┐ ┌──────────┐  │
└────────┬─┘  └─┬────────┘      │9f/31be6..│ │3b/2e671..├──┤ (3)  ┌──────────┐
  parent │      │ parent        └─────┬──┬─┘ └──────────┘  └─────►│42/1aac0..│
         ▼      ▼                     │  │DirEnt                  └──────────┘
       ┌──────────┐                   │  └──────┐
       │5b/2f24f..├──────────?   File ▼         ▼ File       (1)  ┌──────────┐
       └────┬─────┘             ┌──────────┐ ┌──────────┐ ┌──────►│0b/5c780..│
            │ parent            │4a/54b55..│ │ba/557ae..├─┘       └──────────┘
            ▼                   └───────┬──┘ └──────────┘
       ┌──────────┐                     │                    (1)  ┌──────────┐
Initial│69/ca6b5..├──────────?          └────────────────────────►│67/515ea..│
       └────┬─────┘   (? = not shown)                             └──────────┘
            X no parent
```
----
## Dev notes
To investigate: how Seafile handles ID collisions. There might be one here, in a repo with `44592` commits:
```
$ ls -lah
62684fe2260d67b6b5d2de909c3816feb21c39 bd8d7b2df788bf8bb6efc87ddb52c6f595ea7e ffc4e7f4273c8e4cc57124ccb6d65467c3b6a3
641064a61de537a696f2172e90be9c8ac4ae04 bd8d7b2df788bf8bb6efc87ddb52c6f595ea7e.8WWPVZ
$ ls -lah bd8d7b2df788bf8bb6efc87ddb52c6f595ea7e.8WWPVZ
-rw------- 1 1000 1000 0 Jan 12 2019 bd8d7b2df788bf8bb6efc87ddb52c6f595ea7e.8WWPVZ
% ls -lah bd8d7b2df788bf8bb6efc87ddb52c6f595ea7e
-rw------- 1 1000 1000 629 Jan 12 2019 bd8d7b2df788bf8bb6efc87ddb52c6f595ea7e
```