seafile_recovery

Quick description: seafile_recovery is a low-level tool that parses Seafile's on-disk file storage. Unlike other tools, it works without the associated SQLite or MySQL database.

Some use cases: I developed this tool because I lost my database and wanted to get my files back. It can also help you diagnose problems with Seafile's on-disk file storage and maybe repair them. Finally, it can help you better understand how Seafile works internally and gather some statistics about the health of your Seafile repositories.

Features and limitations: The tool can parse all commits of a repository. Currently, the head subcommand selects the "last" one according to the commit graph and a heuristic based on time. Each commit contains a RootId. All files and folders have an Id in Seafile; the RootId is simply the Id of the root folder of the repository at the time of the commit. You can inspect these Ids with ls. You can copy a file or a folder hierarchy to your disk with cp. Finally, the s3 subcommand transfers the file or folder hierarchy directly to an S3-compatible storage. Currently, the tool does not work with encrypted repositories, and advanced Seafile features are not tested. The tool itself has not been extensively tested and may crash when encountering unusual edge cases.

Disclaimer: This tool is community made and thus not affiliated with Seafile Ltd., Seafile GmbH or any other company. I developed this tool for my own needs, and I cannot be held responsible for any issue or damage it may cause. Use it carefully, or not at all if you are not sure of what you are doing: data are often more precious than we imagine. Always shut down your Seafile daemons before using it (both Seafile and Seahub). Create a backup before running any command and double-check all your operations.

Installation

As a prerequisite, you need a recent version of Go.

go get git.deuxfleurs.fr/quentin/seafile_recovery
export PATH="$HOME/go/bin:$PATH"
seafile_recovery --help

Tutorial

Let's suppose you know nothing about your storage folder and its repositories. Start by picking one repository ID in the storage/commits folder and run the head subcommand on it. For our example, we will use 0011d396-4890-463a-8266-bcbd978d8d1c.

$ seafile_recovery head 0011d396-4890-463a-8266-bcbd978d8d1c
2021/04/28 15:10:34 Repo contains 6 commits
2021/04/28 15:10:34 Repo has 1 sources
2021/04/28 15:10:34 Repo has 1 sinks
2021/04/28 15:10:34 Proposing following HEAD:
RootId: 5911dd2d363f591e43df4e80591d0a54975f2aaf
CreatorName: quentin@example.com
Creator: 0000000000000000000000000000000000000000
Description: Added "telecom-reclaimed-web-single-page.pdf".
Ctime: 2021-04-26 12:22:59 +0200 CEST
RepoName: Ma bibliothèque
RepoDesc: Ma bibliothèque

We have now learnt some information about the repository, especially its name ("Ma bibliothèque"), who made the last change ("quentin@example.com") and the RootId ("5911dd2d363f591e43df4e80591d0a54975f2aaf").
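To make the "commit graph plus time heuristic" idea more concrete, here is a minimal Python sketch. It is not the tool's actual code. It assumes that each commit is a dictionary with the keys parent_id, second_parent_id and ctime (an integer timestamp), that a "sink" is a commit no other commit names as a parent, and that a "source" is a commit without any parent. The function name propose_head and the short commit IDs are made up for the illustration.

```python
def propose_head(commits):
    """Pick a HEAD candidate from a dict mapping commit_id -> commit fields.

    Assumed fields (hypothetical names): 'parent_id', 'second_parent_id', 'ctime'.
    """
    referenced = set()
    for commit in commits.values():
        for key in ("parent_id", "second_parent_id"):
            if commit.get(key):
                referenced.add(commit[key])

    # Sinks: commits that no other commit names as a parent.
    sinks = [cid for cid in commits if cid not in referenced]
    # Sources: commits that have no parent at all (initial commits).
    sources = [
        cid for cid, c in commits.items()
        if not c.get("parent_id") and not c.get("second_parent_id")
    ]

    # Time heuristic: among the sinks, keep the most recent commit.
    # If the graph has no sink at all, fall back to every commit.
    candidates = sinks or list(commits)
    head = max(candidates, key=lambda cid: commits[cid]["ctime"]) if candidates else None
    return head, sources, sinks


# A small made-up history with one merge commit on top.
demo = {
    "4f": {"parent_id": "21", "second_parent_id": "a5", "ctime": 50},
    "21": {"parent_id": "5b", "second_parent_id": None, "ctime": 40},
    "a5": {"parent_id": "5b", "second_parent_id": None, "ctime": 30},
    "5b": {"parent_id": "69", "second_parent_id": None, "ctime": 20},
    "69": {"parent_id": None, "second_parent_id": None, "ctime": 10},
}
head, sources, sinks = propose_head(demo)
print("HEAD:", head, "sources:", sources, "sinks:", sinks)
```

When several sinks exist (for example after a damaged or partially lost history), the time comparison is what breaks the tie, which is why the proposed HEAD should always be checked by hand.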

We can now explore its latest file hierarchy thanks to the RootId (a prefix of the Id is enough, which keeps the command more readable):

$ seafile_recovery ls 0011d396-4890-463a-8266-bcbd978d8d1c --dir=5911dd2
2021/04/28 15:15:40 5911dd /
2021/04/28 15:15:40 b88ab9 /seafile-tutorial.doc
2021/04/28 15:15:40 d24616 /Capture décran de 2021-04-11 23-07-31.png
2021/04/28 15:15:40 f123de /My Folder/
2021/04/28 15:15:40 15be4d /My Folder/telecom-reclaimed-web-single-page.pdf
2021/04/28 15:15:40 380a0e /My Folder/Capture décran vidéo de 19-12-2020 10:30:15.webm
2021/04/28 15:15:40 Total size: 25.6M
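How the tool matches abbreviated Ids is not described here. One plausible approach, sketched below in Python, is to use the first two characters to find the sub-directory and then keep the entries that start with the rest of the prefix. The function name resolve_prefix is hypothetical, and the sketch assumes that Ids are 40 hexadecimal characters, so that anything else in the folder (such as a leftover temporary file) is ignored.

```python
import string
import tempfile
from pathlib import Path

HEX = set(string.hexdigits.lower())


def resolve_prefix(storage, kind, repo_id, prefix):
    """Return all full object IDs of the given kind that start with prefix."""
    prefix = prefix.lower()
    if len(prefix) < 2:
        raise ValueError("need at least the 2 characters naming the sub-directory")
    subdir = Path(storage) / kind / repo_id / prefix[:2]
    if not subdir.is_dir():
        return []
    matches = []
    for entry in subdir.iterdir():
        name = entry.name
        # Assumption: a valid object file name is the last 38 hex characters
        # of a 40-character ID; skip everything else.
        if len(name) != 38 or not set(name) <= HEX:
            continue
        if name.startswith(prefix[2:]):
            matches.append(prefix[:2] + name)
    return sorted(matches)


# Demo on a synthetic storage folder (IDs are made up).
storage = tempfile.mkdtemp()
subdir = Path(storage) / "fs" / "demo-repo" / "59"
subdir.mkdir(parents=True)
(subdir / ("11dd2d" + "0" * 32)).touch()
(subdir / ("11ffff" + "0" * 32)).touch()
(subdir / ("11dd2d" + "0" * 32 + ".8WWPVZ")).touch()  # stray file, ignored

print(resolve_prefix(storage, "fs", "demo-repo", "5911dd2"))
```

A prefix that matches more than one object returns several IDs, in which case a longer prefix is needed to pick one.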

Now, let's suppose you want to extract the folder "My Folder" with its content and put it in a folder named out:

$ seafile_recovery cp 0011d396-4890-463a-8266-bcbd978d8d1c --dir=f123de ./out
2021/04/28 15:17:28 f123de /
2021/04/28 15:17:28 15be4d /telecom-reclaimed-web-single-page.pdf
2021/04/28 15:17:28 380a0e /Capture décran vidéo de 19-12-2020 10:30:15.webm
$ ls out/
'Capture décran vidéo de 19-12-2020 10:30:15.webm'   telecom-reclaimed-web-single-page.pdf

Finally, if you prefer to upload this content directly to an S3 bucket, you can run:

$ seafile_recovery s3 0011d396-4890-463a-8266-bcbd978d8d1c --dir=f123de s3://ACCESS_KEY:SECRET_KEY@ENDPOINT/REGION/BUCKET[/PREFIX]
2021/04/28 15:17:28 f123de /
2021/04/28 15:17:28 15be4d /telecom-reclaimed-web-single-page.pdf
2021/04/28 15:17:28 380a0e /Capture décran vidéo de 19-12-2020 10:30:15.webm
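The destination format s3://ACCESS_KEY:SECRET_KEY@ENDPOINT/REGION/BUCKET[/PREFIX] packs six values into one string. The Python sketch below shows how such a string can be split into its parts. It only illustrates the format and is not how the tool itself parses it. The function name parse_s3_dest and the example values are hypothetical.

```python
from urllib.parse import urlsplit


def parse_s3_dest(dest):
    """Split s3://ACCESS_KEY:SECRET_KEY@ENDPOINT/REGION/BUCKET[/PREFIX]."""
    url = urlsplit(dest)
    if url.scheme != "s3":
        raise ValueError("destination must start with s3://")
    if url.username is None or url.password is None:
        raise ValueError("missing ACCESS_KEY:SECRET_KEY@ part")
    # Everything after the last '@' is the endpoint (host, optional port).
    endpoint = url.netloc.rpartition("@")[2]
    if not endpoint:
        raise ValueError("missing ENDPOINT")
    # The path holds REGION, BUCKET and an optional PREFIX.
    parts = url.path.lstrip("/").split("/", 2)
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError("expected /REGION/BUCKET[/PREFIX] after the endpoint")
    return {
        "access_key": url.username,
        "secret_key": url.password,
        "endpoint": endpoint,
        "region": parts[0],
        "bucket": parts[1],
        "prefix": parts[2] if len(parts) > 2 else "",
    }


# Made-up values, for illustration only.
example = parse_s3_dest("s3://AK:SK@s3.example.com:9000/us-east-1/mybucket/backup/2021")
print(example)
```

Note that in this sketch a secret key containing characters such as "/" or "@" would confuse the split, so such keys would need special care.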

Be careful! This tool is not intended to migrate your Seafile backend from the local filesystem to the S3 backend. Migrating to the S3 backend requires preserving Seafile's objects, which is a totally different job. Appropriate scripts are available in Seafile's official distribution.

Usage

Seafile Recovery.

Usage:
  seafile_recovery [--storage=<sto>] head <repoid>
  seafile_recovery [--storage=<sto>] ls <repoid> (--dir=<dirid> | --file=<fileid>)
  seafile_recovery [--storage=<sto>] cp <repoid> (--dir=<dirid> | --file=<fileid>) <dest>
  seafile_recovery [--storage=<sto>] s3 <repoid> (--dir=<dirid> | --file=<fileid>) <dest>
  seafile_recovery s3del <dest>
  seafile_recovery (-h | --help)

Options:
  -h --help        Show this screen
  --storage=<sto>  Set Seafile storage path [default: ./storage]
  --dir=<dirid>    Seafile Directory ID, can be obtained from commits as RootId
  --file=<fileid>  Seafile File ID, can be obtained through ls

Seafile on-disk storage

Seafile treats your filesystem as an object store in which every object has an ID. All files in Seafile's storage therefore follow this pattern:

.../storage/{commits,fs,blocks}/$repo_id/$obj_id[:2]/$obj_id[2:]
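As a small illustration of this pattern, the following Python sketch builds the path of an object from its kind, repository ID and object ID. The function name object_path is made up for the example. The IDs are the ones used in the tutorial above.

```python
from pathlib import Path

KINDS = ("commits", "fs", "blocks")


def object_path(storage, kind, repo_id, obj_id):
    """Return the on-disk location of an object, following the pattern above."""
    if kind not in KINDS:
        raise ValueError(f"unknown object kind: {kind!r}")
    if len(obj_id) < 3:
        raise ValueError("object ID too short to be split")
    # The first two characters of the ID name the sub-directory,
    # the remaining characters name the file inside it.
    return Path(storage) / kind / repo_id / obj_id[:2] / obj_id[2:]


path = object_path(
    "./storage",
    "fs",
    "0011d396-4890-463a-8266-bcbd978d8d1c",
    "5911dd2d363f591e43df4e80591d0a54975f2aaf",
)
print(path)
```

Splitting on the first two characters keeps each folder reasonably small even when a repository holds many objects.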

The following diagram shows how these objects are linked to each other and how to read them:

 storage/commits/(repoid)           storage/fs/(repoid)          storage/blocks/(repoid)
    (plain text json)                  (json + zlib)               (chunk of raw data)

                                        Dir                    (1) ┌──────────┐
  HEAD ┌──────────┐    root_id         ┌──────────┐         ┌─────►│5b/4c09c..│
 (sink)│4f/2fcf9..├───────────────────►│98/ff6e3..│         │      └──────────┘
       └─┬─────┬──┘                    └─┬──────┬─┘         │
  parent │     │ 2nd parent              │      │DirEnt     │  (2) ┌──────────┐
         ▼     ▼                         │      │           ├─────►│eb/a557a..│
┌──────────┐  ┌──────────┐       Dir     ▼      ▼    File   │      └──────────┘
│21/22f45..├──┤a5/c7325..├───?  ┌──────────┐  ┌──────────┐  │
└────────┬─┘  └─┬────────┘      │9f/31be6..│  │3b/2e671..├──┤  (3) ┌──────────┐
  parent │      │ parent        └─────┬──┬─┘  └──────────┘  └─────►│42/1aac0..│
         ▼      ▼                     │  │DirEnt                   └──────────┘
       ┌──────────┐                   │  └──────┐
       │5b/2f24f..├──────────?   File ▼         ▼     File     (1) ┌──────────┐
       └────┬─────┘             ┌──────────┐  ┌──────────┐ ┌──────►│0b/5c780..│
            │  parent           │4a/54b55..│  │ba/557ae..├─┘       └──────────┘
            ▼                   └───────┬──┘  └──────────┘
       ┌──────────┐                     │                      (1) ┌──────────┐
Initial│69/ca6b5..├──────────?          └─────────────────────────►│67/515ea..│
       └────┬─────┘ (? = not shown)                                └──────────┘
            │
            X no parent
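The diagram can be followed by hand with a few lines of Python: read a commit as plain JSON, read fs objects as zlib-compressed JSON, then concatenate the blocks of a file. The sketch below is not the tool's code. The JSON key names (root_id, dirents, name, id, mode, block_ids) are assumptions about Seafile's format, and the demo runs on a synthetic repository with made-up IDs rather than on real data.

```python
import json
import stat
import tempfile
import zlib
from pathlib import Path


def obj_path(storage, kind, repo_id, obj_id):
    return Path(storage) / kind / repo_id / obj_id[:2] / obj_id[2:]


def load_commit(storage, repo_id, commit_id):
    # Commit objects are plain-text JSON.
    return json.loads(obj_path(storage, "commits", repo_id, commit_id).read_bytes())


def load_fs(storage, repo_id, obj_id):
    # fs objects (Dir or File) are JSON compressed with zlib.
    raw = obj_path(storage, "fs", repo_id, obj_id).read_bytes()
    return json.loads(zlib.decompress(raw))


def walk(storage, repo_id, dir_id, prefix="/"):
    """Yield (path, id) for every DirEnt reachable from the Dir object dir_id."""
    for ent in load_fs(storage, repo_id, dir_id)["dirents"]:
        if stat.S_ISDIR(ent["mode"]):  # assumed: 'mode' holds Unix mode bits
            yield prefix + ent["name"] + "/", ent["id"]
            yield from walk(storage, repo_id, ent["id"], prefix + ent["name"] + "/")
        else:
            yield prefix + ent["name"], ent["id"]


def read_file(storage, repo_id, file_id):
    """Rebuild a file by concatenating its raw blocks in order."""
    block_ids = load_fs(storage, repo_id, file_id)["block_ids"]
    return b"".join(
        obj_path(storage, "blocks", repo_id, b).read_bytes() for b in block_ids
    )


def put(storage, kind, repo_id, obj_id, data):
    # Demo helper only: write a fake object where the pattern says it lives.
    path = obj_path(storage, kind, repo_id, obj_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)


def packed(obj):
    return zlib.compress(json.dumps(obj).encode())


# Synthetic repository: commit -> root Dir -> "My Folder" Dir -> File -> 2 blocks.
storage, repo = tempfile.mkdtemp(), "demo-repo"
c1, d1, d2, f1 = ("c1" + "0" * 38, "d1" + "0" * 38, "d2" + "0" * 38, "f1" + "0" * 38)
b1, b2 = "b1" + "0" * 38, "b2" + "0" * 38
put(storage, "blocks", repo, b1, b"hello ")
put(storage, "blocks", repo, b2, b"world")
put(storage, "fs", repo, f1, packed({"block_ids": [b1, b2]}))
put(storage, "fs", repo, d2, packed({"dirents": [{"name": "a.txt", "id": f1, "mode": 0o100644}]}))
put(storage, "fs", repo, d1, packed({"dirents": [{"name": "My Folder", "id": d2, "mode": 0o040000}]}))
put(storage, "commits", repo, c1, json.dumps({"root_id": d1}).encode())

root_id = load_commit(storage, repo, c1)["root_id"]
listing = list(walk(storage, repo, root_id))
content = read_file(storage, repo, listing[-1][1])
print(listing)
print(content)
```

On a real storage folder, always work on a copy and verify the key names against your own objects first, since the sketch does not handle encrypted repositories or missing objects.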

Dev notes

To do: look at how Seafile handles ID collisions. There might be one here, in a repo with 44592 commits:

$ ls -lah
62684fe2260d67b6b5d2de909c3816feb21c39	bd8d7b2df788bf8bb6efc87ddb52c6f595ea7e	       ffc4e7f4273c8e4cc57124ccb6d65467c3b6a3
641064a61de537a696f2172e90be9c8ac4ae04	bd8d7b2df788bf8bb6efc87ddb52c6f595ea7e.8WWPVZ
$ ls -lah bd8d7b2df788bf8bb6efc87ddb52c6f595ea7e.8WWPVZ
-rw------- 1 1000 1000 0 Jan 12  2019 bd8d7b2df788bf8bb6efc87ddb52c6f595ea7e.8WWPVZ
$ ls -lah bd8d7b2df788bf8bb6efc87ddb52c6f595ea7e
-rw------- 1 1000 1000 629 Jan 12  2019 bd8d7b2df788bf8bb6efc87ddb52c6f595ea7e