On LMDB compaction #1006

Open
opened 2025-04-12 12:40:04 +00:00 by quentin · 0 comments

Problem

On one node, we have a 145GB file for ~4GB of useful data.

Here is an overview from lmdbnav (a tool by a PowerDNS developer):

Screenshot: https://git.deuxfleurs.fr/attachments/770d6347-e408-43a6-8ee4-ec03ea2233c3

Workaround

mdb_copy (http://www.lmdb.tech/doc/man1/mdb_copy_1.html) with the -c flag can do this compaction, and it does indeed reduce the database size:

[nix-shell:/mnt/ssd/garage/meta/snapshots]# mdb_copy -c \
  ./2025-04-12T09\:29\:03Z \
  ./2025-04-12T09\:29\:03Z+COMPACTED/
[nix-shell:/mnt/ssd/garage/meta/snapshots]# ls -alh ./2025-04-12T09\:29\:03Z
total 146G
drwxr-xr-x 2 root root 4.0K Apr 12 14:17 .
drwxr-xr-x 5 root root 4.0K Apr 12 14:36 ..
-rw-r--r-- 1 root root 146G Apr 12 13:48 data.mdb
-rw-r--r-- 1 root root 8.0K Apr 12 14:44 lock.mdb

[nix-shell:/mnt/ssd/garage/meta/snapshots]# ls -alh ./2025-04-12T09\:29\:03Z+COMPACTED/
total 3.9G
drwxr-xr-x 2 root root 4.0K Apr 12 14:37 .
drwxr-xr-x 5 root root 4.0K Apr 12 14:36 ..
-rw-r--r-- 1 root root 3.9G Apr 12 14:44 data.mdb

We went from 146GB to 3.9GB (about 37x smaller).

Compact snapshots

Snapshots amplify the wasted space here, since each one copies the bloated file as-is. I think one easy step would be to enable compaction when performing a snapshot.

This is the only line to edit: https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/main/src/db/lmdb_adapter.rs#L112

Compact main database

Another step would be to compact the main database itself. Compaction can only be done while Garage is offline, so we could either have a dedicated command for it, which would require more operational work, or run it automatically upon shutdown or startup. Running it on startup is an issue, as we probably want to start Garage as fast as possible, especially after a crash. However, clean shutdowns are always done when you can afford to lose a node, so that is the perfect time to do some maintenance work.
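To make the shutdown idea more concrete, here is a minimal sketch of what an offline compaction step could look like. It is not Garage code: the function names, the `data.mdb.bak` backup name and the scratch directory are all hypothetical, and it simply shells out to the same `mdb_copy -c` command used in the workaround above. It assumes the LMDB environment has already been closed.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Build (but do not run) `mdb_copy -c <src_env> <dst_env>`.
fn compaction_command(src_env: &Path, dst_env: &Path) -> Command {
    let mut cmd = Command::new("mdb_copy");
    cmd.arg("-c").arg(src_env).arg(dst_env);
    cmd
}

/// Move the compacted `data.mdb` over the live one, keeping the old file
/// as `data.mdb.bak` (hypothetical name) so the swap can be undone by hand.
fn swap_in_compacted(env_dir: &Path, compacted_dir: &Path) -> io::Result<PathBuf> {
    let live = env_dir.join("data.mdb");
    let backup = env_dir.join("data.mdb.bak");
    fs::rename(&live, &backup)?;
    fs::rename(compacted_dir.join("data.mdb"), &live)?;
    Ok(backup)
}

/// Hypothetical shutdown hook: must only run once the LMDB environment is closed.
/// `scratch_dir` must not exist yet and should sit on the same filesystem as
/// `env_dir`, so that the renames above do not cross a mount point.
fn compact_offline(env_dir: &Path, scratch_dir: &Path) -> io::Result<PathBuf> {
    fs::create_dir(scratch_dir)?; // mdb_copy expects an empty destination directory
    let status = compaction_command(env_dir, scratch_dir).status()?;
    if !status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("mdb_copy exited with {status}"),
        ));
    }
    let backup = swap_in_compacted(env_dir, scratch_dir)?;
    fs::remove_dir(scratch_dir)?; // only succeeds if the scratch directory is now empty
    Ok(backup)
}
```

Note that the old file is kept as a backup, so disk usage only drops once that backup is deleted. A real implementation would probably want to check the compacted copy before removing it.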

quentin changed title from Could we create compacted snapshots? to On LMDB compaction 2025-04-12 13:19:56 +00:00
quentin added the kind/performance, action/for-external-contributors, scope/metadata labels 2025-04-12 13:20:39 +00:00
Reference: Deuxfleurs/garage#1006