High metadata disk usage #580

Closed
opened 2023-06-06 01:51:33 +00:00 by rudexi · 3 comments

Context: I have a 3 node setup of Garage in an on-premise Kubernetes.

> kubectl exec -it quickwit-garage-0 -c garage -- ./garage status
==== HEALTHY NODES ====
ID                Hostname           Address                        Tags  Zone      Capacity  DataAvail
<redacted-id>  quickwit-garage-1     [::ffff:<redacted-ipv4>]:3901  []    quickwit  1         512.7 GB (97.2%)
<redacted-id>  quickwit-garage-0     <redacted-ipv4>:3901           []    quickwit  1         512.9 GB (97.3%)
<redacted-id>  quickwit-garage-2     <redacted-ipv4>:3901           []    quickwit  1         512.7 GB (97.2%)

I have been successfully using Garage so far for smaller OSS applications (namely, Netbox and Harbor). This instance of Garage was used for Quickwit (an open-source log indexer, an alternative to Elasticsearch).

I am using the official helm chart for deployment (0.4.0). Garage version is v0.8.2:

> kubectl exec -it quickwit-garage-0 -c garage -- ./garage -V
garage v0.8.2 [features: k2v, sled, lmdb, sqlite, consul-discovery, kubernetes-discovery, metrics, telemetry-otlp, bundled-libs]

After running it successfully for 1.5 months, I got the following error from Garage (reported by Quickwit during some queries):

<Message>Internal error: Could not reach quorum of 2. 1 of 3 request succeeded, others returned errors: ["DB error: Sled: IO error: Os { code: 28, kind: StorageFull, message: \"No space left on device\" }", "Remote error: DB error: Sled: IO error: Os { code: 28, kind: StorageFull, message: \"No space left on device\" }"]</Message>

Trying to list the objects (via Minio's mcli) results in the same error:

$ ./mcli ls quickwit/quickwit
mcli: <ERROR> Unable to list folder. Internal error: Could not reach quorum of 2. 0 of 3 request succeeded, others returned errors: ["DB error: Sled: IO error: Os { code: 28, kind: StorageFull, message: \"No space left on device\" }", "Remote error: DB error: Sled: IO error: Os { code: 28, kind: StorageFull, message: \"No space left on device\" }"]

Looking at garage's logs, I got a tad bit more context:

2023-06-06T01:24:51.821862Z  INFO garage_api::generic_server: [::ffff:<redacted-ipv4>]:42002 PUT /quickwit/indexes/otel-logs-v1/<redacted-hash>.split
2023-06-06T01:24:51.829556Z  WARN garage_api::generic_server: Response: error 503 Service Unavailable, Internal error: Could not reach quorum of 2. 1 of 3 request succeeded, others returned errors: ["Remote error: DB error: Sled: IO error: Os { code: 28, kind: StorageFull, message: \"No space left on device\" }", "Remote error: DB error: Sled: IO error: Os { code: 28, kind: StorageFull, message: \"No space left on device\" }"]
2023-06-06T01:24:51.830203Z  WARN garage_api::s3::put: Cannot cleanup after aborted PutObject: Could not reach quorum of 2. 1 of 3 request succeeded, others returned errors: ["Remote error: DB error: Sled: IO error: Os { code: 28, kind: StorageFull, message: \"No space left on device\" }", "Remote error: DB error: Sled: IO error: Os { code: 28, kind: StorageFull, message: \"No space left on device\" }"]

Also noteworthy (though it could be a direct consequence):

2023-06-06T01:24:43.723653Z  INFO garage_api::generic_server: [::ffff:<redacted-ipv4>]:43790 GET /health
2023-06-06T01:24:49.355407Z ERROR garage_util::background::worker: Error in worker bucket_object_counter queue (TID 25): Too many errors
2023-06-06T01:24:50.115484Z ERROR garage_util::background::worker: Error in worker block_ref queue (TID 33): Too many errors
2023-06-06T01:24:50.377998Z ERROR garage_util::background::worker: Error in worker version queue (TID 29): Too many errors

Here are some Quickwit index statistics:

Created at: 2023/04/28 04:04
URI: s3://quickwit/indexes/otel-logs-v1
Size of published splits: 19,560.50 MB
Number of published documents: 199,994,156
Number of published splits: 61
Number of staged splits: 423
Number of splits marked for deletion: 0

Garage bucket info:

 > kubectl exec -it quickwit-garage-0 -- ./garage bucket info quickwit
Defaulted container "garage" out of: garage, garage-init (init)
Bucket: <redacted-bucket-hash>

Size: 19.1 GiB (20.6 GB)
Objects: 196
Unfinished multipart uploads: 1

Website access: false

Global aliases:
  quickwit

Key-specific aliases:

Authorized keys:
  RWO  <redacted-key-name>  quickwit

I mounted the problematic volume that was full (meta-quickwit-garage-<i>) to inspect it, and the db was the main contributor
to the storage usage. I tried inspecting it with lmdb/sqlite/sled tools, but could not figure out the actual format.

$ du -sh /mnt/db/db
920M    /mnt/db/db

All metadata volumes were close to being full:

# Manually formatted output from `kubectl df-pv`
meta-quickwit-garage-0 957Mi/973Mi
meta-quickwit-garage-1 920Mi/973Mi
meta-quickwit-garage-2 957Mi/973Mi

Here is the metadata configuration (if it matters):

$ cat /mnt/db/conf; echo
segment_size: 524288
use_compression: false
version: 0.34
vQ

The number of objects is low, while Quickwit reports having inserted a lot of objects. It seems Quickwit has a high creation/deletion rate of objects, since it constantly writes data (logs) and merges it regularly (a bit like an LSM tree).
The error itself is straightforward: the database used for metadata grew large enough to exceed its dedicated volume.
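
One way to keep an eye on this before it becomes fatal is to check the metadata directory size on each pod. A minimal sketch, assuming the metadata volume is mounted at /mnt/meta inside the container (a hypothetical path, adjust it to your chart values):

# /mnt/meta is an assumed mount point; use whatever metadata_dir points to in your deployment
for i in 0 1 2; do
  kubectl exec quickwit-garage-$i -c garage -- du -sh /mnt/meta
done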

This does look like it's linked to the problem described here:
https://garagehq.deuxfleurs.fr/documentation/design/internals/#1-garbage-collection-of-table-entries-in-meta-directory

Is this behavior expected/intended in this situation, or is there something wrong with my setup?
Is there a workaround for limiting this database growth, e.g. manually triggering a job that purges metadata entries for deleted objects?

Author

Additionally, I collected metrics over the past month from the Prometheus interface, so feel free to ask for graphs if it helps.

Here is the graph for:

avg by(api_endpoint) (api_s3_request_counter{service="quickwit-garage-metrics"})

It looks like the number of PUT requests was predominant.
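
A rate-based variant of that query (just a sketch reusing the metric and labels above) makes the per-endpoint request rate easier to compare over time:

sum by(api_endpoint) (rate(api_s3_request_counter{service="quickwit-garage-metrics"}[5m]))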

Owner

Hi @rudexi, it seems you are using sled, as your DB file has no extension.
Here is the code in Garage that computes the DB path:

let db = match config.db_engine.as_str() {
    "sled" => {
        db_path.push("db");
        // ...
    }
    "sqlite" | "sqlite3" | "rusqlite" => {
        db_path.push("db.sqlite");
        // ...
    }
    "lmdb" | "heed" => {
        db_path.push("db.lmdb");
        // ...
    }
    // ... (other cases elided)
};

https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/main/src/model/garage.rs#L88

sled is known for using a lot of memory and taking a lot of disk space, and we also know it has garbage collection issues. We plan to switch to lmdb by default, and maybe even drop sled completely in the future.

So a first debugging step would be to convert your metadata from sled to lmdb. The steps are documented here: https://garagehq.deuxfleurs.fr/documentation/reference-manual/configuration/#db-engine-since-v0-8-0 - do not forget to back up your important data first.

Could you try switching to LMDB and come back here to tell us if it solved your problem? :)
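
For reference, the change itself is just the db_engine key in garage.toml (a minimal sketch; how it is injected depends on your Helm chart values, and the conversion steps in the link above still apply):

# garage.toml excerpt: switch the metadata engine from sled to LMDB
db_engine = "lmdb"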

Owner

Closing for inactivity.

lx closed this issue 2023-07-11 09:34:23 +00:00