block_ref GC todo queue length doesn't decrease #453

Open
opened 2022-12-21 23:17:06 +00:00 by scvalex · 3 comments

In a new 2-node cluster (v0.8.0) with one bucket and one object, the GC todo queue length doesn't decrease for block_ref:

```
$ kubectl exec --stdin --tty -n garage garage-1 -- ./garage stats
...
Table stats for block_ref
  Merkle updater todo queue length: 0
  GC todo queue length: 286
...
```

The other numbers in `garage stats` are all 0.

Full `garage.toml`:

```
metadata_dir = "/mnt/meta"
data_dir = "/mnt/data"

db_engine = "lmdb"
replication_mode = "2"

rpc_bind_addr = "[::]:3901"
# rpc_secret will be populated by the init container from a k8s secret object
rpc_secret = "__RPC_SECRET_REPLACE__"

bootstrap_peers = []

[kubernetes_discovery]
namespace = "garage"
service_name = "garage"
skip_crd = false

[s3_api]
s3_region = "eu1"
api_bind_addr = "[::]:3900"
root_domain = ".eu1.abstractbinary.org"

[s3_web]
bind_addr = "[::]:3902"
root_domain = ".eu1.abstractbinary.org"
index = "index.html"
```

Cluster status:

```
==== HEALTHY NODES ====
ID                Hostname  Address           Tags  Zone  Capacity
56b086f971ae8aa3  garage-0  10.32.1.227:3901  []    eu1   1
61b2658578dd81ed  garage-1  10.32.2.120:3901  []    eu1   1
```

I've also attached the logs for the two garage nodes.

I should mention that this is a new cluster because I nuked the old one after it also got stuck with non-zero numbers in stats. That one had run out of space on /mnt/meta at some point and it didn't seem to recover after I gave it more space.

For this cluster, I'd love to figure out what's going on, but I don't know what the next debugging step is.

Author

That number stayed like that for at least a few hours, then increased with use (to 11464), then went to 0 a few days later. I don't think I've changed anything. In particular, I haven't restarted any nodes or made any changes to garage's configuration.

The cutoff in the logs looks like this:

```
2022-12-27T20:34:34.438089Z  INFO garage_block::resync: Resync block b2aee6e6ddf466a3: offloading and deleting
2022-12-27T20:34:34.438112Z  INFO garage_block::resync: Resync block b2aee6e6ddf466a3: offloading and deleting
2022-12-27T20:34:34.438956Z  INFO garage_block::resync: Deleting unneeded block b2aee6e6ddf466a3, offload finished (0 / 1)
2022-12-27T20:34:34.438978Z  INFO garage_block::resync: Deleting unneeded block b2aee6e6ddf466a3, offload finished (0 / 1)
2022-12-27T22:45:01.153542Z  INFO garage_table::gc: (block_ref) GC: 9 items successfully pushed, will try to delete.
2022-12-27T22:45:01.153560Z  INFO garage_table::gc: (block_ref) GC: 9 items successfully pushed, will try to delete.
2022-12-27T22:45:01.203067Z  INFO garage_table::gc: (block_ref) GC: 229 items successfully pushed, will try to delete.
2022-12-27T22:45:01.203077Z  INFO garage_table::gc: (block_ref) GC: 229 items successfully pushed, will try to delete.
```

Between 2022-12-25T13:39:11 and 2022-12-27T11:53:46 (about two days), there were no `(block_ref) GC` lines at all. It was just many of those `Resync block`/`Deleting unneeded block` lines.

The second node's log looks the same.
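
For anyone who wants to check the same thing, here is a minimal sketch of how these lines can be pulled out of the pod logs, just `kubectl logs` plus `grep` on the module paths visible in the snippets above:

```
# Table GC activity only; between 2022-12-25 and 2022-12-27 this produces
# no output at all, which is what "no (block_ref) GC lines" means above.
kubectl logs -n garage garage-1 | grep 'garage_table::gc'

# The resync/offload activity that fills the rest of the log.
kubectl logs -n garage garage-1 | grep 'garage_block::resync'
```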

Author

Oh, this is the GC delay described [here](https://garagehq.deuxfleurs.fr/documentation/design/internals/#garbage-collection). This was very surprising to me.

I propose the following improvements:

  1. There should be some warning in `garage stats` that items will stay in the GC queues for at least 24h. All of my experience so far has told me that an idling system reporting a non-empty work queue is in a broken state.

  2. The 100Mi size for the meta PVC in the Helm chart is too small. In my original setup, I used that and `sled` (because there was no way to configure it to `lmdb`), and I went over 100Mi with very light use. I had one largeish file, and overwriting it a few times was enough.

  3. More generally, there should be some guidance on how large the meta directory needs to be. I'm now using `lmdb`, and after overwriting my one big file a few times a day, I see I'm using 29.4 of 200 Mi. Same goes for the data directory; `bucket info` says the bucket is using 1.9G, but I see I'm using 4.04G in the PVC (see the quick check sketched after this list). This seems like more overhead than I would've expected from a storage system, but I could update my expectations if something was written down.
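
For concreteness, the usage numbers in points 2 and 3 can be checked directly on the mounted volumes. A minimal sketch, assuming the same pod/namespace names and the `metadata_dir`/`data_dir` paths from the config above:

```
# Free space and usage of the metadata and data volumes inside a garage pod;
# compare against the PVC size and against what `garage bucket info` reports.
kubectl exec -n garage garage-1 -- df -h /mnt/meta /mnt/data
kubectl exec -n garage garage-1 -- du -sh /mnt/meta /mnt/data
```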

Owner

Thanks for the report, I'll look into your suggestions.

For the Helm chart, I'm not qualified but maybe @maximilien wants to take a look?

> `bucket info` says the bucket is using 1.9G, but I see I'm using 4.04G in the PVC

This is a bit surprising, but it could be a symptom of incomplete uploads (e.g. PutObject calls that were interrupted in the middle, or multipart uploads that were never finished). Check the unfinished uploads metric in `bucket info`; if it is unexpectedly large, you can use `bucket cleanup-incomplete-uploads`.
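
A minimal sketch of that check, reusing the `kubectl exec` invocation from the report above (the bucket name is a placeholder, and the exact arguments of the cleanup command may vary by version, so check its `--help`):

```
# Look at the unfinished uploads counter for the bucket
# ("my-bucket" is a placeholder; use your own bucket name).
kubectl exec -n garage garage-1 -- ./garage bucket info my-bucket

# If it is unexpectedly large, clean up the abandoned uploads.
# (Flags/arguments may differ by version; see --help.)
kubectl exec -n garage garage-1 -- ./garage bucket cleanup-incomplete-uploads my-bucket
```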
