New nodes not used as storage nodes #698

Closed
opened 2024-01-28 13:25:03 +00:00 by en0x · 4 comments

Hi, I'm using Garage as Nextcloud storage and have started receiving 500 errors from the S3 API. I suspect this is due to insufficient free storage space, as two nodes are full. Why are the other two, still empty, nodes not used as storage nodes? I had noticed before that only two nodes were listed as storage nodes, but I assumed that further nodes are only listed once data is actually stored on them. Unfortunately, I couldn't find anything about this in the documentation.

All nodes were assigned with the following command: `garage layout assign -c 150 -z <ZONE> <ID>`

```
> garage stats --detailed

Garage version: v0.8.2 [features: k2v, sled, lmdb, sqlite, consul-discovery, kubernetes-discovery, metrics, telemetry-otlp, bundled-libs]
Rust compiler version: 1.63.0

Database engine: LMDB (using Heed crate)

Table stats:
  Table      Items  MklItems  MklTodo  GcTodo
  bucket_v2  1      1         0        0
  key        1      1         0        0
  object     0      0         0        0
  version    0      0         0        0
  block_ref  0      0         0        0

Block manager stats:
  number of RC entries (~= number of blocks): 0
  resync queue length: 0
  blocks with resync errors: 0

Storage nodes:
  ID                  Zone  Capacity  Part.  DataAvail                 MetaAvail
  b2493052b3061fdd    fra1  150       128    151.6 KB/161.0 GB (0.0%)  9.2 GB/48.3 GB (19.0%)
  d70bff9518aa1107    fra1  150       128    41.0 KB/161.0 GB (0.0%)   11.8 GB/48.3 GB (24.4%)

Estimated available storage space cluster-wide (might be lower in practice):
  data: 81.9 KB
  metadata: 18.3 GB

> garage status
==== HEALTHY NODES ====
ID                  Address                     Tags            Zone  Capacity  DataAvail
d70bff9518aa1107    10.1.0.100:3901             []              fra1  150       41.0 KB (0.0%)
b2493052b3061fdd    [::ffff:10.100.0.100]:3901  []              fra1  150       151.6 KB (0.0%)
a13dc7533ad626bb    10.0.0.100:3901             []              fra2  150       159.8 GB (99.3%)
07531a39050119af    [::ffff:10.2.0.100]:3901    []              fra1  150       159.8 GB (99.3%)
```

Kind regards
Tim
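The cluster-wide figures in the `garage stats --detailed` output above can be roughly reproduced from the per-node rows. The Python sketch below assumes, without confirmation from Garage's source, that the estimate is the minimum over storage nodes of available space × 256 / partitions held, with 256 being the assumed total partition count. The function name `cluster_estimate` is hypothetical.

```python
# Back-of-the-envelope reproduction of the cluster-wide estimate printed by
# `garage stats --detailed`. ASSUMPTION (not taken from Garage's source):
# estimate = min over storage nodes of avail * TOTAL_PARTITIONS / partitions_held.

TOTAL_PARTITIONS = 256  # assumption: the key space is split into 256 partitions


def cluster_estimate(nodes):
    """nodes: list of (available_space, partitions_held); all in the same unit."""
    # Nodes holding no partitions store nothing, so they are skipped entirely.
    return min(avail * TOTAL_PARTITIONS / parts for avail, parts in nodes if parts > 0)


# Per-node values copied from the "Storage nodes" table above.
data_nodes = [(151.6, 128), (41.0, 128)]  # DataAvail, in KB
meta_nodes = [(9.2, 128), (11.8, 128)]    # MetaAvail, in GB

# The two empty nodes (about 159.8 GB = 159.8e6 KB free each) hold no partitions,
# so adding them does not change the data estimate at all.
empty_nodes = [(159.8e6, 0), (159.8e6, 0)]

print(f"data: {cluster_estimate(data_nodes):.1f} KB")
print(f"metadata: {cluster_estimate(meta_nodes):.1f} GB")
print(f"data incl. empty nodes: {cluster_estimate(data_nodes + empty_nodes):.1f} KB")
```

This prints 82.0 KB and 18.4 GB. The small gap to Garage's 81.9 KB and 18.3 GB is consistent with the per-node figures being rounded for display. Under this assumption, the nearly full node d70bff9518aa1107 caps the estimate, and the free space on the two unused nodes never counts.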

Owner

Hello @en0x, the first thing that stands out is that you shouldn't be using "150" as a capacity value; you should be using "150G" for "150 gigabytes", otherwise Garage will interpret it as "150 bytes".

Once you fix all your capacities to "150G", can you post the output of `garage layout show` **before applying**, and then after?
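To make the point about units concrete, here is a hypothetical Python sketch of a unit-aware capacity parser in which a bare number means bytes and a suffix scales it, as described in the reply above. The name `parse_capacity` and the decimal (SI) multipliers are assumptions for illustration; this is not Garage's parsing code.

```python
# Hypothetical sketch: turn a capacity string with an optional unit suffix into
# a byte count. ASSUMPTION: decimal (SI) multipliers, so "G" means 10**9 bytes.
import re

UNITS = {"": 1, "K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_capacity(text: str) -> int:
    """Return the capacity in bytes; a bare number is taken as bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMGT]?)B?\s*", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"invalid capacity: {text!r}")
    number, unit = match.groups()
    return int(number) * UNITS[unit.upper()]


for value in ["150", "150G", "1T"]:
    print(f"{value!r:>7} -> {parse_capacity(value):,} bytes")
```

Under this assumption, `-c 150` and `-c 150G` differ by a factor of 10^9.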

Author

Hello @lx, I tried

`garage layout assign -c 150G -z fra1 d70bff9518aa1107 b2493052b3061fdd a13dc7533ad626bb 07531a39050119af`

and

`garage layout assign -c "150G" -z fra1 d70bff9518aa1107 b2493052b3061fdd a13dc7533ad626bb 07531a39050119af`

but both return

`error: Invalid value for '--capacity <capacity>': invalid digit found in string`

Was this changed in a version newer than 0.8.2? I ask because the `garage status` output in my initial post seems to show the 150 correctly as GB.
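The message `invalid digit found in string` is the text of Rust's standard `ParseIntError` when a string such as "150G" is parsed into an integer type. This suggests, though the thread does not confirm it, that the v0.8 CLI reads `--capacity` as a plain unsigned integer. The shell also strips the double quotes before Garage sees the argument, so both commands pass the identical string `150G`. The Python sketch below mimics Rust's unsigned integer parsing and its error messages; the name `parse_unsigned_like_rust` is hypothetical.

```python
# Hypothetical emulation of Rust's `str::parse::<uN>()` for unsigned integers,
# used only to show why "150G" is rejected with this exact message.
# ASSUMPTION: the v0.8 CLI parses --capacity as a plain unsigned integer.


class ParseIntError(ValueError):
    pass


DIGITS = "0123456789"


def parse_unsigned_like_rust(text: str, bits: int = 32) -> int:
    if text == "":
        raise ParseIntError("cannot parse integer from empty string")
    digits = text
    if text[0] == "+":
        # Rust accepts one leading '+'; a lone "+" counts as an invalid digit.
        digits = text[1:]
        if digits == "":
            raise ParseIntError("invalid digit found in string")
    max_value = 2**bits - 1
    result = 0
    for ch in digits:
        # Any non-ASCII-digit character (such as the "G" suffix) is rejected.
        if ch not in DIGITS:
            raise ParseIntError("invalid digit found in string")
        result = result * 10 + DIGITS.index(ch)
        if result > max_value:
            raise ParseIntError("number too large to fit in target type")
    return result


for value in ["150", "150G"]:
    try:
        print(value, "->", parse_unsigned_like_rust(value))
    except ParseIntError as err:
        print(value, "-> error:", err)
```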

Owner

Sorry, I had not seen that you were using v0.8. The layout algorithm in that version is broken, which probably explains the issues you are having. You should upgrade to v0.9 if possible.

en0x closed this issue 2024-01-30 12:34:44 +00:00
Author

Yes, the update to v0.9.1 fixed the issue. After applying the new layout the data was rebalanced. Thanks for your help.
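For comparison with the v0.8 `garage stats` output above, where only two nodes held partitions (128 each), here is a simplified Python sketch of a purely capacity-proportional assignment for the four nodes of equal capacity. It uses the largest-remainder method and ignores replica placement and zone constraints; it is not Garage's v0.9 layout algorithm, and the partition count of 256 and the `replication_factor` parameter are assumptions.

```python
# Simplified sketch: split partition slots across nodes in proportion to
# capacity using the largest-remainder method.
# ASSUMPTIONS: 256 partitions; replica placement and zone rules are ignored;
# this is NOT Garage's actual layout algorithm.

PARTITIONS = 256


def proportional_shares(capacities: dict[str, int], replication_factor: int = 1) -> dict[str, int]:
    slots = PARTITIONS * replication_factor
    total = sum(capacities.values())
    # Integer part of each node's exact share (integer arithmetic, no floats).
    shares = {node: slots * cap // total for node, cap in capacities.items()}
    leftover = slots - sum(shares.values())
    # Hand the remaining slots to the nodes with the largest remainders.
    by_remainder = sorted(capacities, key=lambda n: slots * capacities[n] % total, reverse=True)
    for node in by_remainder[:leftover]:
        shares[node] += 1
    return shares


nodes = {
    "d70bff9518aa1107": 150,
    "b2493052b3061fdd": 150,
    "a13dc7533ad626bb": 150,
    "07531a39050119af": 150,
}
print(proportional_shares(nodes))
```

With equal capacities, each node gets 64 of the 256 slots, an even spread in contrast with the two-node split reported by v0.8.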

Reference: Deuxfleurs/garage#698