DB error: LMDB: MDB_READERS_FULL: Environment maxreaders limit reached #660

Closed
opened 2023-10-22 01:03:27 +00:00 by kot-o-pes · 14 comments

Hello, I am running Garage v0.8.4 on two hardware ARM nodes with 64 CPUs and 512 GB of RAM. The disk layout is 6 SSD drives of 8 TB each mounted as RAID6, for an overall storage size of 30 TB. The cluster is used only as block storage for https://github.com/thanos-io/thanos, to store metrics and retrieve them on demand (Prometheus-like API). The current size of the bucket is:

Size: 5.4 TiB (5.9 TB)
Objects: 24195
Unfinished multipart uploads: 1707

and I'm running into a vast array of issues:

  1. Strange behavior of the block_ref and sync queues: they constantly grow and never go down (screenshots attached).

  2. Errors in the log:

WARN garage_api::generic_server: Response: error 500 Internal Server Error, Internal error (Hyper error): error reading a body from connection: Internal error (Hyper error): error reading a body from connection: end of file before message length reached

ERROR garage_util::background::worker: Error in worker version Merkle (TID 26): DB error: LMDB: MDB_READERS_FULL: Environment maxreaders limit reached

INFO garage_api::generic_server: Response: error 400 Bad Request, Bad request: Part number 2 has already been uploaded

ERROR garage_util::background::worker: Error in worker block_ref Merkle (TID 30): DB error: LMDB: MDB_READERS_FULL: Environment maxreaders limit reached

cat /etc/garage.toml

# Managed by Ansible

metadata_dir = "/mnt/md0/meta"
data_dir = "/mnt/md0/data"
db_engine = "lmdb"

replication_mode = "2"
block_size = 104857600
compression_level = 1
lmdb_map_size = "1T"

rpc_bind_addr = "0.0.0.0:3901"
rpc_public_addr = "0.0.0.0:3901"
rpc_secret = 

bootstrap_peers = []

[s3_api]
s3_region = "garage"
api_bind_addr = "0.0.0.0:3900"
root_domain = ".s3.garage.localhost"

[s3_web]
bind_addr = "0.0.0.0:3902"
root_domain = ".web.garage.localhost"
index = "index.html"

[admin]
api_bind_addr = "0.0.0.0:3903"
admin_token = 

I've tried to purge errored blocks (as it seems the actual file that the block references has been deleted) via garage block purge, but it didn't help.
The workers also seem to be quite odd:

TID  State  Name                          Tranq  Done    Queue     Errors  Consec  Last
1    Busy   Block resync worker #1        0      -       13682609  -       -       
20   Busy*  object GC                     -      -       7309      31      31      8 seconds ago
25   Busy   bucket_object_counter queue   -      -       1         8       0       4 hours ago
27   Busy   version sync                  -      -       202       54      54      4 minutes ago
28   Busy*  version GC                    -      -       17174     31      31      8 seconds ago
29   Busy   version queue                 -      -       7         8       0       4 hours ago
31   Busy*  block_ref sync                -      -       201       55      55      57 seconds ago
32   Busy*  block_ref GC                  -      -       1602452   31      31      9 seconds ago
33   Busy   block_ref queue               -      -       14098     8       0       4 hours ago
42   Busy   Block refs repair worker      -      206251  -         -       -       
2    Idle   Block resync worker #2        -      -       -         -       -       
3    Idle   Block resync worker #3        -      -       -         -       -       
4    Idle   Block resync worker #4        -      -       -         -       -       
5    Idle   Block scrub worker            4      -       -         -       -       
6    Idle   bucket_v2 Merkle              -      -       0         57      0       4 minutes ago
7    Idle   bucket_v2 sync                -      -       0         -       -       
8    Idle   bucket_v2 GC                  -      -       0         -       -       
9    Idle   bucket_v2 queue               -      -       0         -       -       
10   Idle   bucket_alias Merkle           -      -       0         48      0       4 minutes ago
11   Idle   bucket_alias sync             -      -       0         -       -       
12   Idle   bucket_alias GC               -      -       0         -       -       
13   Idle   bucket_alias queue            -      -       0         -       -       
14   Idle   key Merkle                    -      -       0         50      0       29 seconds ago
15   Idle   key sync                      -      -       0         -       -       
16   Idle   key GC                        -      -       0         -       -       
17   Idle   key queue                     -      -       0         -       -       
18   Idle   object Merkle                 -      -       0         3029    0       21 seconds ago
19   Idle   object sync                   -      -       0         29      0       2 minutes ago
21   Idle   object queue                  -      -       0         -       -       
22   Idle   bucket_object_counter Merkle  -      -       0         1419    0       13 seconds ago
23   Idle   bucket_object_counter sync    -      -       0         29      0       2 minutes ago
24   Idle   bucket_object_counter GC      -      -       0         -       -       
26   Idle   version Merkle                -      -       0         4554    0       4 seconds ago
30   Idle   block_ref Merkle              -      -       0         4372    0       10 seconds ago
34   Idle   k2v_item Merkle               -      -       0         60      0       17 minutes ago
35   Idle   k2v_item sync                 -      -       0         -       -       
36   Idle   k2v_item GC                   -      -       0         -       -       
37   Idle   k2v_item queue                -      -       0         -       -       
38   Idle   k2v_index_counter_v2 Merkle   -      -       0         50      0       11 minutes ago
39   Idle   k2v_index_counter_v2 sync     -      -       0         -       -       
40   Idle   k2v_index_counter_v2 GC       -      -       0         -       -       
41   Idle   k2v_index_counter_v2 queue    -      -       0         -       -       
43   Done   Version repair worker         -      38331   -         -       -       

I would appreciate any help regarding these issues.

Author

Updated to v0.9 but errors are still occurring. Also, the garage unit on one node failed with this error:

 garage.service: Consumed 4d 7h 19min 45.820s CPU time.
 garage.service: Failed with result 'signal'.
 garage.service: Main process exited, code=killed, status=6/ABRT
 28: std::sys::unix::thread::Thread::new::thread_start
 27: core::ops::function::FnOnce::call_once{{vtable.shim}}
 26: std::sys_common::backtrace::__rust_begin_short_backtrace
 25: tokio::runtime::blocking::pool::Inner::run
 24: tokio::runtime::task::harness::Harness<T,S>::poll
 23: tokio::runtime::task::core::Core<T,S>::poll
 22: tokio::runtime::scheduler::multi_thread::worker::run
 21: tokio::runtime::context::runtime::enter_runtime
 20: tokio::runtime::context::scoped::Scoped<T>::set
 19: tokio::runtime::scheduler::multi_thread::worker::Context::run
 18: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
 17: tokio::runtime::task::harness::Harness<T,S>::poll
 16: tokio::runtime::task::core::Core<T,S>::poll
 15: <hyper::server::server::new_svc::NewSvcTask<I,N,S,E,W> as core::future::future::Future>::poll
  14: <hyper::server::conn::upgrades::UpgradeableConnection<I,S,E> as core::future::future::Future>::poll
  13: hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_catch
  12: hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_inner
  11: garage_api::generic_server::ApiServer<A>::handler::{{closure}}
  10: <opentelemetry::trace::context::WithContext<T> as core::future::future::Future>::poll
  9: <T as garage_util::metrics::RecordDuration>::record_duration::{{closure}}
  8: <garage_api::s3::api_server::S3ApiServer as garage_api::generic_server::ApiHandler>::handle::{{closure}}
   7: garage_api::s3::get::body_from_blocks_range
   6: core::panicking::panic_bounds_check
   5: core::panicking::panic_fmt
   4: rust_begin_unwind
   3: std::sys_common::backtrace::__rust_end_short_backtrace
   2: std::panicking::begin_panic_handler::{{closure}}
   1: std::panicking::rust_panic_with_hook
   0: garage::main::{{closure}}::{{closure}}
Owner

Congrats, I think you've probably put more load on a Garage cluster than I have ever seen. You are definitely in uncharted territory here, and it is to be expected that things could break.

You seem to be having at least the following issues:

  • DB error: LMDB: MDB_READERS_FULL: Environment maxreaders limit reached: this means you are doing too many concurrent requests compared to the current configuration of your Garage node. Each request may need to acquire one (or several) "reader slots" in the LMDB DB, among a fixed pool of available slots. The size of the pool is defined here: https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/main/src/model/garage.rs#L174. Your options are: send fewer requests; raise the value and recompile Garage; or make a PR that allows this value to be changed in the garage.toml config file (please do it if you have the time!). This is the very first thing you need to fix: Garage works under the expectation that the DB engine can be relied upon at all times, which is not the case here. (A sketch of where this limit is applied follows this list.)

  • Your resync queue contains way too many items. Did you call garage repair blocks several times without waiting for the queue to be drained? I see you are using a block size of 100 MB and you have about 5 TB of data, so you should have on the order of 50-100k blocks, not 10M.

  • INFO garage_api::generic_server: Response: error 400 Bad Request, Bad request: Part number 2 has already been uploaded: I think this one will stop appearing with Garage v0.9.
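For reference, the limit in question is LMDB's maximum number of reader slots, which the Heed crate exposes on its environment builder. Below is a minimal, hypothetical sketch (not Garage's actual code; the real constant lives in the file linked above) of what raising the limit looks like when opening a metadata environment, assuming a Heed version where open is a safe call and illustrative numbers throughout:

```
use std::path::Path;
use heed::{Env, EnvOpenOptions, Result};

/// Hypothetical helper: open an LMDB environment with a larger reader pool.
/// Each concurrent read transaction pins one reader slot, so the pool must be
/// at least as large as the number of simultaneous readers you expect.
fn open_metadata_env(path: &Path) -> Result<Env> {
    let mut options = EnvOpenOptions::new();
    options
        .map_size(1024 * 1024 * 1024 * 1024) // 1 TiB, as with lmdb_map_size = "1T"
        .max_dbs(100)                        // room for several named trees
        .max_readers(2048);                  // the knob behind MDB_READERS_FULL
    options.open(path)
}
```

Whether the right long-term fix is a larger compiled-in value or a new garage.toml option is exactly what the PR suggested above would decide.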

For further investigation, please also show the output of:

  • garage block list-errors

  • garage stats -a

The following is not an issue:

  • The GC queue will always be > 0 as items in it are only processed 24h after they have been added.
Owner

Concerning your second message with a stack trace, this looks like it could be a logic bug in Garage code. Could you open a separate issue to discuss it?

Author

Hello, thank you for the swift response. After updating to v0.9 I couldn't get rid of the 500 errors and partial uploads, and I also had no incoming traffic for some reason, so I deleted the meta and data folders (basically purged the whole cluster and re-enabled it from scratch). So far I have no API errors and no LMDB errors (I've also decreased concurrency in Thanos). The only thing I'm concerned about is the resync queue. I've also set the block size back to 10 MB.
(screenshot attached)

Author

Output of garage stats -a; errors are empty due to the reinit:

======================
Stats for node 584285c9b248915a:

Garage version: v0.9.0 [features: k2v, sled, lmdb, sqlite, consul-discovery, kubernetes-discovery, metrics, telemetry-otlp, bundled-libs]
Rust compiler version: 1.68.0

Database engine: LMDB (using Heed crate)

Table stats:
  Table      Items    MklItems  MklTodo  GcTodo
  bucket_v2  1        1         0        0
  key        2        3         0        0
  object     9356     10256     0        0
  version    37434    45139     44       20571
  block_ref  3069878  3385676   103      1018466

Block manager stats:
  number of RC entries (~= number of blocks): 1544335
  resync queue length: 881111
  blocks with resync errors: 0

If values are missing above (marked as NC), consider adding the --detailed flag (this will be slow).


======================
Stats for node e1ba89677bf195a6:

Garage version: v0.9.0 [features: k2v, sled, lmdb, sqlite, consul-discovery, kubernetes-discovery, metrics, telemetry-otlp, bundled-libs]
Rust compiler version: 1.68.0

Database engine: LMDB (using Heed crate)

Table stats:
  Table      Items    MklItems  MklTodo  GcTodo
  bucket_v2  1        1         0        0
  key        2        3         0        0
  object     9356     10256     0        0
  version    37434    45139     4        9997
  block_ref  3069944  3384385   1706     506931

Block manager stats:
  number of RC entries (~= number of blocks): 1544400
  resync queue length: 85817
  blocks with resync errors: 0

If values are missing above (marked as NC), consider adding the --detailed flag (this will be slow).


======================
Cluster statistics:

Storage nodes:
  ID                Hostname                          Zone  Capacity  Part.  DataAvail                MetaAvail
  e1ba89677bf195a6  node-1 EL    28.0 TB   256    30.0 TB/30.7 TB (97.5%)  30.0 TB/30.7 TB (97.5%)
  584285c9b248915a  node-2  EL    28.0 TB   256    30.0 TB/30.7 TB (97.5%)  30.0 TB/30.7 TB (97.5%)

Estimated available storage space cluster-wide (might be lower in practice):
  data: 30.0 TB
  metadata: 30.0 TB

Generally speaking, are there any guidelines on how to scale Garage for storing large quantities of files, optimized for read/write performance?

Owner

For the resync queue, definitely do the following:

garage worker set -a resync-tranquility 0
garage worker set -a resync-worker-count 8
Author

> For the resync queue, definitely do the following:
>
> garage worker set -a resync-tranquility 0
> garage worker set -a resync-worker-count 8

Thank you, the queue did start to decrease rapidly.
(screenshot attached)

> Concerning your second message with a stack trace, this looks like it could be a logic bug in Garage code. Could you open a separate issue to discuss it?

I cannot reproduce this error now; I think it was related to setting the block size too high. It was set to 512 MB at that point, as I was trying to figure out what the problem could be.

Author

I've also noticed some behavior regarding memory consumption that seems strange at this point:

root@monitoring-s3-garage-02:~# free -h
               total        used        free      shared  buff/cache   available
Mem:           376Gi       6.9Gi       2.2Gi       1.0Mi       367Gi       366Gi
Swap:          975Mi        14Mi       961Mi

It looks like Garage tries to cache all the memory available on the node itself, but maybe that's expected.

Owner

The caching behavior is normal, it is not Garage but your kernel that is keeping everything in cache. Things that are in cache can be evicted when the memory is necessary for actually running programs, but as long as it is not required the cache just stays there because it has a probability of helping accelerate future disk access. This is 100% normal for Linux.

Author

After a few days of being stable, one of the Thanos components decided to restart and resync the blocks stored on S3 (10.1 TB), which increased read operations and basically made the cluster read-only, as all write operations failed with a 503 error. Unfortunately I couldn't find anything suspicious in the logs; it looks like operations are working as expected, but there are no write requests (screenshots attached).
It also looks like the cluster is out of sync again (screenshot attached).

I found an error; it looks like one node is down, but it is available via ICMP and the garage process is up:

garage[212671]: 2023-10-25T10:23:49.202238Z  WARN garage_api::generic_server: Response: error 503 Service Unavailable, Internal error: Could not reach quorum of 2. 1 of 2 request succeeded, others returned errors: ["Timeout"]
Author

After a few attempts to restore it, I decided to drop the data again and reinit with multiple disks:

metadata_dir = "/meta1"

data_dir = [
    { path = "/data1", capacity = "7T" },
    { path = "/data2", capacity = "7T" },
    { path = "/data3", capacity = "7T" },
    { path = "/data4", capacity = "7T" },
    { path = "/data5", capacity = "7T" }            
]

After the reinit it seems like the cluster is not working at all: 503 errors on upload, and errors like this in the log:

    2023-10-25T13:39:03.691466Z ERROR garage_block::resync: Error when resyncing 576d8fa2349dd1b4: NeedBlockQuery RPC

(screenshot attached)

Author

So I've re-initialized the previous configuration with RAID6, but tuned the parameters a bit. For the RAID I've set:

      chunk_size: '128K'

and I also had some errors in dmesg:

[Wed Oct 25 18:24:55 2023] TCP: request_sock_TCP: Possible SYN flooding on port 3900. Sending cookies.  Check SNMP counters.

I've tried to tackle it by setting:

sysctl net.ipv4.tcp_max_syn_backlog=16384
sysctl net.core.somaxconn=16384

but no luck

garage[230415]: 2023-10-25T15:40:27.815357Z  WARN garage_api::generic_server: Response: error 503 Service Unavailable, Internal error: Could not reach quorum of 2. 1 of 2 request succeeded, others returned errors: ["Timeout"]

It's probably not possible to make it work for such a high load, but Garage still looks quite good and easy to operate if you don't have a large amount of data and don't query it constantly.

> DB error: LMDB: MDB_READERS_FULL: Environment maxreaders limit reached: this means you are doing too many concurrent requests compared to the current configuration of your Garage node. Each request may need to acquire one (or several) "reader slots" in the LMDB DB, among a fixed pool of available slots. The size of the pool is defined here: https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/main/src/model/garage.rs#L174. Your options are: send fewer requests; raise the value and recompile Garage; or make a PR that allows this value to be changed in the garage.toml config file (please do it if you have the time!). This is the very first thing you need to fix: Garage works under the expectation that the DB engine can be relied upon at all times, which is not the case here.

Hey, as a big user of LMDB/Heed myself, I can say that we already encountered this issue at Meilisearch (https://github.com/meilisearch/meilisearch/issues/2648) and ended up increasing the value to 1024 on our side.
It doesn't seem to have a big impact on performance, so I guess the default value could be way higher than 500. See: https://web.archive.org/web/20200112155415/https://twitter.com/armon/status/534867803426533376

Owner

@tamo thanks for the feedback, we'll consider increasing the reader count to at least 1024

lx closed this issue 2024-03-08 12:56:41 +00:00