Garage gateway-only node with a replication of none fails to acquire quorum. #317

Closed
opened 2022-05-26 15:20:56 +00:00 by bugzbunny · 3 comments

I'm probably hitting an edge case here. My suspicion is that with a replication mode of none, the gateway-only node has a hard time getting quorum. If that's a known limitation, can it be improved?
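For context on why a single replica leaves no slack, here is a minimal sketch of generic majority-quorum arithmetic. It is an illustration only: the function names are hypothetical, and nothing in this thread states which quorum rule Garage actually applies when `replication_mode = "none"`.

```python
def majority_quorum(replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    if replicas < 1:
        raise ValueError("need at least one replica")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Replicas that may be unreachable while a majority quorum is still met."""
    return replicas - majority_quorum(replicas)


# With one replica, the quorum is 1 and no failure can be tolerated.
for n in (1, 2, 3):
    print(n, majority_quorum(n), tolerated_failures(n))
```

Under this generic model, a request for data held as a single copy succeeds only if the one node holding it is reachable from the node serving the request.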

Author

As requested by @LX:

gw1

metadata_dir = "/var/lib/garage/meta"
data_dir = "/var/lib/garage/data"
block_size = 10485760

replication_mode = "none"

rpc_bind_addr = "[::]:3901"
rpc_public_addr = "127.0.0.1:3901"
rpc_secret = "a4bf83b063b7319687546abf99ac41e5429650a96243483f7e2d503858a89975"

bootstrap_peers = []

[s3_api]
s3_region = "us-east-1"
api_bind_addr = "127.0.0.1:3900"
root_domain = ".s3.bugzbunny.net"

[s3_web]
bind_addr = "127.0.0.1:3902"
root_domain = ".s3web.bugzbunny.net"
index = "index.html"

node1

metadata_dir = "/var/lib/garage/meta"
data_dir = "/var/lib/garage/data"
block_size = 10485760

replication_mode = "none"

rpc_bind_addr = "[::]:3901"
rpc_public_addr = "<redacted>:3901"
rpc_secret = "a4bf83b063b7319687546abf99ac41e5429650a96243483f7e2d503858a89975"

bootstrap_peers = []

[s3_api]
s3_region = "us-east-1"
api_bind_addr = "[::]:3900"
root_domain = ".s3.bugzbunny.net"

[s3_web]
bind_addr = "[::]:3902"
root_domain = ".s3web.bugzbunny.net"
index = "index.html"

node2

metadata_dir = "/var/lib/garage/meta"
data_dir = "/var/lib/garage/data"
block_size = 10485760

replication_mode = "none"

rpc_bind_addr = "[::]:3901"
rpc_public_addr = "<redacted>:3901"
rpc_secret = "a4bf83b063b7319687546abf99ac41e5429650a96243483f7e2d503858a89975"

bootstrap_peers = []

[s3_api]
s3_region = "us-east-1"
api_bind_addr = "127.0.0.1:3900"
root_domain = ".s3.bugzbunny.net"

[s3_web]
bind_addr = "127.0.0.1:3902"
root_domain = ".s3web.bugzbunny.net"
index = "index.html"

node3

metadata_dir = "/var/lib/garage/meta"
data_dir = "/var/lib/garage/data/zones/s3pa"
block_size = 10485760

replication_mode = "none"

rpc_bind_addr = "[::]:3901"
rpc_public_addr = "127.0.0.1:3901"
rpc_secret = "a4bf83b063b7319687546abf99ac41e5429650a96243483f7e2d503858a89975"

bootstrap_peers = []

[s3_api]
s3_region = "us-east-1"
api_bind_addr = "127.0.0.1:3900"
root_domain = ".s3.bugzbunny.net"

[s3_web]
bind_addr = "127.0.0.1:3902"
root_domain = ".s3web.bugzbunny.net"
index = "index.html"

garage status
 INFO  netapp::netapp > Connected to 127.0.0.1:3901, negotiating handshake...
 INFO  netapp::netapp > Connection established to 97a2e149a3bca62e
==== HEALTHY NODES ====
ID                Hostname  Address          Tags   Zone        Capacity
97a2e149a3bca62e  gw1       <redacted>:3901  [gw1]  <redacted>  gateway
d4c35e793ff3af87  node1     <redacted>:3901  []     <redacted>  9
f6d40586f0bf05a2  node2     <redacted>:3901  []     <redacted>  10
57ec778758d1af57  node3     <redacted>:3901  []     <redacted>  6

NOTE: When requests started failing, I incorrectly assumed 'gw1' wasn't responding. In fact, I simply had 'rpc_public_addr' set incorrectly.

Owner

If you still have this issue, please:

  • paste logs from your nodes when running with RUST_LOG=debug (or even RUST_LOG=trace)
  • tell me exactly what command you are running that provokes the issue
  • tell me exactly what this command returns instead of doing what you want

If you don't have the issue anymore, please inform me so that the issue can be closed.
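The three items requested above (logs, exact command, exact output) can be gathered in one go. This is a minimal sketch, not an official Garage tool: the helper names `run_with_rust_log` and `format_report` are hypothetical. `RUST_LOG` has to be set in the environment of the Garage process whose logs you want, and how the daemon is launched on these nodes is not described in the thread, so the helper only demonstrates the mechanism on an arbitrary command.

```python
import os
import subprocess
import sys


def run_with_rust_log(command: list[str], level: str = "debug") -> subprocess.CompletedProcess:
    """Run a command with RUST_LOG set, capturing stdout and stderr as text."""
    env = dict(os.environ, RUST_LOG=level)
    return subprocess.run(command, env=env, capture_output=True, text=True)


def format_report(command: list[str], result: subprocess.CompletedProcess) -> str:
    """Bundle the exact command, exit code, and output for pasting into the issue."""
    return (
        f"$ {' '.join(command)}\n"
        f"exit code: {result.returncode}\n"
        f"--- stdout ---\n{result.stdout}"
        f"--- stderr ---\n{result.stderr}"
    )


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; substitute the failing
    # command from your own setup (for example ["garage", "status"]).
    cmd = [sys.executable, "-c", "import os; print(os.environ['RUST_LOG'])"]
    print(format_report(cmd, run_with_rust_log(cmd)))
```

Keeping the command line, exit code, and both output streams together answers "what did you run" and "what did it return" in a single paste.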

Owner

Closing for now but feel free to reopen at any time with a more complete bug report.

lx closed this issue 2022-06-29 09:45:41 +00:00
Reference: Deuxfleurs/garage#317