Garage connects to Consul node address instead of service address when using agent API #675
Reference: Deuxfleurs/garage#675
Using Garage version 0.8.4. This used to work in 0.8.2.

Discovered this in a NixOS test, with the following nodes:

All Garage nodes correctly register themselves as services in Consul (that is, with the right addresses). However, when trying to connect to each other, they instead try to connect to the Consul node.

It seems that 0.8.4 reworked how registration works with the agent API. In 0.8.2, Garage used to create its own "virtual" `garage:deadbeef` nodes in Consul and register a service on them. In 0.8.4, this is still how it works with the catalog API, but with the agent API no virtual nodes are created and Garage just registers a service on the agent node itself. Presumably, then, Garage has always used the node IP address instead of the service IP address, but this only became visible now that Garage is in a situation where it doesn't control what the node address is.

---

Hi Max, using the `agent` API does not create "virtual" nodes, as you point out, by design. That choice was based on my assumption (informed by Consul's architecture reference) that in a typical deployment you run client agents on every compute node in your datacenter. That doesn't seem to be the case for your setup, so the provided strace screenshot showing the process connecting to `192.168.1.1:3901` makes sense, considering how I built the agent API integration.

To accommodate your architecture, a change could be made here to read from the service's `TaggedAddresses` field instead of the node's `Address` (and the corresponding query declarations). The change must keep reading from the `Address` field so as not to break the workflow for users of the `catalog` API.

As additional context, `Address` was used since the docs for that field specify that it corresponds to the service IP and falls back to the node's. I'm not sure how it would not be specified, though; perhaps a bug here?

**edit:** Took another look at the image where you query Consul's API, and `Address` does seem to map to what's expected, so I'm wondering if the `garage1` host gets a different response than what `consul` shows in the screenshot?

---

It's not nice that I'm using a single Consul agent for 3 nodes, that's correct. However, that's not the only scenario where service addresses might differ from the Consul node addresses. For example, in my actual deployment, I have a Consul agent on every node, and the Consul agents talk to each other through a WireGuard mesh, so the node address listed for each agent is that node's IP address on the WireGuard mesh interface. Depending on the individual service's exposure, I can then configure its service address to be that same mesh IP, the node's public IP, or the IP of a different tunnel interface. For example, the service representing Garage's S3 API is configured to use public IP addresses.
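To illustrate how a service address can diverge from the node address: Consul's agent API (`PUT /v1/agent/service/register`) accepts a service-level `Address` field, which ends up in the catalog's `ServiceAddress` for that service, independently of the agent node's own address. A minimal sketch of such a payload (service name and IPs are made up for the example):

```python
import json

# Hypothetical payload for Consul's PUT /v1/agent/service/register endpoint.
# The service-level "Address" (here a mesh IP, invented for the example) is
# what the catalog reports as ServiceAddress, regardless of the address of
# the agent node the service is registered on.
payload = {
    "Name": "garage-rpc",
    "Address": "192.168.2.11",  # service address, distinct from the node address
    "Port": 3901,
}
print(json.dumps(payload, indent=2))
```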
The problem seems to occur while querying, not while registering. I'm assuming that `ServiceAddress` should be used in this line instead of `Address`, as per https://developer.hashicorp.com/consul/api-docs/catalog#serviceaddress
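Per the linked docs, `ServiceAddress` is the service-specific IP and is an empty string when no service address was given at registration, in which case the node `Address` applies. The suggested fallback could look roughly like this (a sketch, not Garage's actual Rust code; the entry dicts are made up to mimic one item of a `/v1/catalog/service/<name>` response):

```python
def pick_rpc_address(entry: dict) -> str:
    # Prefer the service-level address; Consul returns ServiceAddress as an
    # empty string when it was not set at registration time, in which case
    # the node-level Address is the one to use.
    return entry.get("ServiceAddress") or entry["Address"]

# Made-up catalog entries: one with a distinct service address, one without.
print(pick_rpc_address({"Address": "192.168.1.1", "ServiceAddress": "192.168.2.11"}))  # → 192.168.2.11
print(pick_rpc_address({"Address": "192.168.1.1", "ServiceAddress": ""}))  # → 192.168.1.1
```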