Question - impact of higher-than-supported latency #589

Closed

opened 2023-06-24 05:03:14 +00:00 by wings · 2 comments

wings commented

2023-06-24 05:03:14 +00:00

Hi there!

We want (need, basically) to run a Garage cluster between three points of presence, one in Texas, USA and two in Western Australia. Your front page says the maximum supported latency is 200ms, between those two points the latency is measured at 233ms... so close, but yet so far :)

How bad is it when Garage has a latency higher than what is supported? We don't plan to do any ingest at "Site B" and "Site C" in WA, they will be purely mirroring data one-way from Texas.

Is it that it degrades performance, or do things actually break in half?

Thanks!

lx commented

2023-06-24 06:31:34 +00:00

Owner

Hello :)

In theory, it should only degrade performance, and not break anything, as long as you are doing all your requests to the Texas server and not using the WA servers as an API server for your workload.

Performance-wise, what actually happens will depend a lot on your configuration:

- If you are using the normal `replication_mode="3"`, then most operations initiated in Texas will take around one second to complete. This is obviously not acceptable for interactive workloads, but it might be fine if you are just doing backups: simply increase the number of concurrent requests.
- If you'd like data to be read and written primarily to/from the local servers in Texas, you can use `replication_mode="3-dangerous"`. This will reduce data durability slightly, as a crash of your Texas server at the wrong time might cause some data to be lost. However, most things will be extremely fast because all requests can be completed without contacting a server in Australia.
- If instead of having two sites in WA and one in Texas, you have two in Texas and one in WA, you can use `replication_mode="3"` and have acceptable latency, as Garage only waits for one remote copy to be done (total quorum = 2) before returning, and in this case the copy to the other Texas DC is sufficient.
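
To make the quorum reasoning above concrete, here is a rough back-of-the-envelope sketch in Python. It is not Garage code: `quorum_latency_ms`, the 1 ms local figure and the 5 ms figure between two Texas DCs are assumptions made up for illustration; only the 233 ms round trip and the quorum sizes come from this thread. It models a single replicated step as "contact all replicas in parallel, return once `quorum` of them have answered".

```python
def quorum_latency_ms(rtts_ms, quorum):
    """Time until `quorum` replicas have answered, assuming all replicas
    are contacted in parallel. Simplified model, not Garage's actual code."""
    if not 1 <= quorum <= len(rtts_ms):
        raise ValueError("quorum must be between 1 and the number of replicas")
    # The step completes when the quorum-th fastest replica has answered.
    return sorted(rtts_ms)[quorum - 1]


LOCAL_MS = 1        # assumed: replica on the Texas node handling the request
TEXAS_DC_MS = 5     # assumed: round trip to a second datacenter in Texas
REMOTE_MS = 233     # measured Texas <-> WA round trip from the question

one_tx_two_wa = [LOCAL_MS, REMOTE_MS, REMOTE_MS]
two_tx_one_wa = [LOCAL_MS, TEXAS_DC_MS, REMOTE_MS]

# Quorum of 2 with one Texas site: every step waits for Australia.
print(quorum_latency_ms(one_tx_two_wa, quorum=2))  # 233
# Quorum of 1 (the "3-dangerous" case): the local copy is enough.
print(quorum_latency_ms(one_tx_two_wa, quorum=1))  # 1
# Quorum of 2 with two Texas sites: the other Texas DC completes the quorum.
print(quorum_latency_ms(two_tx_one_wa, quorum=2))  # 5
```

A single S3 operation presumably involves several such steps in sequence, which would be consistent with the "around one second" estimate above for the one-Texas, two-WA layout.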

Please try it out and tell us how things work out!

Hope this helps :)

lx commented

2023-06-24 06:34:32 +00:00

Owner

Some notes on the `3-dangerous` replication mode:

- Data will only be committed to one node before Garage returns from write requests (other writes are done asynchronously), meaning that if that node crashes at the wrong time, data is lost.
- Data will only be read from one node for read requests, so if it happens to read from the wrong node, your data will not be there yet and you will get a 404. In practice, Garage will always prioritize reading locally (the servers in Texas will read from the replica in Texas), and if it doesn't, then that's a bug in Garage that you should report. So you should always see your data when doing requests only to/from the Texas servers.
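
The two caveats above can be illustrated with a toy in-memory model. This is a sketch only and does not reflect how Garage is implemented: `ToyCluster`, its methods and the node names are hypothetical, and "crash" here means a failure that loses the node's copy before it has been replicated.

```python
class ToyCluster:
    """Toy model of write quorum 1 / read quorum 1 (not Garage's real code)."""

    def __init__(self, nodes):
        self.data = {name: {} for name in nodes}
        self.pending = []  # (source, target, key, value) copies not yet sent

    def put(self, key, value, via):
        # The write is acknowledged as soon as one node has the data.
        self.data[via][key] = value
        for target in self.data:
            if target != via:
                self.pending.append((via, target, key, value))

    def get(self, key, via):
        # Reads consult a single node; None stands in for a 404.
        return self.data[via].get(key)

    def replicate(self):
        # Asynchronous replication eventually catches up.
        for _source, target, key, value in self.pending:
            self.data[target][key] = value
        self.pending.clear()

    def crash(self, node):
        # The node's copy is lost, along with anything it had not sent yet.
        self.data[node].clear()
        self.pending = [p for p in self.pending if p[0] != node]


cluster = ToyCluster(["texas", "wa-b", "wa-c"])
cluster.put("backup.tar", "v1", via="texas")
print(cluster.get("backup.tar", via="texas"))  # v1
print(cluster.get("backup.tar", via="wa-b"))   # None (the "404" case)
cluster.replicate()
print(cluster.get("backup.tar", via="wa-b"))   # v1

cluster.put("other.tar", "v1", via="texas")
cluster.crash("texas")                         # crash before replication
cluster.replicate()
print(cluster.get("other.tar", via="wa-b"))    # None (the write is lost)
```

Reading and writing only through the Texas node, as suggested above, corresponds to always passing `via="texas"`, in which case the stale read never shows up.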
