Manual fixes

Maximilien 2022-07-07 19:37:27 +02:00
parent 2cfc65befc
commit 70a9a6c79d


@ -50,18 +50,18 @@ are in charge of storing the first half of the archive while Charlie and Eve are
[Resilio](https://www.resilio.com/individuals/) and [Syncthing](https://syncthing.net/) both feature protocols inspired by BitTorrent to synchronize a tree of your file system between multiple computers.
Reviewing these solutions is out of the scope of this article; feel free to try them yourself!*
Garage, on the contrary, is designed to automatically spread your content over all your available nodes, in a manner that makes the best possible use of your storage space.
Garage, on the other hand, is designed to automatically spread your content over all your available nodes, in a manner that makes the best possible use of your storage space.
At the same time, it ensures that your content is always replicated exactly 3 times across the cluster (or fewer if you change a configuration parameter),
on different geographical zones when possible.
<!--To access this content, you must have an API key, and have a correctly configured machine available over the network (including DNS/IP address/etc.). If the amount of traffic you receive is way larger than what your cluster can handle, your cluster will become simply unresponsive. Sharing content across people that do not trust each other, ie. who operate independent clusters, is not a feature of Garage: you have to rely on external software.-->
However, this means that when content is requested from a Garage cluster, there are only 3 nodes that are capable of returning it to the user.
As a consequence, when content becomes popular, these nodes might become a bottleneck.
Moreover, all resources created (keys, files, buckets) are tightly coupled to the Garage cluster on which they exist;
However, this means that when content is requested from a Garage cluster, there are only 3 nodes capable of returning it to the user.
As a consequence, when content becomes popular, this subset of nodes might become a bottleneck.
Moreover, all resources (keys, files, buckets) are tightly coupled to the Garage cluster on which they exist;
servers from different clusters can't collaborate to serve the same data together (without additional software).
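To make the bottleneck argument concrete, here is a back-of-the-envelope sketch. It assumes reads for one popular object are spread evenly over its replicas, which is a simplification of mine and not a description of how Garage actually routes requests; the function name and the figures are hypothetical.

```python
def per_replica_rate(requests_per_s: float, cluster_size: int,
                     replication_factor: int = 3) -> float:
    """Requests per second each replica must absorb for ONE popular object.

    Assumption: reads are evenly balanced across the nodes holding a copy.
    """
    # Only the nodes that hold a copy can serve the object,
    # no matter how many other nodes the cluster has.
    serving_nodes = min(cluster_size, replication_factor)
    return requests_per_s / serving_nodes


# Growing the cluster does not relieve the replicas of a hot object.
for cluster_size in (3, 10, 100):
    print(cluster_size, per_replica_rate(3000, cluster_size))
```

Under these assumptions the per-replica load stays the same whether the cluster has 3 nodes or 100, which is why popular content is a problem for this design.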
➡️ **Garage is designed to durably store content.**
In this blog post, we will explore whether we can combine both properties by connecting an IPFS node to a Garage cluster.
In this blog post, we will explore whether we can combine delivery and durability by connecting an IPFS node to a Garage cluster.
## Try #1: Vanilla IPFS over Garage
@ -73,7 +73,7 @@ The Peergos project has a fork because it seems that the plugin is known for hit
([#105](https://github.com/ipfs/go-ds-s3/issues/105), [#205](https://github.com/ipfs/go-ds-s3/pull/205)).
This is the one we will try in the following.
The easiest solution to use this plugin in IPFS is to bundle it in the main IPFS daemon, and thus recompile IPFS from sources.
The easiest solution to use this plugin in IPFS is to bundle it in the main IPFS daemon, and recompile IPFS from sources.
Following the instructions on the README file allowed me to spawn an IPFS daemon configured with S3 as the block store.
I had a small issue when adding the plugin to the `plugin/loader/preload_list` file: the given command lacks a newline.
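One way to sidestep this kind of newline problem is to append the entry with a small helper that guarantees it sits on its own line. This is only a sketch: the helper name is hypothetical, and the entry string shown in the usage comment is an assumption about the plugin's expected format, so check it against the plugin's README.

```python
from pathlib import Path


def append_preload_entry(preload_list: Path, entry: str) -> None:
    """Append `entry` on its own line, whatever the file's current ending."""
    existing = preload_list.read_text() if preload_list.exists() else ""
    # Start a fresh line only if the file does not already end with one.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with preload_list.open("a") as f:
        # Always terminate the entry so the next append is safe too.
        f.write(f"{prefix}{entry.strip()}\n")


# Usage (entry format is an assumption, see the plugin README):
# append_preload_entry(Path("plugin/loader/preload_list"),
#                      "s3ds github.com/ipfs/go-ds-s3/plugin 0")
```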
@ -95,20 +95,19 @@ For example, you can inspect it [from the official gateway](https://explore.ipld
![A screenshot of the IPFS explorer](./explorer.png)
At the same time, I was monitoring Garage (through [the OpenTelemetry stack we implemented earlier this year](/blog/2022-v0-7-released/)).
Just after launching the daemon and before doing anything, we had this surprisingly active Grafana plot:
Just after launching the daemon - and before doing anything - I was met by this surprisingly active Grafana plot:
![Grafana API request rate when IPFS is idle](./idle.png)
<center><i>Legend: y axis = requests per 10 seconds, x axis = time</i></center><p></p>
It means that on average, we have around 250 requests per second. Most of these requests are checks that an IPFS block does not exist locally.
These requests are triggered by the DHT service of IPFS: since my node is reachable over the Internet, it acts as a public DHT server and has to answer global
block requests over the whole network. Each time it receives a request for a block, it sends a request to its storage back-end (in our case, to Garage) to see if it exists.
It shows that on average, we handle around 250 requests per second. Most of these requests are in fact the IPFS daemon checking if a block exists in Garage.
These requests are triggered by IPFS's DHT service: since my node is reachable over the Internet, it acts as a public DHT server and has to answer global
block requests over the whole network. Each time it receives a request for a block, it sends a request to its storage back-end (in our case, to Garage) to see if a copy exists locally.
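The mechanism can be illustrated with a toy model. This is not the code of IPFS or of the S3 plugin: the class and function names are hypothetical, and an in-memory dictionary stands in for the Garage bucket. It only shows why every incoming block request, including those for blocks we do not hold, costs one existence check on the back-end.

```python
from dataclasses import dataclass, field


@dataclass
class CountingObjectStore:
    """In-memory stand-in for an S3 bucket that counts existence checks."""
    objects: dict = field(default_factory=dict)
    head_requests: int = 0

    def head_object(self, key: str) -> bool:
        # Each call models one HEAD-style lookup sent to the back-end.
        self.head_requests += 1
        return key in self.objects


def answer_block_requests(store: CountingObjectStore, requested_cids: list) -> list:
    """Return the CIDs we hold, issuing one existence check per request."""
    return [cid for cid in requested_cids if store.head_object(cid)]


store = CountingObjectStore(objects={"cid-a": b"block"})
found = answer_block_requests(store, ["cid-a", "cid-x", "cid-y", "cid-z"])
print(found, store.head_requests)  # ['cid-a'] 4
```

In this model only one of the four requests is a hit, yet the store still receives four lookups, which matches the pattern of a workload dominated by negative existence checks.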
*We will try to tweak the IPFS configuration later - we know that we can deactivate the DHT server. For now, we will continue with the default parameters.*
When I start interacting with IPFS by sending a file or browsing the default proposed catalogs (i.e. the full XKCD archive),
I hit limits with our monitoring stack which, in its default configuration, is not able to ingest the traces of
so many requests being processed by Garage.
When I started interacting with the IPFS node by sending a file or browsing the default proposed catalogs (e.g. the full XKCD archive),
I quickly hit limits with our monitoring stack which, in its default configuration, is not able to ingest the large amount of tracing data produced by the high number of S3 requests originating from the IPFS daemon.
We have the following error in Garage's logs:
```
@ -149,7 +148,7 @@ A quick look at Grafana showed again a very active Garage:
![Screenshot of a grafana plot showing requests per second over time](./grafa.png)
<center><i>Legend: y axis = requests per 10 seconds on log(10) scale, x axis = time</i></center><p></p>
Again, the workload is dominated by `HeadObject` requests.
Again, the workload is dominated by S3 `HeadObject` requests.
After taking a look at `~/.peergos/.ipfs/config`, it seems that the IPFS configuration used by the Peergos project is quite standard,
which means that, as before, we are acting as a DHT server and have to answer thousands of block requests every second.