More improvements to ipfs article

Alex 2022-06-15 20:19:58 +02:00
parent 1458ef0eca
commit c60063c180


@@ -48,7 +48,7 @@ are in charge of storing the first half of the archive while Charlie and Eve are
[Resilio](https://www.resilio.com/individuals/) and [Syncthing](https://syncthing.net/) both feature protocols inspired by BitTorrent to synchronize a tree of your file system between multiple computers.
Reviewing these solutions is out of the scope of this article; feel free to try them yourself!*

Garage, on the contrary, is designed to automatically spread your content over all your available nodes, in a manner that makes the best possible use of your storage space.
At the same time, it ensures that your content is always replicated exactly 3 times across the cluster (or fewer if you change a configuration parameter),
in different geographical zones when possible.
<!--To access this content, you must have an API key, and have a correctly configured machine available over the network (including DNS/IP address/etc.). If the amount of traffic you receive is way larger than what your cluster can handle, your cluster will simply become unresponsive. Sharing content between people that do not trust each other, i.e. who operate independent clusters, is not a feature of Garage: you have to rely on external software.-->
@@ -79,9 +79,7 @@ I had to edit the file manually after running it, the issue was directly visible
After that, I just ran the daemon and accessed the web interface to upload a photo of my dog:

![A dog](./dog.jpg)

A content identifier (CID) was assigned to this picture:
@@ -90,27 +88,24 @@ QmNt7NSzyGkJ5K9QzyceDXd18PbLKrMAE93XuSC2487EFn
```
QmNt7NSzyGkJ5K9QzyceDXd18PbLKrMAE93XuSC2487EFn
```
The photo is now accessible on the whole network.
For example, you can inspect it [from the official gateway](https://explore.ipld.io/#/explore/QmNt7NSzyGkJ5K9QzyceDXd18PbLKrMAE93XuSC2487EFn):

![A screenshot of the IPFS explorer](./explorer.png)
At the same time, I was monitoring Garage (through [the OpenTelemetry stack we implemented earlier this year](/blog/2022-v0-7-released/)).
Just after launching the daemon and before doing anything, we had this surprisingly active Grafana plot:

![Grafana API request rate when IPFS is idle](./idle.png)
<center><i>Legend: y axis = requests per 10 seconds, x axis = time</i></center><p></p>
This means that, on average, we have around 250 requests per second. Most of these requests are existence checks for IPFS blocks that are not stored locally.
These requests are triggered by the DHT service of IPFS: since my node is reachable over the Internet, it acts as a public DHT server and has to answer global
block requests over the whole network. Each time it receives a request for a block, it sends a request to its storage back-end (in our case, to Garage) to see if it exists.

*We will try to tweak the IPFS configuration later: we know that we can deactivate the DHT server. For now, we will continue with the default parameters.*
When I started interacting with IPFS by sending a file or browsing the default proposed catalogs (such as the full XKCD archive),
I hit limits with our monitoring stack which, in its default configuration, is not able to ingest the traces of
so many requests being processed by Garage.
We have the following error in Garage's logs:
@@ -118,53 +113,50 @@ We have the following error in Garage's logs:
```
OpenTelemetry trace error occurred. cannot send span to the batch span processor because the channel is full
```
At this point, I didn't feel that it would be very interesting to fix this issue just to find out the exact number of requests made on the cluster.
In my opinion, such a simple task of sharing a picture should not require so many requests to the storage server anyway.
As a comparison, this whole webpage, with its pictures, triggers around 10 requests on Garage when loaded, not thousands.

I think we can conclude that this first try was a failure.
The S3 storage plugin for IPFS makes too many requests and would need significant work to be optimized.
However, we should not give up too quickly, because the people behind Peergos are known to run their IPFS-based software in production with an S3 back-end.
## Try #2: Peergos over Garage

[Peergos](https://peergos.org/) is designed as an end-to-end encrypted and federated alternative to Nextcloud.
Internally, it is built on IPFS and is known to have a [deep integration with the S3 API](https://peergos.org/posts/direct-s3).
One important point of this integration is that your browser is able to bypass both the Peergos daemon and the IPFS daemon
to write and read IPFS blocks directly from the S3 API server.
*I don't know exactly whether Peergos is still considered alpha quality, or whether a beta version has been released,
but keep in mind that it might be more experimental than you'd like!*
<!--To give ourselves some courage in this adventure, let's start with a nice screenshot of their web UI:
![Peergos Web UI](./peergos.jpg)-->
Starting Peergos on top of Garage required some small patches on both sides, but in the end, I was able to get it working.
I was able to upload my file, see it in the interface, create a link to share it, rename it, move it into a folder, and so on:

![A screenshot of the Peergos interface](./upload.png)
At the same time, the fans of my computer started to become a bit loud!
A quick look at Grafana shows that Garage is still very busy:

![Screenshot of a grafana plot showing requests per second over time](./grafa.png)
<center><i>Legend: y axis = requests per 10 seconds on log(10) scale, x axis = time</i></center><p></p>
Again, the workload is dominated by `HeadObject` requests.
After taking a look at `~/.peergos/.ipfs/config`, it seems that the IPFS configuration used by the Peergos project is quite standard,
which means that, as before, we are acting as a DHT server and have to answer thousands of block requests every second.

We also have some traffic on the `GetObject` and `OPTIONS` endpoints (with peaks up to ~45 req/sec).
This traffic is all generated by Peergos.
The `OPTIONS` HTTP verb appears here because we use the direct access feature of Peergos,
meaning that our browser is talking directly to Garage and has to use CORS to validate requests for security.

Internally, IPFS splits files into blocks of less than 256 kB. My picture is thus split into 2 blocks, requiring 2 requests to Garage to fetch it.
But even knowing that IPFS splits files into small blocks, I can't explain why we have so many `GetObject` requests.
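
To make the block arithmetic concrete, here is a minimal sketch of how many data blocks a file is cut into. It assumes a fixed-size chunker that cuts every 256 KiB, which I believe is the go-ipfs default. The picture size used below is a hypothetical value, since the real size of the dog picture is not given in this article, and `block_count` is a name made up for this illustration.

```python
import math

# Assumption: fixed-size chunking every 256 KiB (believed to be the go-ipfs default).
CHUNK_SIZE = 256 * 1024


def block_count(file_size: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Return the number of data blocks needed for a file of `file_size` bytes.

    Only data chunks are counted: any extra block that links the chunks
    together is ignored in this sketch.
    """
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    # Even an empty file is stored as one (empty) block.
    return max(1, math.ceil(file_size / chunk_size))


# Hypothetical size, chosen between 256 KiB and 512 KiB to match the 2 blocks seen above.
picture_size = 400_000
print(block_count(picture_size))  # prints 2
```

With this model, fetching one file costs at least one `GetObject` request per block, so the expected count for a single picture is a handful of requests, which is far from what the plot shows.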
## Try #3: Optimizing IPFS
@@ -174,36 +166,36 @@ Routing = dhtclient
-->
We have seen in our 2 previous tries that the main source of load was the federation and, more specifically, the DHT server.
In this section, we'd like to artificially remove this problem from the equation by preventing our IPFS node from federating
and see what pressure is put by Peergos alone on our local cluster.
To isolate IPFS, I set its routing type to `none`, cleared its bootstrap node list,
and configured the swarm socket to listen only on `localhost`.
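
The three changes above can be sketched as a small transformation of the IPFS JSON configuration (the `~/.peergos/.ipfs/config` file mentioned earlier). This is an illustration, not the exact procedure I followed: the key names `Routing.Type`, `Bootstrap` and `Addresses.Swarm` and the swarm port 4001 are what I believe go-ipfs uses, so check them against your version. The function name `isolate_ipfs_config` and the sample configuration are made up for this example.

```python
import copy
import json


def isolate_ipfs_config(config: dict, routing_type: str = "none",
                        swarm_port: int = 4001) -> dict:
    """Return a copy of an IPFS configuration that no longer federates.

    Assumed key names: Routing.Type, Bootstrap, Addresses.Swarm.
    """
    isolated = copy.deepcopy(config)
    # 1. Disable content routing ("dhtclient" would be a less drastic choice).
    isolated.setdefault("Routing", {})["Type"] = routing_type
    # 2. Forget all bootstrap nodes so that the daemon does not dial out.
    isolated["Bootstrap"] = []
    # 3. Listen for swarm connections on the loopback interface only.
    isolated.setdefault("Addresses", {})["Swarm"] = [
        f"/ip4/127.0.0.1/tcp/{swarm_port}"
    ]
    return isolated


# Hypothetical starting configuration, reduced to the keys of interest.
default_config = {
    "Routing": {"Type": "dht"},
    "Bootstrap": ["/ip4/203.0.113.1/tcp/4001"],  # placeholder address
    "Addresses": {"Swarm": ["/ip4/0.0.0.0/tcp/4001", "/ip6/::/tcp/4001"]},
}
print(json.dumps(isolate_ipfs_config(default_config), indent=2))
```

The function returns a modified copy instead of editing the file in place, so the original configuration can be kept around and restored when re-enabling federation.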
Finally, I restarted Peergos and was able to observe this more peaceful graph:

![Screenshot of a grafana plot showing requests per second over time](./grafa3.png)
<center><i>Legend: y axis = requests per 10 seconds on log(10) scale, x axis = time</i></center><p></p>
Now, for a given endpoint, we have peaks of around 10 req/sec, which is way more reasonable.
Furthermore, we are no longer hammering our back-end with requests for objects that are not there.

After discussing with the developers, I learned that it is possible to go even further by running Peergos without IPFS:
this is what they do for some of their tests. If at the same time we increased the size of data blocks,
we might have a non-federated but quite efficient end-to-end encrypted "cloud storage" that works well over Garage,
with our clients directly hitting the S3 API!
For setups where federation is a hard requirement,
the next step would be to gradually allow our node to connect to the IPFS network,
while ensuring that the traffic to the Garage cluster remains low.
For example, configuring our IPFS node as a `dhtclient` instead of a `dhtserver` would exempt it from answering public DHT requests.
Keeping an in-memory index (as a hash map and/or a Bloom filter) of the blocks stored on the current node
could also drastically reduce the number of requests.
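
As an illustration of the in-memory index idea, here is a minimal sketch of a Bloom filter placed in front of a block store. It is not how IPFS or its S3 plugin is implemented: the classes `BloomFilter` and `IndexedBlockstore` are hypothetical, a plain Python dictionary stands in for the Garage back-end, and the filter parameters are arbitrary.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, a small rate of false positives."""

    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, key: str):
        # Derive `num_hashes` bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class IndexedBlockstore:
    """Block store that only queries its back-end when the index says 'maybe'."""

    def __init__(self, backend: dict):
        self.backend = backend      # stand-in for the S3 bucket
        self.index = BloomFilter()
        self.backend_requests = 0   # each one would be a HeadObject call
        for cid in backend:
            self.index.add(cid)

    def put(self, cid: str, data: bytes) -> None:
        self.backend[cid] = data
        self.index.add(cid)

    def has(self, cid: str) -> bool:
        if not self.index.might_contain(cid):
            return False            # definitely absent: no request sent
        self.backend_requests += 1
        return cid in self.backend


store = IndexedBlockstore({"QmNt7NSzyGkJ5K9QzyceDXd18PbLKrMAE93XuSC2487EFn": b"dog"})
found = sum(store.has(f"unknown-block-{i}") for i in range(1000))
print(found, store.backend_requests)
```

Since a Bloom filter never returns a false negative, a "no" can be answered without touching the storage back-end at all. Only the rare false positives and the blocks that really exist still cost a request. A real implementation would also have to handle block deletion, which a plain Bloom filter does not support.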
It could also be interesting to explore ways to run, in one process, a full IPFS node with a DHT
server backed by the regular file system, and to reserve a second process, configured with the S3 back-end, to handle only our Peergos data.

However, even with these optimizations, the best we can expect is the traffic shown in the previous plot.
From a theoretical perspective, it is still higher than the optimal number of requests.
On S3, storing a file, downloading a file and listing available files are all actions that can be done in a single request.
Even if all requests don't have the same cost on the cluster, processing a request has a non-negligible fixed cost.
## S3 and IPFS are incompatible?
@@ -212,22 +204,23 @@ Even if all requests don't have the same cost on the cluster, processing a reque
## Conclusion

Running IPFS over an S3 storage back-end does not quite work out of the box in terms of performance.
We have identified that the main problem is linked with the DHT service,
and proposed some improvements (disabling the DHT server, keeping an in-memory index of the blocks, using the S3 back-end only for user data).
It is possible to modify Peergos to make it work without IPFS.
With some optimizations on the block size,
we might have a great proof of concept of an end-to-end encrypted "cloud storage" over Garage.
*If you happen to be working on this, please let us know!*
From an IPFS design perspective, it seems, however, that the numerous small blocks handled by the protocol
do not map trivially to efficient use of the S3 API, and thus could be a limiting factor to any optimization work.

As part of my testing journey, I also stumbled upon some posts about performance issues on IPFS (e.g. [#6283](https://github.com/ipfs/go-ipfs/issues/6283))
that are not linked with the S3 connector. I might be negatively influenced by my failure to connect IPFS with S3,
but at this point I'm tempted to think that IPFS is intrinsically resource-intensive.

On our side at Deuxfleurs, we will continue our investigations towards more *minimalistic* software.
This choice makes sense for us as we want to reduce the ecological impact of our services
by deploying fewer servers that use less energy and are renewed less frequently.