garagehq.deuxfleurs.fr/content/blog/2022-ipfs/index.md

103 lines
7 KiB
Markdown
Raw Normal View History

2022-06-09 16:00:14 +00:00
+++
title="We tried IPFS over Garage"
date=2022-06-09
+++
Once you have spawned your Garage cluster, you might want to federate or share efficiently your content with the rest of the world.
In this blog post, we try to connect the InterPlanetary File System (IPFS) daemon to Garage.
We discuss the different bottleneck and limitations of the currently available software.
<!-- more -->
<!--Garage has been designed to be operated inside the same "administrative area", ie. operated by a single organization made of members that fully trust each other.
It is an intended design decision: trusting each other enables Garage to spread data over the machines instead of duplicating it.
Still, you might want to share and collaborate with the rest of the world, and it can be done in 2 ways with Garage: through the integrated HTTP server that can serve your bucket as a static website,
or by connecting it to an application that will act as a "proxy" between Garage and the rest of the world.
We refer as proxy software that know how to speak federated protocols (eg. Activity Pub, Solid, RemoteStorage, etc.) or distributed/p2p protocols (eg. Bittorrent, IPFS, etc.).-->
## Preliminary
People often struggle to see the difference between IPFS and Garage, so let's start by making clear that these projects are complementary and not interchangeable.
IPFS is a content-addressable network built in a peer-to-peer fashion.
With simple words, it means that you query the content you want with its identifier without having to know *where* it is hosted on the network, and especially on which machine.
As a side effect, you can share content over the Internet without any configuration (no firewall, NAT, fixed IP, DNS, etc.).
It has some nice benefits: if some content becomes very popular, all people that already accessed it can help serving it, and even if the original content provider goes offline, the content remains
availale in the network as long as one machine still have it.
However, IPFS does not enforce any property on the durability and availablity of your data: the collaboration mentioned earlier is
done only on a spontaneous approach. So at first, if you want to be sure that your content remains alive, you must keep it on your node.
And if nobody makes a copy of your content, you will loose it as soon as your node goes offline and/or crashes.
Furthermore, if you need multiple nodes to store your content, IPFS is not able to automatically place content on your nodes,
enforce a given replication amount, check the integrity of your content, and so on.
*Note: the IPFS project has another project named [IPFS Cluster](https://cluster.ipfs.io/) that addresses these problems.
We have not reviewed it yet but its 2 main differences with Garage are that 1) it does not expose an S3 API and 2) has a different consisenty model.
In the following, we suppose that these differences prevent you from deploying it in your infrastructure.*
➡️ **IPFS is designed to deliver content.**
Garage, on the contrary, is designed to spread automatically your content over all your available nodes to optimize your storage space. At the same time, it ensures that your content is always replicated exactly 3 times across the cluster (or less if you change a configuration parameter!) on different geographical zones (if possible). To access this content, you must have an API key, and have a correctly configured machine available over the network (including DNS/IP address/etc.). If the amount of traffic you receive is way larger than what your cluster can handle, your cluster will become simply unresponsive. Sharing content across people that do not trust each other, ie. who operate independant clusters, is not a feature of Garage: you have to rely on external software.
➡️ **Garage is designed to durably store content.**
In this blog post, we will explore if we can combine both properties by connecting an IPFS node to a Garage cluster.
## Try #1: Vanilla IPFS over Garage
<!--If you are not familiar with IPFS, is available both as a desktop app and a [CLI app](https://docs.ipfs.io/install/command-line/), in this post we will cover the CLI app as it is often easier to understand how things are working internally.
You can quickly follow the official [quick start guide](https://docs.ipfs.io/how-to/command-line-quick-start/#initialize-the-repository) to have an up and running node.-->
IPFS is available as a pre-compiled binary. But to connect it with Garage, we need a plugin named [ipfs/go-ds-s3](https://github.com/ipfs/go-ds-s3).
The Peergos project has a fork because it seems that the plugin is notorious for hitting Amazon's rate limits [#105](https://github.com/ipfs/go-ds-s3/issues/105), [#205](https://github.com/ipfs/go-ds-s3/pull/205).
This is the one we will try in the following.
The easiest solution to use this plugin in IPFS is to bundle it in the main IPFS daemon, and thus recompile IPFS from source.
Following the instructions on the README file allowed me to spawn an IPFS daemon configured with S3 as the block store.
Just be careful when adding the plugin to the `plugin/loader/preload_list` file, the given command lacks a newline.
You must edit the file manually after running it, you will directly see the problem and be able to fix it.
After that, I just ran the daemon and accessed the web interface to upload a photo of my dog:
![A dog](./dog.jpg)
The photo is assigned a content identifier (CID):
```
QmNt7NSzyGkJ5K9QzyceDXd18PbLKrMAE93XuSC2487EFn
```
And it now accessible on the whole network.
You can inspect it [from the official gateway](https://explore.ipld.io/#/explore/QmNt7NSzyGkJ5K9QzyceDXd18PbLKrMAE93XuSC2487EFn) for example:
![A screenshot of the IPFS explorer](./explorer.png)
At the same time, I was monitoring Garage (through [the OpenTelemetry stack we have implemented earlier this year](/blog/2022-v0-7-released/)).
Just after launching the daemon and before doing anything, we have this surprisingly active Grafana plot:
![Grafana API request rate when IPFS is idle](./idle.png)
It means that in average, we have around 250 requests per second, mainly to check that an IPFS block does not exist locally.
But when I start interacting with IPFS, values become so high that our default OpenTelemetry configuration can not cope with them.
We have the following error in Garage's logs:
```
OpenTelemetry trace error occurred. cannot send span to the batch span processor because the channel is full
```
It is useless to change our parameters to see what is exactly the number of requests done on the cluster: it is way too high, multiple orders of magnitude too high.
As a comparison, this whole webpage, with its pictures, triggers around 10 requests on Garage to load, and I expect seeing the same order of magnitude with IPFS.
I think we can conclude that this first try was a failure.
The S3 datastore on IPFS does too many request and would need some important work to optimize it.
But we should not give up too fast, because Peergos folks are known to run their software based on IPFS, in production, with an S3 backend.
## Try #2: Peergos over Garage