Quentin 1bbe3f9c58
2022-06-09 18:00:14 +02:00

+++
title="We tried IPFS over Garage"
date=2022-06-09
+++

Once you have spawned your Garage cluster, you might want to federate or share your content efficiently with the rest of the world. In this blog post, we try to connect the InterPlanetary File System (IPFS) daemon to Garage. We discuss the different bottlenecks and limitations of the currently available software.

Preliminary

People often struggle to see the difference between IPFS and Garage, so let's start by making clear that these projects are complementary and not interchangeable. IPFS is a content-addressable network built in a peer-to-peer fashion. In simple terms, it means that you query the content you want by its identifier, without having to know where it is hosted on the network, and especially on which machine. As a side effect, you can share content over the Internet without any configuration (no firewall, NAT, fixed IP, DNS, etc.). It has some nice benefits: if some content becomes very popular, everyone who has already accessed it can help serve it, and even if the original content provider goes offline, the content remains available in the network as long as one machine still has it.

However, IPFS does not enforce any property on the durability and availability of your data: the collaboration mentioned earlier happens only on a spontaneous basis. So at first, if you want to be sure that your content remains alive, you must keep it on your node. And if nobody makes a copy of your content, you will lose it as soon as your node goes offline and/or crashes. Furthermore, if you need multiple nodes to store your content, IPFS is not able to automatically place content on your nodes, enforce a given replication factor, check the integrity of your content, and so on.

Note: the IPFS project has another project named IPFS Cluster that addresses these problems. We have not reviewed it yet, but its 2 main differences with Garage are that 1) it does not expose an S3 API and 2) it has a different consistency model. In the following, we suppose that these differences prevent you from deploying it in your infrastructure.

➡️ IPFS is designed to deliver content.

Garage, on the contrary, is designed to automatically spread your content over all your available nodes to optimize your storage space. At the same time, it ensures that your content is always replicated exactly 3 times across the cluster (or fewer if you change a configuration parameter!) in different geographical zones (if possible). To access this content, you must have an API key, and have a correctly configured machine available over the network (including DNS/IP address/etc.). If the amount of traffic you receive is way larger than what your cluster can handle, your cluster will simply become unresponsive. Sharing content across people that do not trust each other, i.e. who operate independent clusters, is not a feature of Garage: you have to rely on external software.

➡️ Garage is designed to durably store content.

In this blog post, we will explore whether we can combine both properties by connecting an IPFS node to a Garage cluster.

Try #1: Vanilla IPFS over Garage

IPFS is available as a pre-compiled binary. But to connect it with Garage, we need a plugin named ipfs/go-ds-s3. The Peergos project maintains a fork because the plugin seems to be notorious for hitting Amazon's rate limits (#105, #205). This fork is the one we will try in the following.

The easiest way to use this plugin in IPFS is to bundle it in the main IPFS daemon, and thus recompile IPFS from source. Following the instructions in the README file allowed me to spawn an IPFS daemon configured with S3 as the block store.

Just be careful when adding the plugin to the plugin/loader/preload_list file: the given command lacks a newline. You must edit the file manually after running it; you will see the problem immediately and be able to fix it.
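If you would rather script this step than fix the file by hand, here is a minimal sketch of a newline-safe append. It is not the README's method: the `append_entry` helper and the entry text used in the demo are hypothetical placeholders, and the real plugin line should be taken from the fork's README.

```python
import tempfile
from pathlib import Path


def append_entry(path: Path, entry: str) -> None:
    """Append `entry` as its own line, terminating the previous line if needed."""
    existing = path.read_bytes() if path.exists() else b""
    with path.open("ab") as f:
        # Without a trailing newline, the new entry would be glued to the
        # last existing line, which is the problem described above.
        if existing and not existing.endswith(b"\n"):
            f.write(b"\n")
        f.write(entry.encode("utf-8") + b"\n")


if __name__ == "__main__":
    # Demo on a throwaway file; the contents are placeholders, not real entries.
    with tempfile.TemporaryDirectory() as tmp:
        preload_list = Path(tmp) / "preload_list"
        preload_list.write_bytes(b"existing-plugin-line")  # no trailing newline
        append_entry(preload_list, "hypothetical-plugin-line")
        print(preload_list.read_text())
```

Pointing the helper at your own checkout's plugin/loader/preload_list would apply the same logic to the real file.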

After that, I just ran the daemon and accessed the web interface to upload a photo of my dog:

A dog

The photo is assigned a content identifier (CID):

QmNt7NSzyGkJ5K9QzyceDXd18PbLKrMAE93XuSC2487EFn

And it is now accessible on the whole network. You can inspect it from the official gateway for example:

A screenshot of the IPFS explorer

At the same time, I was monitoring Garage (through the OpenTelemetry stack we implemented earlier this year). Just after launching the daemon and before doing anything, we have this surprisingly active Grafana plot:

Grafana API request rate when IPFS is idle

It means that on average, we have around 250 requests per second, mainly to check that an IPFS block does not exist locally. But when I start interacting with IPFS, values become so high that our default OpenTelemetry configuration cannot cope with them. We have the following error in Garage's logs:

OpenTelemetry trace error occurred. cannot send span to the batch span processor because the channel is full

There is no point in changing our parameters to find out the exact number of requests made on the cluster: it is way too high, by multiple orders of magnitude. As a comparison, this whole webpage, with its pictures, triggers around 10 requests on Garage to load, and I would expect to see the same order of magnitude with IPFS.
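To put the idle figure in perspective, here is a rough back-of-the-envelope calculation. It uses only the two approximate numbers quoted in this post (around 250 requests per second when idle, around 10 requests per page load); the function names are mine and the results are orders of magnitude, not measurements.

```python
IDLE_REQUESTS_PER_SECOND = 250  # approximate, read from the Grafana plot
REQUESTS_PER_PAGE_LOAD = 10     # approximate, for this webpage with its pictures
SECONDS_PER_DAY = 24 * 60 * 60


def requests_per_day(rate_per_second: float) -> float:
    """Total requests over one day at a constant rate."""
    return rate_per_second * SECONDS_PER_DAY


def equivalent_page_loads_per_second(rate_per_second: float, per_page: float) -> float:
    """How many page loads per second would generate the same request rate."""
    return rate_per_second / per_page


if __name__ == "__main__":
    print(requests_per_day(IDLE_REQUESTS_PER_SECOND))  # 21600000
    print(equivalent_page_loads_per_second(
        IDLE_REQUESTS_PER_SECOND, REQUESTS_PER_PAGE_LOAD))  # 25.0
```

In other words, an idle IPFS daemon generates roughly as many requests as 25 visitors loading this page every second, all day long.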

I think we can conclude that this first try was a failure. The S3 datastore on IPFS makes too many requests and would need significant work to be optimized. But we should not give up too quickly, because the Peergos folks are known to run their software based on IPFS, in production, with an S3 backend.

Try #2: Peergos over Garage