diff --git a/content/blog/2022-ipfs/dog.jpg b/content/blog/2022-ipfs/dog.jpg
new file mode 100644
index 0000000..704837f
Binary files /dev/null and b/content/blog/2022-ipfs/dog.jpg differ
diff --git a/content/blog/2022-ipfs/explorer.png b/content/blog/2022-ipfs/explorer.png
new file mode 100644
index 0000000..4639593
Binary files /dev/null and b/content/blog/2022-ipfs/explorer.png differ
diff --git a/content/blog/2022-ipfs/grafa.png b/content/blog/2022-ipfs/grafa.png
new file mode 100644
index 0000000..4934105
Binary files /dev/null and b/content/blog/2022-ipfs/grafa.png differ
diff --git a/content/blog/2022-ipfs/grafa2.png b/content/blog/2022-ipfs/grafa2.png
new file mode 100644
index 0000000..e056028
Binary files /dev/null and b/content/blog/2022-ipfs/grafa2.png differ
diff --git a/content/blog/2022-ipfs/grafa3.png b/content/blog/2022-ipfs/grafa3.png
new file mode 100644
index 0000000..1b29b60
Binary files /dev/null and b/content/blog/2022-ipfs/grafa3.png differ
diff --git a/content/blog/2022-ipfs/idle.png b/content/blog/2022-ipfs/idle.png
new file mode 100644
index 0000000..c8e745e
Binary files /dev/null and b/content/blog/2022-ipfs/idle.png differ
diff --git a/content/blog/2022-ipfs/index.md b/content/blog/2022-ipfs/index.md
new file mode 100644
index 0000000..ec1649a
--- /dev/null
+++ b/content/blog/2022-ipfs/index.md
@@ -0,0 +1,268 @@
++++
+title="We tried IPFS over Garage"
+date=2022-06-09
++++
+
+*Once you have spawned your Garage cluster, you might be interested in finding ways to share your content efficiently with the rest of the world, for example by joining federated platforms. In this blog post, we experiment with interconnecting the InterPlanetary File System (IPFS) daemon with Garage. We discuss the various bottlenecks and limitations of the software stack in its current state.*
+
+---
+
+## Some context
+
+People often struggle to see the difference between IPFS and Garage, so let's start by making clear that these projects are complementary and not interchangeable.
+
+Personally, I see IPFS as the intersection between BitTorrent and a file system. BitTorrent remains to this day one of the most efficient ways to deliver a copy of a file or a folder to a very large number of destinations. It however lacks some form of interactivity: once a torrent file has been generated, you can't simply add or remove files from it. By presenting itself more like a file system, IPFS is able to handle this use case out of the box.
+
+However, you would probably not rely on BitTorrent to durably store the encrypted holiday pictures you shared with your friends, as content on the BitTorrent network tends to vanish when no one has a copy of it anymore. The same applies to IPFS. Even if at some point everyone has a copy of the pictures on their hard disk, people might delete these copies after a while without you knowing it. You also can't easily collaborate on storing this common treasure. For example, there is no automatic way to say that Alice and Bob are in charge of storing the first half of the archive while Charlie and Eve are in charge of the second half.
+
+➡️ **IPFS is designed to deliver content.**
+
+*Note: the IPFS project has another project named [IPFS Cluster](https://cluster.ipfs.io/) that allows servers to collaborate on hosting IPFS content. [Resilio](https://www.resilio.com/individuals/) and [Syncthing](https://syncthing.net/) both feature protocols inspired by BitTorrent to synchronize a tree of your file system between multiple computers.
+Reviewing these solutions is out of the scope of this article, feel free to try them by yourself!*
+
+Garage, on the contrary, is designed to automatically spread your content over all your available nodes, in a manner that makes the best possible use of your storage space. At the same time, it ensures that your content is always replicated exactly 3 times across the cluster (or fewer if you change a configuration parameter), on different geographical zones when possible.
+
+However, this means that when content is requested from a Garage cluster, only 3 nodes are capable of returning it to the user. As a consequence, when content becomes popular, these nodes might become a bottleneck. Moreover, all resources created (keys, files, buckets) are tightly coupled to the Garage cluster on which they exist; servers from different clusters can't collaborate to serve the same data together (without additional software).
+
+➡️ **Garage is designed to durably store content.**
+
+In this blog post, we will explore whether we can combine both properties by connecting an IPFS node to a Garage cluster.
+
+## Try #1: Vanilla IPFS over Garage
+
+IPFS is available as a pre-compiled binary, but to connect it with Garage, we need a plugin named [ipfs/go-ds-s3](https://github.com/ipfs/go-ds-s3). The Peergos project maintains a fork of it, apparently because the upstream plugin is known for hitting Amazon's rate limits ([#105](https://github.com/ipfs/go-ds-s3/issues/105), [#205](https://github.com/ipfs/go-ds-s3/pull/205)). This fork is the one we will use in the following.
+
+The easiest way to use this plugin is to bundle it in the main IPFS daemon, and thus recompile IPFS from source. Following the instructions in the README file allowed me to spawn an IPFS daemon configured with S3 as the block store.
+
+I had a small issue when adding the plugin to the `plugin/loader/preload_list` file: the given command lacks a newline. I had to edit the file manually after running it, but the issue was directly visible and easy to fix.
+
+After that, I just ran the daemon and accessed the web interface to upload a photo of my dog:
+
+![A dog](./dog.jpg)
+
+A content identifier (CID) was assigned to this picture:
+
+```
+QmNt7NSzyGkJ5K9QzyceDXd18PbLKrMAE93XuSC2487EFn
+```
+
+The photo is now accessible on the whole network. For example, you can inspect it [from the official IPLD explorer](https://explore.ipld.io/#/explore/QmNt7NSzyGkJ5K9QzyceDXd18PbLKrMAE93XuSC2487EFn):
+
+![A screenshot of the IPFS explorer](./explorer.png)
+
+At the same time, I was monitoring Garage (through [the OpenTelemetry stack we implemented earlier this year](/blog/2022-v0-7-released/)). Just after launching the daemon and before doing anything, we had this surprisingly active Grafana plot:
+
+![Grafana API request rate when IPFS is idle](./idle.png)
+*Legend: y axis = requests per 10 seconds, x axis = time*
+
+It means that on average, we have around 250 requests per second. Most of these requests are checks that an IPFS block does not exist locally. They are triggered by the DHT service of IPFS: since my node is reachable over the Internet, it acts as a public DHT server and has to answer block requests coming from the whole network. Each time it receives a request for a block, it queries its storage back-end (in our case, Garage) to see whether the block exists.
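+
+To make the cost of these checks more concrete, here is a rough sketch of what a single block-existence check amounts to on the S3 side. This is not the actual go-ds-s3 code, just a hand-written illustration using the AWS SDK for Go, assuming the check is implemented as an S3 `HeadObject` request; the endpoint, credentials, bucket name and block key below are placeholders to adapt to your own cluster.
+
+```go
+package main
+
+import (
+    "fmt"
+    "log"
+
+    "github.com/aws/aws-sdk-go/aws"
+    "github.com/aws/aws-sdk-go/aws/awserr"
+    "github.com/aws/aws-sdk-go/aws/credentials"
+    "github.com/aws/aws-sdk-go/aws/session"
+    "github.com/aws/aws-sdk-go/service/s3"
+)
+
+// hasBlock checks whether an IPFS block is present in the bucket by sending
+// a HeadObject request: one HTTP round trip to Garage per DHT query.
+func hasBlock(svc *s3.S3, bucket, key string) (bool, error) {
+    _, err := svc.HeadObject(&s3.HeadObjectInput{
+        Bucket: aws.String(bucket),
+        Key:    aws.String(key),
+    })
+    if err != nil {
+        if aerr, ok := err.(awserr.Error); ok && aerr.Code() == "NotFound" {
+            return false, nil // the block is not stored on this cluster
+        }
+        return false, err
+    }
+    return true, nil
+}
+
+func main() {
+    // Placeholder endpoint and credentials for a local Garage S3 API.
+    sess := session.Must(session.NewSession(&aws.Config{
+        Endpoint:         aws.String("http://localhost:3900"),
+        Region:           aws.String("garage"),
+        Credentials:      credentials.NewStaticCredentials("GK...", "secret", ""),
+        S3ForcePathStyle: aws.Bool(true),
+    }))
+    svc := s3.New(sess)
+
+    // Hypothetical block key, for illustration only.
+    found, err := hasBlock(svc, "ipfs-blocks", "placeholder-block-key")
+    if err != nil {
+        log.Fatal(err)
+    }
+    fmt.Println("block present locally:", found)
+}
+```
+
+Multiply one such round trip by every block that peers on the network are looking for, and the ~250 requests per second we observe start to make sense.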
+
+*We will try to tweak the IPFS configuration later: we know that we can deactivate the DHT server. For now, we will continue with the default parameters.*
+
+When I started interacting with IPFS by sending a file or browsing the default proposed catalogs (i.e. the full XKCD archive), I hit the limits of our monitoring stack which, in its default configuration, is not able to ingest the traces of so many requests being processed by Garage. We got the following error in Garage's logs:
+
+```
+OpenTelemetry trace error occurred. cannot send span to the batch span processor because the channel is full
+```
+
+At this point, I didn't feel that it would be very interesting to fix this issue just to know exactly how many requests were made to the cluster. In my opinion, such a simple task as sharing a picture should not require so many requests to the storage server anyway. As a comparison, this whole webpage, with its pictures, triggers around 10 requests on Garage when loaded, not thousands.
+
+I think we can conclude that this first try was a failure. The S3 storage plugin for IPFS makes far too many requests and would need some serious optimization work. However, the people behind Peergos are known to run their IPFS-based software in production with an S3 backend, so we should not give up too fast.
+
+## Try #2: Peergos over Garage
+
+[Peergos](https://peergos.org/) is designed as an end-to-end encrypted and federated alternative to Nextcloud. Internally, it is built on IPFS and is known to have a [deep integration with the S3 API](https://peergos.org/posts/direct-s3). One important point of this integration is that your browser is able to bypass both the Peergos daemon and the IPFS daemon to write and read IPFS blocks directly from the S3 API server.
+
+*I don't know exactly whether Peergos is still considered alpha quality or whether a beta version was released, but keep in mind that it might be more experimental than you'd like!*
+
+Starting Peergos on top of Garage required some small patches on both sides, but in the end, I was able to get it working. I could upload my file, see it in the interface, create a link to share it, rename it, move it to a folder, and so on:
+
+![A screenshot of the Peergos interface](./upload.png)
+
+At the same time, the fans of my computer started to become a bit loud! A quick look at Grafana showed again a very active Garage:
+
+![Screenshot of a grafana plot showing requests per second over time](./grafa.png)
+*Legend: y axis = requests per 10 seconds on log(10) scale, x axis = time*
+
+Again, the workload is dominated by `HeadObject` requests. After taking a look at `~/.peergos/.ipfs/config`, it seems that the IPFS configuration used by the Peergos project is quite standard, which means that, as before, we are acting as a DHT server and have to answer thousands of block requests every second.
+
+We also have some traffic on the `GetObject` and `OPTIONS` endpoints (with peaks up to ~45 req/sec). This traffic is all generated by Peergos. The `OPTIONS` HTTP verb appears because we use the direct access feature of Peergos, meaning that our browser talks directly to Garage and has to use CORS to validate its requests for security.
+
+Internally, IPFS splits files into blocks of less than 256 kB. My picture is thus split into 2 blocks, requiring 2 requests to Garage to fetch it. But even knowing that IPFS splits files into small blocks, I can't explain why we have so many `GetObject` requests.
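+
+To get an idea of how quickly these small blocks add up, here is a tiny back-of-the-envelope calculation in Go. The 256 KiB value corresponds to IPFS's default chunk size, but the file sizes below are made up for the illustration, and the result is only a lower bound since it ignores the intermediate nodes of the Merkle DAG that IPFS also stores as blocks.
+
+```go
+package main
+
+import "fmt"
+
+// IPFS's default chunker cuts files into blocks of at most 256 KiB,
+// and each block becomes one object in the S3 back-end.
+const blockSize = 256 * 1024
+
+// blocksFor returns the minimum number of data blocks (and thus S3 objects)
+// needed to store a file of the given size.
+func blocksFor(fileSize int64) int64 {
+    if fileSize == 0 {
+        return 1
+    }
+    return (fileSize + blockSize - 1) / blockSize
+}
+
+func main() {
+    // Example sizes, not measured on real files.
+    sizes := []struct {
+        name string
+        size int64
+    }{
+        {"a ~500 KiB photo", 500 * 1024},
+        {"an ~80 MiB music album", 80 * 1024 * 1024},
+        {"a ~4 GiB movie", 4 * 1024 * 1024 * 1024},
+    }
+    for _, f := range sizes {
+        n := blocksFor(f.size)
+        fmt.Printf("%s -> at least %d blocks, i.e. %d S3 requests to read it in full\n", f.name, n, n)
+    }
+}
+```
+
+A ~500 KiB photo already needs 2 blocks, and a 4 GiB file needs over 16,000 of them: every read or write of a large file turns into a long series of small S3 requests.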
+
+## Try #3: Optimizing IPFS
+
+We have seen in our 2 previous tries that the main source of load was the federation, and in particular the DHT server. In this section, we'd like to artificially remove this problem from the equation by preventing our IPFS node from federating, and see what pressure Peergos alone puts on our local cluster.
+
+To isolate IPFS, I set its routing type to `none`, cleared its bootstrap node list, and configured the swarm socket to listen only on `localhost`. Finally, I restarted Peergos and was able to observe this much more peaceful graph:
+
+![Screenshot of a grafana plot showing requests per second over time](./grafa3.png)
+*Legend: y axis = requests per 10 seconds on log(10) scale, x axis = time*
+
+Now, for a given endpoint, we have peaks of around 10 req/sec, which is way more reasonable. Furthermore, we are no longer hammering our back-end with requests for objects that are not there.
+
+After discussing with the developers, it turns out that it is possible to go even further by running Peergos without IPFS: this is what they do for some of their tests. If, at the same time, we increased the size of the data blocks, we might get a non-federated but quite efficient end-to-end encrypted "cloud storage" that works well over Garage, with our clients directly hitting the S3 API!
+
+For setups where federation is a hard requirement, the next step would be to gradually allow our node to connect to the IPFS network, while ensuring that the traffic to the Garage cluster remains low. For example, configuring our IPFS node as a `dhtclient` instead of a `dhtserver` would exempt it from answering public DHT requests. Keeping an in-memory index (as a hash map and/or a Bloom filter) of the blocks stored on the current node could also drastically reduce the number of requests, as sketched below. It could also be interesting to explore ways to run, in one process, a full IPFS node with a DHT server on the regular file system, and to reserve a second process, configured with the S3 back-end, to handle only our Peergos data.
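+
+Here is a minimal sketch of what such an in-memory index could look like. It does not use the real go-ds-s3 or go-datastore interfaces: the `Datastore` interface below is deliberately simplified, the index is a plain hash set rather than a Bloom filter, and keeping it up to date on writes and deletes is left out.
+
+```go
+package main
+
+import (
+    "fmt"
+    "sync"
+)
+
+// Datastore is a deliberately simplified stand-in for the interface an IPFS
+// block store exposes; the real go-datastore interface is richer than this.
+type Datastore interface {
+    Has(key string) (bool, error)
+}
+
+// s3Datastore stands in for the S3-backed store: every call would normally
+// translate into one HeadObject round trip to Garage.
+type s3Datastore struct{ requests int }
+
+func (s *s3Datastore) Has(key string) (bool, error) {
+    s.requests++ // in reality: one HTTP request to the S3 API
+    return false, nil
+}
+
+// CachedDatastore answers "do we have this block?" questions from an
+// in-memory set of keys, so that negative lookups (the vast majority of the
+// DHT traffic we observed) never reach the S3 back-end.
+type CachedDatastore struct {
+    mu    sync.RWMutex
+    index map[string]struct{}
+    inner Datastore
+}
+
+// NewCachedDatastore builds the index from a listing of the keys already
+// present in the back-end (e.g. one paginated object listing at startup).
+// The index must then be updated on every Put/Delete (omitted here).
+func NewCachedDatastore(inner Datastore, existingKeys []string) *CachedDatastore {
+    idx := make(map[string]struct{}, len(existingKeys))
+    for _, k := range existingKeys {
+        idx[k] = struct{}{}
+    }
+    return &CachedDatastore{index: idx, inner: inner}
+}
+
+// Has never touches the S3 back-end: the in-memory index answers directly.
+func (c *CachedDatastore) Has(key string) (bool, error) {
+    c.mu.RLock()
+    defer c.mu.RUnlock()
+    _, ok := c.index[key]
+    return ok, nil
+}
+
+func main() {
+    inner := &s3Datastore{}
+    ds := NewCachedDatastore(inner, []string{"block-A", "block-B"})
+
+    // Simulate a burst of DHT queries for blocks we do not store.
+    for i := 0; i < 1000; i++ {
+        ds.Has(fmt.Sprintf("unknown-block-%d", i))
+    }
+    fmt.Println("requests sent to Garage:", inner.requests) // prints 0
+}
+```
+
+The index costs some memory (a Bloom filter would trade a small false-positive rate for a much smaller footprint), but it turns the flood of negative lookups into pure in-memory work.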
+
+However, even with these optimizations, the best we can expect is the traffic shown on the previous plot. From a theoretical perspective, it is still higher than the optimal number of requests. On S3, storing a file, downloading a file and listing available files are all actions that can be done in a single request. Even if all requests don't have the same cost on the cluster, processing a request has a non-negligible fixed cost.
+
+## Are S3 and IPFS incompatible?
+
+Tweaking IPFS to try and make it work on an S3 backend is all well and good, but in some sense, the assumptions made by IPFS are fundamentally incompatible with using S3 as a block storage.
+
+First, data on IPFS is split into relatively small chunks: all IPFS blocks must be less than 1 MB, with most being 256 KB or less. This means that large files or complex directory hierarchies need thousands of blocks to be stored, each of which is mapped to a single object in the S3 storage back-end. On the other hand, S3 implementations such as Garage are made to handle very large objects efficiently, and they also provide their own primitives for rapidly listing all the objects present in a bucket or a directory. There is thus a huge loss in performance when data is stored in IPFS's block format, because this format does not take advantage of the optimizations provided by S3 back-ends in their standard usage scenarios. Instead, it requires storing and retrieving thousands of small S3 objects even for very simple operations such as retrieving a file or listing a directory, incurring a fixed overhead each time.
+
+This problem is compounded by the design of the IPFS data exchange protocol, in which a node may request any data block from any other node in the network in its quest to answer a user's request (like retrieving a file). When a node is missing a file or a directory it wants to read, it has to send as many requests to other nodes as there are IPFS blocks in the object to be read. On the receiving end, this means that any fully-fledged IPFS node has to answer large numbers of requests for blocks required by users everywhere on the network, which is what we observed in our experiment above.
+We were, however, surprised to observe that many requests coming from the IPFS network were for blocks our node wasn't storing a copy of locally: this means that somewhere in the IPFS protocol, an overly optimistic assumption is made on where data can be found in the network, and this ends up translating into many requests between nodes that return negative results. When IPFS blocks are stored on a local filesystem, answering these requests quickly might be possible. However, when using an S3 server as a storage back-end, this becomes prohibitively costly.
+
+If one wanted to design a distributed storage system for IPFS data blocks, they would probably need to start at a lower level. Garage itself makes use of a block storage mechanism that allows small-sized blocks to be stored on a cluster and accessed rapidly by the nodes that need them. However, passing through the entire abstraction that provides an S3 API is wasteful and redundant, as this API is designed to provide advanced functionality such as mutating objects, associating metadata with objects, listing objects, etc. Plugging the IPFS daemon directly into a lower-level distributed block storage like Garage's might yield far better results by bypassing all of this complexity.
+
+## Conclusion
+
+Running IPFS over an S3 storage back-end does not quite work out of the box in terms of performance. We have identified that the main problem is linked to the DHT service, and proposed some improvements (disabling the DHT server, keeping an in-memory index of the blocks, using the S3 back-end only for user data).
+
+From an IPFS design perspective, it seems however that the numerous small blocks handled by the protocol do not map trivially to efficient use of the S3 API, and thus could be a limiting factor for any optimization work.
+
+As part of my testing journey, I also stumbled upon some posts about performance issues in IPFS (e.g. [#6283](https://github.com/ipfs/go-ipfs/issues/6283)) that are not linked to the S3 connector. I might be negatively influenced by my failure to connect IPFS with S3, but at this point I'm tempted to think that IPFS is intrinsically resource-intensive.
+
+On our side at Deuxfleurs, we will continue our investigations towards more *minimalistic* software. This choice makes sense for us as we want to reduce the ecological impact of our services by deploying fewer servers, that use less energy, and that are renewed less frequently.
+
+After discussing with the Peergos maintainers, we identified that it is possible to run Peergos without IPFS. With some optimizations on the block size, we envision great synergies between Garage and Peergos that could lead to an efficient and lightweight end-to-end encrypted "cloud storage" platform. *If you happen to be working on this, please let us know!*
+
+*We are also aware of the existence of many other software projects for file sharing, such as Nextcloud, Owncloud, Owncloud Infinite Scale, Seafile, Filestash, Pydio, SOLID, Remote Storage, etc. Many of these could be connected to an S3 back-end such as Garage.
+We might even try some of them in future blog posts, so stay tuned!*
diff --git a/content/blog/2022-ipfs/peergos.jpg b/content/blog/2022-ipfs/peergos.jpg
new file mode 100644
index 0000000..439542b
Binary files /dev/null and b/content/blog/2022-ipfs/peergos.jpg differ
diff --git a/content/blog/2022-ipfs/upload.png b/content/blog/2022-ipfs/upload.png
new file mode 100644
index 0000000..1a9c40b
Binary files /dev/null and b/content/blog/2022-ipfs/upload.png differ