WIP: Add a roadmap draft #7

Draft
quentin wants to merge 1 commit from roadmap into master

View file

@ -0,0 +1,62 @@
+++
title="Our roadmap towards v1.0"
date=2022-04-10
+++
*a*
<!-- more -->
---
While we hope that Garage in its current state inspired you, we also understand that you may be curious about what will come next!
As per our road towards v1.0, we identified the remaining work and categorized it under one of these 3 domains: *Feature completeness*, *Understandability and manageability*, and *Correctness*.
## Feature completeness
Most importantly, we still need to fix some corner cases on advanced S3 endpoints (eg. [#263](https://git.deuxfleurs.fr/Deuxfleurs/garage/issues), [#248](https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/248), [#204](https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/204)). Based on community feedbacks, we might also consider implementing additional endpoints (eg. [#166](https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/166)) or quotas (eg. [#71](https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/71)) but we can't make any promise (sorry!).
Review

#263 (anonymous access) -> this shouldn't be the first thing we evoke, for me it's low priority because we have the web endpoint that mostly serves the same purpose, however if there is a clear vision of what we want to achieve with this which currently can't be done, I'm interested

#248 (fix uploadpartcopy) -> we already have a patch for this, maybe not necessary to have it in the roadmap?

#166 (versionning) -> it looks to me that there is quite some demand for this, so it would be nice to do it, however it's a lot of work and we need to plan some time dedicated to this issue if we want to have it. Moreover it would imply a radical overhaul of the internal data structures of Garage.

#204 (correct multipart uploads) -> we will need to have this at some point; however it also requires changing the data model in a non-trivial way. If we decide to spend time on #166 we can do both at the same time, which would be simpler.

#71 (multi-tenancy) -> as Quentin said in the comments, we can start by a simpler accounting scheme already just to know how much storage is used by each bucket, which can be oportunistically done when we develop the index-counting mechanism for K2V. I don't really know the priority level for this but I'd guess quite low, I haven't heared of anybody who needs this now.

`#263` (anonymous access) -> this shouldn't be the first thing we evoke, for me it's low priority because we have the web endpoint that mostly serves the same purpose, however if there is a clear vision of what we want to achieve with this which currently can't be done, I'm interested `#248` (fix uploadpartcopy) -> we already have a patch for this, maybe not necessary to have it in the roadmap? `#166` (versionning) -> it looks to me that there is quite some demand for this, so it would be nice to do it, however it's a lot of work and we need to plan some time dedicated to this issue if we want to have it. Moreover it would imply a radical overhaul of the internal data structures of Garage. `#204` (correct multipart uploads) -> we will need to have this at some point; however it also requires changing the data model in a non-trivial way. If we decide to spend time on `#166` we can do both at the same time, which would be simpler. `#71` (multi-tenancy) -> as Quentin said in the comments, we can start by a simpler accounting scheme already just to know how much storage is used by each bucket, which can be oportunistically done when we develop the index-counting mechanism for K2V. I don't really know the priority level for this but I'd guess quite low, I haven't heared of anybody who needs this now.
We also made a series of observation (the S3 API is not adapted to handle multiple small objects, many software require a database, we already have an internal database for metadata) which makes us believe that Garage could leverage its internal database system to provide a simple key-value store. We have already written [an API draft](https://p.adnab.me/code/#/2/code/view/eUNPbfoUrMbCY+CoMXaqed4jmWlmvWALHNDcfuM-O5o/embed/present/) for an API we named K2V. In the following months, we will then try to implement it and merge it if it makes sense. Spiritually, we would like it to be close to the original [Amazon Dynamo paper](https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf), or if you prefer approximative comparisons, K2V could be to Cassandra what sqlite is to PostgreSQL.
Review

for the API draft please use the following URL: https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/k2v/doc/drafts/k2v-spec.md

Also I don't really like the comparison with sqlite, it feels wrong to me. I'd rather say we are to Cassandra what LMDB (or BerkeleyDB) is to Sqlite, but I guess nobody knows what LMDB or BerkeleyDB is.

for the API draft please use the following URL: <https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/k2v/doc/drafts/k2v-spec.md> Also I don't really like the comparison with sqlite, it feels wrong to me. I'd rather say we are to Cassandra what LMDB (or BerkeleyDB) is to Sqlite, but I guess nobody knows what LMDB or BerkeleyDB is.
If you worry about feature bloat, be ensured that we do not plan to extend Garage beyond these points!
Review

This looks a bit strange to me, how can we know in advance when we will stop or not? We don't really know who will use Garage in the future and what needs they will have.

This looks a bit strange to me, how can we know in advance when we will stop or not? We don't really know who will use Garage in the future and what needs they will have.
## Understandability and manageability
Based on community feedback (eg. [#228](https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/228), [#221](https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/221), [#217](https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/217)), we have identified some points on which we want to dedicate work: *Consistency Model*, *Administration*, *Storage Model*, and *Ecosystem*.
Garage has currently 2 API: S3 and Admin, with different *Consistency Models*. S3 API's consistency is aligned on [Amazon's new S3 Strong Consistency](https://aws.amazon.com/s3/consistency/) while Admin API's eventual consistency is not yet specified. We want to document first the Admin API's consistency, then we could make S3 consistency explanation more approachable.
The Admin API is also currently only exposed through our custom RPC endpoint. We would like to expose it through a REST endpoint ([#231](https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/231)) to ease *Administration*. This REST API would make possible to build a web interface to manage Garage ([#232](https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/232)).
Review

#231 (REST admin endpoints) -> we will need this at Deuxfleurs to interconnect with Guichet so it's relatively high in the priority list. Also it's quite easy work which can probably be done without my intervention.

#232 (admin web UI) -> I don't think we should spend time developping a web UI just for Garage "in the general case", what we need for us is something specific that integrates with Guichet and we should focus on that

The following are missing in the list:

#207 monitoring of background tasks -> this would bring great quality-of-life improvements for Garage admins so I'd say it's pretty high in the list

#255 automatically scrub regularly -> for durability we need this; it would probably be better to develop it in conjunction with #207

`#231` (REST admin endpoints) -> we will need this at Deuxfleurs to interconnect with Guichet so it's relatively high in the priority list. Also it's quite easy work which can probably be done without my intervention. `#232` (admin web UI) -> I don't think we should spend time developping a web UI just for Garage "in the general case", what we need for us is something specific that integrates with Guichet and we should focus on that The following are missing in the list: `#207` monitoring of background tasks -> this would bring great quality-of-life improvements for Garage admins so I'd say it's pretty high in the list `#255` automatically scrub regularly -> for durability we need this; it would probably be better to develop it in conjunction with #207
When people discover Garage, questions also arise around its *Storage Model*, eg. what reliability and density shoud they expect depending on their considered configuration. Ideally, we would like to write a calculator similar to [Minio's one](https://min.io/product/erasure-code-calculator) to help Garage operators understand the properties of their cluster.
Finally, Garage exists in an *Ecosystem*, consisting of multiple alternative software (eg. Ceph, Minio, SeaweedFS, IPFS Cluster), with different needs (research, customer services, archiving), made of various hardware (from a Raspberry Pi to a 64+ disk blade server). We want to situate Garage in its ecosystem while identifying, promoting and documenting some "Garage success story".
<!--2) Explaining how Garage can take its place in the existing ecosystem, including among the other distributed storage systems, but also in term of uses cases and deployments (how does it perform at scale, with which hardware, for which application, etc.) 3) Make possible to manage Garage from a REST API, possibly write a web GUI to make administration easier, 4) help people understand the reliability and storage density they will have for a specific Garage deployment, if possible through a simulator, 5) we might consider adding a system of quota to protect a cluster from a misbehaving user. 6) Integration in ecosystems -->
## Correctness
Review

this section could be called "correctness and performance"

this section could be called "correctness and performance"
We know in theory that Garage's design works and horizontally scales.
But we still need to make sure that in practise our implementation is correct, and thus features these defined properties.
[Jepsen](https://jepsen.io/) is a well-known tool to test distributed system properties by simulating some system states and sequencing packets in a specific order.
We would like to learn it and apply it to Garage to better convince ourselves that we
Review

Jepsen -> we don't have time, let's drop it for now

Jepsen -> we don't have time, let's drop it for now
Consider control theory, speak about tranquilizer ([#145](https://git.deuxfleurs.fr/Deuxfleurs/garage/pulls/145)).
Review

Very low priority for me

Very low priority for me
Horizontal scaling made possible no leader also no erasure coding.
Track bottlenecks, instability and overload.
We also plan to deploy Garage on multiple clusters and do a large serie of benchmarks.
Review

It looks here like we are entering into the "performance" domain. If so, we probably want to evoke a rework of the RPC stack to allow streaming RPCs (maybe based on QUIC but maybe not) which would allow to reduce the TTFB and general latency in many requests while also enabling us to increase the block size, bringing further improvements in performance. To me this is probably the highest priority item in the "performance" milestone.

It looks here like we are entering into the "performance" domain. If so, we probably want to evoke a rework of the RPC stack to allow streaming RPCs (maybe based on QUIC but maybe not) which would allow to reduce the TTFB and general latency in many requests while also enabling us to increase the block size, bringing further improvements in performance. To me this is probably the highest priority item in the "performance" milestone.
<!-- non goals: erasure coding, byzantine tolerance ; do we want quota? -->
## Conclusion
Please note that this roadmap is purely indicative, we are not committing to deliver these features.
Contributions are however welcome, so get in touch (on our Matrix channel) if you want to implement one of them.
We also don't know when v1.0 will be released, except "when it will be ready", but we would be happy if it could be by the end of 2022.
See you soon for the next release!