Proposal: decouple config structs used in garage and garage_model crates #699

Open
opened 2024-01-30 17:56:02 +00:00 by zdenek.crha · 7 comments
Contributor

The current configuration handling in garage is difficult due to two reasons:

  • data types are spread through multiple crates - util, db, config
  • data types are used in multiple contexts - parsing, storage, ...

The proposal is to split and re-arrange config data types into two sets:

First, a garage binary configuration types in garage crate. Used
exclusively for parsing, validating, merging config from ENV, CLI, TOML.
Minimal to no dependency on internal crates like garage_db.

Second, a garage server struct types in garage_model crate. Used
exclusively for passing and storing config in Garage struct. No parsing/serde logic,
but allow re-use of types from lower-level, internal crates.

The config data flow would start with parsing binary config types, then
fallibly converting them garage server config and passing it to Garage
struct.

Advantages

  • single responsibility for each data type set
  • easier to test data-types in isolation
  • ability to move some logic from loose functions to types
  • ability to support multiple TOML file versions (if needed)
  • have to conversion logic (see note)

Disadvantages

  • have to maintain two sets of data types
  • have to maintain conversion logic (see note)

NOTE: The need for conversion logic is toss-up. On one side, it is a code that
has to be maintained. On other, it provides bridge between data type sets,
allowing us to change one side but not the other.

Implementation

The split implementation can be fairly minimal:

  • keep the Config struct structure same, just duplicate it into two crates.
  • garage::config crate would implement serde logic
  • garage::config crate would implement conversion logic
  • garage_model::config would be plain storage without any logic

Outcomes

Implementing the proposal as described in previous section

  • will solve (partialy) the #292
  • may make fixing of #695 easier

It will allow us number of follow-up clean-ups and refactorings:

  • remove fields related to config loading from garage_model::config variant
  • data abstractions for things that are stored as basic data types now
  • turning some loose functions/code into struct methods
  • ...

We will be able to do those without making the code hard to read by mixing
parsing/handling logic in one data type.

The current configuration handling in `garage` is difficult due to two reasons: - data types are spread through multiple crates - `util`, `db`, `config` - data types are used in multiple contexts - parsing, storage, ... The proposal is to split and re-arrange config data types into two sets: First, a `garage` binary configuration types in `garage` crate. Used exclusively for parsing, validating, merging config from ENV, CLI, TOML. Minimal to no dependency on internal crates like `garage_db`. Second, a `garage` server `struct` types in `garage_model` crate. Used exclusively for passing and storing config in `Garage` struct. No parsing/serde logic, but allow re-use of types from lower-level, internal crates. The config data flow would start with parsing binary config types, then *fallibly* converting them `garage` server config and passing it to `Garage` struct. ### Advantages - single responsibility for each data type set - easier to test data-types in isolation - ability to move some logic from loose functions to types - ability to support multiple TOML file versions (if needed) - have to conversion logic (see note) ### Disadvantages - have to maintain two sets of data types - have to maintain conversion logic (see note) NOTE: The need for conversion logic is toss-up. On one side, it is a code that has to be maintained. On other, it provides bridge between data type sets, allowing us to change one side but not the other. Implementation -------------------------------------------------------------------------------- The split implementation can be fairly minimal: - keep the `Config` `struct` structure same, just duplicate it into two crates. - `garage::config` crate would implement `serde` logic - `garage::config` crate would implement conversion logic - `garage_model::config` would be plain storage without any logic Outcomes -------------------------------------------------------------------------------- Implementing the proposal as described in previous section - will solve (partialy) the #292 - may make fixing of #695 easier It will allow us number of follow-up clean-ups and refactorings: - remove fields related to config loading from `garage_model::config` variant - data abstractions for things that are stored as basic data types now - turning some loose functions/code into `struct` methods - ... We will be able to do those without making the code hard to read by mixing parsing/handling logic in one data type.
Author
Contributor

@lx To make it explicit, I'm willing to work on the proposal implementation and possibly some follow-up refactorings too.

@lx To make it explicit, I'm willing to work on the proposal implementation and possibly some follow-up refactorings too.
Owner

Hello @zdenek.crha , thank you very much for the proposal. I like your proposal, and I think we can even make it a bit more clean by not just duplicating everything in the garage_model crate, but by splitting the different config sections into the relevant crates:

  • The main part of the config (rpc secret, replication mode, ...) could be mapped to a Config struct in the garage_model crate
  • The config sections for the s3 api and admin api could be mapped to two structs in the garage_api crate
  • The config section for the web endpoint could be mapped to a struct in the garage_web crate
  • The top-level config struct, currently in util, could be moved to the garage crate

Anyway, it would be great to have your work on this, so feel free to get started. If you'd like i can give feedback on your code as you go.

Hello @zdenek.crha , thank you very much for the proposal. I like your proposal, and I think we can even make it a bit more clean by not just duplicating everything in the garage_model crate, but by splitting the different config sections into the relevant crates: - The main part of the config (rpc secret, replication mode, ...) could be mapped to a Config struct in the garage_model crate - The config sections for the s3 api and admin api could be mapped to two structs in the garage_api crate - The config section for the web endpoint could be mapped to a struct in the garage_web crate - The top-level config struct, currently in util, could be moved to the garage crate Anyway, it would be great to have your work on this, so feel free to get started. If you'd like i can give feedback on your code as you go.
Author
Contributor

@lx I agree we can go further with the split. The duplication I suggested was just to get the split as fast as possible. I actually have pretty long list of possible changes that could be done afterwards :-)

I will use your suggestions for splitting the config to relevant classes and have a go at it.

If we need to split those re-used structs from garage_web, garage_api to support relaxed config parsing in garage and strict modeling in garage_model, we can do it later.

@lx I agree we can go further with the split. The duplication I suggested was just to get the split as fast as possible. I actually have pretty long list of possible changes that could be done afterwards :-) I will use your suggestions for splitting the config to relevant classes and have a go at it. If we need to split those re-used structs from `garage_web`, `garage_api` to support relaxed config parsing in `garage` and strict modeling in `garage_model`, we can do it later.
Author
Contributor

I have started on the config changes and run into issue I have not expected. The garage_rpc crate relies on garage_util::config::Config struct and the moves and splits would result in dependency cycles between various internal crates.

The best approach I came could come with is to refactor garage_rpc to not use Config struct:

It is used only in System::new() method, where pieces of config are extracted, transformed and used as part of System. The Config is not held, it is used just for passing config in. At the same time, there are many pieces from Config used, it makes no sense to pass them in as separate arguments.

Idea I play with is to turn System::new() method into builder struct, so that we could configure all related and unrelated parts piece-by-piece. I'll iterate on it and see if I can make it work or not.

I have started on the config changes and run into issue I have not expected. The `garage_rpc` crate relies on `garage_util::config::Config` struct and the moves and splits would result in dependency cycles between various internal crates. The best approach I came could come with is to refactor `garage_rpc` to *not* use `Config` struct: It is used only in `System::new()` method, where pieces of config are extracted, transformed and used as part of `System`. The `Config` is *not* held, it is used just for passing config in. At the same time, there are many pieces from `Config` used, it makes no sense to pass them in as separate arguments. Idea I play with is to turn `System::new()` method into builder struct, so that we could configure all related and unrelated parts piece-by-piece. I'll iterate on it and see if I can make it work or not.
Owner

I'm not a big fan myself of the builder pattern because to my eyes it looks like mostly boilerplate. Is there not a way where we could regroup the exact subset of configuration parameters used by System::new in a specific struct and just pass that as an argument? If not, yes please go ahead and do what you think is good.

I'm not a big fan myself of the builder pattern because to my eyes it looks like mostly boilerplate. Is there not a way where we could regroup the exact subset of configuration parameters used by `System::new` in a specific struct and just pass that as an argument? If not, yes please go ahead and do what you think is good.
Author
Contributor

Is there not a way where we could regroup the exact subset of configuration parameters used by System::new in a specific struct

@lx I have not considered that approach. Will check and use it if possible.

> Is there not a way where we could regroup the exact subset of configuration parameters used by System::new in a specific struct @lx I have not considered that approach. Will check and use it if possible.
Author
Contributor

@lx Just a quick update - I have not had much energy to spend on garage in last few month, but I'm still interested in working on this.

@lx Just a quick update - I have not had much energy to spend on garage in last few month, but I'm still interested in working on this.
Sign in to join this conversation.
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: Deuxfleurs/garage#699
No description provided.