Helm installation - Pods don't auto-discover, no CRD #522

Closed
opened 2023-03-08 12:28:57 +00:00 by jonatan · 6 comments
Contributor

Hi all,

I wanted to start using Garage and install it into my Kubernetes cluster.
I used the Helm chart from the main branch to install it, starting with 2 replicas (for testing, as I currently only have 2 nodes).
But the two pods aren't automatically discovering each other, as I thought they would.

From my understanding, Garage should deploy a CRD that is then used for discovery.
But I don't see any Garage CRD deployed to my cluster, so I think that might be the culprit.
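
For reference, this is how I looked for it (I don't know the exact CRD name Garage would register, so I just grep over all CRDs):

```console
# List all CRDs and look for anything Garage-related -- this returns nothing for me
kubectl get crd | grep -i garage
```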

Also, it seems the Helm chart still deploys the older version 0.7 instead of 0.8; is that intentional?
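
This is how I checked the deployed version (the resource name matches my release; adjust it if yours differs):

```console
# The chart deploys a StatefulSet; inspect the image tag it runs
kubectl get statefulset garage -o jsonpath='{.spec.template.spec.containers[0].image}'
```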

For reference, I am using k3s and these are my values:

    garage:
      replicationMode: "2"

    deployment:
      replicaCount: 2

    persistence:
      meta:
        storageClass: "local-path"
        size: 100Mi
      data:
        storageClass: "local-path"
        size: 1Gi

    ingress:
      s3:
        api:
          enabled: true
          className: "nginx"
          annotations:
            external-dns.alpha.kubernetes.io/my-target: external
            external-dns.alpha.kubernetes.io/target: ***
            cert-manager.io/cluster-issuer: letsencrypt-prod
          hosts:
            - host: s3-api.***
              paths:
                - path: /
                  pathType: Prefix
          tls:
            - secretName: s3-api-***-tls
              hosts:
                - s3-api.***

Here are some logs from the garage container, if those are of help:

❯ kubectl logs garage-0
Defaulted container "garage" out of: garage, garage-init (init)
 INFO  garage::server > Loading configuration...
 INFO  garage::server > Opening database...
 INFO  garage::server > Initializing background runner...
 INFO  garage::server > Initializing Garage main data store...
 INFO  garage_model::garage > Initialize membership management system...
 INFO  garage_rpc::system   > Generating new node key pair.
 INFO  garage_rpc::system   > Node ID of this node: 0b5ca5ac9a3dbf49
 INFO  garage_rpc::system   > No valid previous cluster layout stored (IO error: No such file or directory (os error 2)), starting fresh.
 WARN  garage_rpc::ring     > Could not build ring: network role assignation data has invalid length
 WARN  garage_rpc::system   > Using autodetected rpc_public_addr: 10.42.0.132:3901. Consider specifying it explicitly in configuration file if possible.
 INFO  garage_model::garage > Initialize block manager...
 INFO  garage_model::garage > Initialize bucket_table...
 INFO  garage_util::background > Worker started: Merkle tree updater for bucket_v2
 INFO  garage_util::background > Worker started: table sync watcher for bucket_v2
 INFO  garage_util::background > Worker started: table syncer for bucket_v2
 INFO  garage_util::background > Worker started: GC loop for bucket_v2
 INFO  garage_model::garage    > Initialize bucket_alias_table...
 INFO  garage_util::background > Worker started: Merkle tree updater for bucket_alias
 INFO  garage_model::garage    > Initialize key_table_table...
 INFO  garage_util::background > Worker started: table syncer for bucket_alias
 INFO  garage_util::background > Worker started: table sync watcher for bucket_alias
 INFO  garage_util::background > Worker started: GC loop for bucket_alias
 INFO  garage_util::background > Worker started: Merkle tree updater for key
 INFO  garage_model::garage    > Initialize block_ref_table...
 INFO  garage_util::background > Worker started: table syncer for key
 INFO  garage_util::background > Worker started: GC loop for key
 INFO  garage_util::background > Worker started: table sync watcher for key
 INFO  garage_util::background > Worker started: Merkle tree updater for block_ref
 INFO  garage_util::background > Worker started: table sync watcher for block_ref
 INFO  garage_model::garage    > Initialize version_table...
 INFO  garage_util::background > Worker started: GC loop for block_ref
 INFO  garage_util::background > Worker started: table syncer for block_ref
 INFO  garage_util::background > Worker started: Merkle tree updater for version
 INFO  garage_model::garage    > Initialize object_table...
 INFO  garage_util::background > Worker started: table sync watcher for version
 INFO  garage_util::background > Worker started: GC loop for version
 INFO  garage_util::background > Worker started: table syncer for version
 INFO  garage_model::garage    > Initialize K2V counter table...
 INFO  garage_util::background > Worker started: table sync watcher for object
 INFO  garage_util::background > Worker started: table syncer for object
 INFO  garage_util::background > Worker started: GC loop for object
 INFO  garage_util::background > Worker started: Merkle tree updater for object
 INFO  garage_util::background > Worker started: Merkle tree updater for k2v_index_counter
 INFO  garage_model::garage    > Initialize K2V subscription manager...
 INFO  garage_model::garage    > Initialize K2V item table...
 INFO  garage_util::background > Worker started: table sync watcher for k2v_index_counter
 INFO  garage_util::background > Worker started: GC loop for k2v_index_counter
 INFO  garage_util::background > Worker started: k2v_index_counter index counter propagator
 INFO  garage_util::background > Worker started: table syncer for k2v_index_counter
 INFO  garage_util::background > Worker started: Merkle tree updater for k2v_item
 INFO  garage_util::background > Worker started: table sync watcher for k2v_item
 INFO  garage_util::background > Worker started: table syncer for k2v_item
 INFO  garage_model::garage    > Initialize Garage...
 INFO  garage_util::background > Worker started: GC loop for k2v_item
 INFO  garage::server          > Initialize tracing...
 INFO  garage::server          > Initialize Admin API server and metrics collector...
 INFO  garage::server          > Create admin RPC handler...
 INFO  garage::server          > Initializing S3 API server...
 INFO  garage::server          > Initializing K2V API server...
 INFO  garage::server          > Initializing web server...
 INFO  garage::server          > Launching Admin API server...
 INFO  garage_rpc::system      > Doing a bootstrap/discovery step (not_configured: true, no_peers: true, bad_peers: true)
 INFO  garage_api::generic_server > S3 API server listening on http://[::]:3900
 INFO  garage_api::generic_server > Admin API server listening on http://[::]:3903
 INFO  garage_web::web_server     > Web server listening on http://[::]:3902
 INFO  garage_util::background    > Worker started: block resync worker
 INFO  garage_rpc::system         > Doing a bootstrap/discovery step (not_configured: true, no_peers: true, bad_peers: true)
 WARN  garage_rpc::ring           > Could not build ring: network role assignation data has invalid length
 INFO  garage_rpc::system         > Doing a bootstrap/discovery step (not_configured: true, no_peers: true, bad_peers: true)
 INFO  garage_rpc::system         > Doing a bootstrap/discovery step (not_configured: true, no_peers: true, bad_peers: true)
 INFO  garage_rpc::system         > Doing a bootstrap/discovery step (not_configured: true, no_peers: true, bad_peers: true)
maximilien self-assigned this 2023-03-08 20:11:29 +00:00
Owner

I reproduced the issue and I am looking into it. The reason the Helm chart still has the old 0.7.x version is that [a migration step is required](https://garagehq.deuxfleurs.fr/documentation/working-documents/migration-08/) and the documentation was not quite ready at the time. We're considering raising the version.

quentin added the kind/wrong-behavior label 2023-03-13 14:20:36 +00:00
Contributor

Just to let you (and everyone else who runs into this issue) know: after bumping the version to `v0.8.2` (`appVersion: "v0.8.2"` in `Chart.yaml`), node autodiscovery works out of the box.
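
For reference, this is the only line I changed (the chart lives under `script/helm/garage` in the repo, if I remember correctly):

```yaml
# script/helm/garage/Chart.yaml -- only appVersion needs to change
appVersion: "v0.8.2"
```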

The remaining manual step is creating the layout. Is there a way to automate it? For example, by assigning the same `capacity` (de facto `weight`, as explained in depth in #357) to all nodes, since they are identical?
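
For now, the manual steps boil down to something like this (node IDs are placeholders, and the `/garage` binary path inside the container is an assumption on my side):

```console
# Get the IDs of the discovered nodes
kubectl exec -it garage-0 -- /garage status
# Give both nodes the same zone and capacity (in v0.8, capacity is an integer weight)
kubectl exec -it garage-0 -- /garage layout assign -z dc1 -c 1 <node1_id> <node2_id>
# Review the staged changes, then apply them
kubectl exec -it garage-0 -- /garage layout show
kubectl exec -it garage-0 -- /garage layout apply --version 1
```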

Owner

Hi @elwin013, thanks for your workaround. And sorry we are a bit slow on this issue: Kubernetes is not the scheduler we use (we chose Nomad some years ago). If someone wants to open a PR to update Garage's Helm chart, it will be welcome.

Currently, we don't provide a way to configure the layout automatically, as it handles sensitive data, and based on the use cases we are aware of, it seems better to have a human make changes to it manually to prevent data loss.

If you have a specific use case where configuring the layout automatically makes sense and does not create dangerous situations, you could open an issue so we keep an eye on it. But of course we can't promise whether or not the feature will be implemented.

Contributor

Hi @quentin, thank you for the quick reply!

I fully understand that applying the layout automatically could be dangerous and that it is better to do this manually. As long as automatic discovery of nodes works (with `v0.8.2`), I can live with that (which means doing it manually or writing some not-so-fancy script that does it). :-)
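
In case anyone wants the not-so-fancy script, here is a sketch of what I have in mind (it assumes all nodes get identical weight, that the binary lives at `/garage` inside the container, and the parsing of `garage status` output is naive and version-dependent):

```sh
#!/bin/sh
# Naive layout bootstrap: give every healthy node the same zone and weight.
# Assumptions: garage v0.8.x CLI, binary at /garage in the container, and
# a 'garage status' output format that may change between versions.
POD=garage-0
ZONE=dc1

# Collect node IDs from the HEALTHY NODES section of 'garage status'
NODE_IDS=$(kubectl exec "$POD" -- /garage status \
  | awk '/^====/ { in_healthy = /HEALTHY NODES/; next }
         in_healthy && $1 ~ /^[0-9a-f]+$/ { print $1 }')

# Assign the same zone and weight to every node
for id in $NODE_IDS; do
  kubectl exec "$POD" -- /garage layout assign -z "$ZONE" -c 1 "$id"
done

# Layout versions increment by one on each apply; 1 is only correct for a fresh cluster
kubectl exec "$POD" -- /garage layout apply --version 1
```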

I've created a small PR updating the Helm chart versions (linked above).

Author
Contributor

Thanks @elwin013 for the info that changing the version works.

Regarding the migration step:
I understand the concern, and normally I would say this should be covered by incrementing the chart's major version to signal that upgrading without manual intervention is breaking.
But as the chart is only hosted in Git, that increment would probably go unseen by most users and likely would not be picked up by tools like Flux or ArgoCD.

Owner

This issue has not seen activity for a year, and I have the feeling that several questions were mixed up in one thread, so I will close it for now. Feel free to create new issues if there are still specific things to discuss.

lx closed this issue 2024-03-12 10:37:28 +00:00