Check data_dir valid on startup #601
Reference: Deuxfleurs/garage#601
Garage should check that the data_dir provided is valid on start-up, somewhere around:
For my deployment, these are on a separate volume (which I hadn't remounted due to maintenance), however Garage started up just fine without it.
We already have this line that creates the data directory if it doesn't exist (we have the same for metadata). This contradicts the behaviour you are proposing ("check that data_dir and metadata_dir exist and are directories, and fail to start otherwise"). I don't have a strong preference for either behaviour (actually, I think I prefer your suggestion), but I'd like to have people's opinions on this matter before changing anything.
Yes, I thought that it'd be better to check for a random block that the metadata directory thinks should exist on the node.
I can see use cases for both modes:
The 'create if not initialized' mode, for the case when you want a Garage cluster for testing. I do that occasionally when testing deployments of services that use S3, like Grafana Mimir, and want an empty cluster as quickly and easily as possible.
The 'fail if not initialized' mode, for production, to get fast feedback on misconfiguration. I would not want a node to start, re-create its directories and begin syncing, only to fill the root partition because the proper disk was not mounted.
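To make the two modes concrete, here is a minimal sketch of how a start-up check could switch between them. The `DirPolicy` enum and the `prepare_dir` function are hypothetical names made up for illustration; they are not existing Garage code or configuration options.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical start-up policy; not an existing Garage option.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum DirPolicy {
    /// Testing: create the directory when it is missing.
    CreateIfMissing,
    /// Production: refuse to start when the directory is missing.
    FailIfMissing,
}

/// Checks that `path` is a directory, applying `policy` when it does not exist.
pub fn prepare_dir(path: &Path, policy: DirPolicy) -> io::Result<()> {
    match fs::metadata(path) {
        // Already present and a directory: nothing to do in either mode.
        Ok(meta) if meta.is_dir() => Ok(()),
        // Present but not a directory: always an error.
        Ok(_) => Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            format!("{} exists but is not a directory", path.display()),
        )),
        // Missing: behaviour depends on the selected mode.
        Err(e) if e.kind() == io::ErrorKind::NotFound => match policy {
            DirPolicy::CreateIfMissing => fs::create_dir_all(path),
            DirPolicy::FailIfMissing => Err(io::Error::new(
                io::ErrorKind::NotFound,
                format!("{} does not exist, refusing to start", path.display()),
            )),
        },
        // Any other I/O error (permissions, etc.) is passed through.
        Err(e) => Err(e),
    }
}
```

Note that this only tests for the presence of a directory, so an empty mount point would still pass in strict mode.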
Regarding the check itself: I don't think checking just for directory presence is enough. Especially when metadata and data are on different drives (like a split between SSD and HDD), the directories may be present but not actually mounted. We could then start even in strict mode and fill up the root partition.
Actually checking the content of the metadata/data directory would be preferable: either content we know should be there, or a custom file signaling that the content has been initialized. The second variant might be useful for other purposes too, like storing information about which Garage version created the content and with which options.
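A rough sketch of the marker-file variant, under stated assumptions: the file name `.garage-initialized`, its `version=` line format, and the `write_marker` / `read_marker` helpers are all invented here for illustration and do not exist in Garage. The idea is that an unmounted volume leaves an empty mount point, so the marker is absent and start-up can fail early.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical marker file name; Garage does not currently write such a file.
const MARKER_NAME: &str = ".garage-initialized";

/// Writes the marker when a directory is first initialized,
/// recording the version that created the content.
pub fn write_marker(dir: &Path, version: &str) -> io::Result<()> {
    fs::write(dir.join(MARKER_NAME), format!("version={}\n", version))
}

/// Reads the marker back on start-up and returns the recorded version.
/// Fails if the marker is absent, e.g. because the volume is not mounted.
pub fn read_marker(dir: &Path) -> io::Result<String> {
    let content = fs::read_to_string(dir.join(MARKER_NAME)).map_err(|e| {
        io::Error::new(
            e.kind(),
            format!(
                "{}: marker file missing or unreadable ({}); is the volume mounted?",
                dir.display(),
                e
            ),
        )
    })?;
    content
        .lines()
        .find_map(|line| line.strip_prefix("version="))
        .map(str::to_owned)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "marker file has no version line"))
}
```

In strict mode, a node would call something like `read_marker` on both the metadata and data directories before doing anything else, and only call `write_marker` during an explicit first-time initialization.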
I was able to work around this issue by adding this to the systemd config of my Garage unit:
...probably still useful to have a check done internally.