Metadata written infinitely #857
With the following setup under podman, garage is writing to the metadata infinitely.
When garage is idle, it starts writing to the metadata at an average of 400 MB/s and never stops.
The metadata for both instances is 56 GB (it should not keep writing forever).
The logs are filled with the following when the writing starts:
Could you provide a script / Dockerfile that leads to this behavior? Is it enough to start garage with sqlite to reproduce your issue?
The service is run under a podman quadlet.
Systemd will generate a service file based on garage.kube:
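As a minimal sketch (paths assumed, not the reporter's exact file), a quadlet `.kube` unit for this kind of setup could look like:

```ini
# garage.kube -- illustrative sketch; paths are assumptions
[Unit]
Description=Garage object storage pod
Wants=network-online.target
After=network-online.target

[Kube]
# Kubernetes-style pod definition that podman will run
Yaml=/etc/containers/systemd/garage.yaml

[Install]
WantedBy=default.target
```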
This is the actual pod generation file garage.yaml:
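As a sketch of what such a pod definition typically contains (image tag, volume paths and ports are assumptions, not the reporter's actual values):

```yaml
# garage.yaml -- illustrative sketch; image tag, paths and ports are assumptions
apiVersion: v1
kind: Pod
metadata:
  name: garage
spec:
  containers:
    - name: garage
      image: docker.io/dxflrs/garage:v1.0.1
      ports:
        - containerPort: 3900
          hostPort: 3900
      volumeMounts:
        - name: garage-config
          mountPath: /etc/garage.toml
        - name: garage-meta
          mountPath: /var/lib/garage/meta
        - name: garage-data
          mountPath: /var/lib/garage/data
  volumes:
    - name: garage-config
      hostPath:
        path: /etc/garage/garage.toml
    - name: garage-meta
      hostPath:
        path: /mnt/meta/garage
    - name: garage-data
      hostPath:
        path: /mnt/data/garage
```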
And the garage.toml:
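Again as a hedged sketch rather than the reporter's actual configuration, a minimal garage.toml using the sqlite metadata engine could look like (paths, addresses and secrets are placeholders):

```toml
# garage.toml -- illustrative sketch; all values are placeholders
metadata_dir = "/var/lib/garage/meta"
data_dir = "/var/lib/garage/data"
db_engine = "sqlite"

replication_factor = 2

rpc_bind_addr = "[::]:3901"
rpc_public_addr = "<node-ip>:3901"
rpc_secret = "<32-byte hex secret>"

[s3_api]
s3_region = "garage"
api_bind_addr = "[::]:3900"
root_domain = ".s3.garage.example.com"

[admin]
api_bind_addr = "[::]:3903"
```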
@Jad what is your underlying storage? E.g. is it a physical disk, network mount, what filesystem, etc?
We're running garage with sqlite and have not observed this behavior, but we have seen some issues with sqlite in general depending on what the underlying storage is. E.g. if it is distributed network storage vs. local disk. If it is any kind of networked storage, it needs to behave within the expectations of sqlite, e.g. POSIX compliant, latency not too high, etc.
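One thing that can help rule out plain database corruption is sqlite's own integrity check, run against the metadata database while the node is stopped (a sketch, assuming the sqlite file lives at `<metadata_dir>/db.sqlite`):

```sh
# Illustrative check; adjust the path to your metadata_dir
sqlite3 /var/lib/garage/meta/db.sqlite 'PRAGMA integrity_check;'
```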
2 HDDs (data) that are passed through from Hyper-V
1 virtual disk (meta) stored on an SSD (in RAID 1)
ext4 filesystem
df -Th
lsblk
fio --name=test --filename=/mnt/meta/test --rw=randrw --size=256M --direct=1 --runtime=15 --sync=1
@Jad I'm not sure if this is your issue, but your benchmarks are at least an order of magnitude less performant than native access to a directly-attached NVMe drive, or even a spinning disk.
I'd suggest trying to reproduce the issue on directly-attached storage. If you cannot, that will tell some kind of story about what the actual problem is.
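One way to compare the two is a small fio run that stresses synchronous 4 KiB writes, which is closer to what an embedded database like sqlite does than a large random read/write mix (the exact flags below are an illustrative assumption):

```sh
# Illustrative sync-latency test; run once on /mnt/meta and once on a
# directly-attached disk, then compare the reported write and fsync latencies.
fio --name=synctest --filename=/mnt/meta/synctest \
    --rw=write --bs=4k --size=256M \
    --fdatasync=1 --runtime=30
```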
Also, it might be worth running a POSIX compatibility test on your filesystem. Tools like pjdfstest can verify whether your filesystem is fully POSIX compliant. This could help identify any underlying compatibility issues that might be affecting performance.

In theory, robust software like sqlite should run well on any disk, but we've observed it break down when the underlying storage goes outside normal bounds of latency and bandwidth. POSIX compliance issues could also potentially contribute to such abnormal behavior.
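A rough outline of building and running pjdfstest against the filesystem holding the metadata (build steps per its README; the scratch path is an assumption):

```sh
# Build pjdfstest
git clone https://github.com/pjd/pjdfstest.git
cd pjdfstest
autoreconf -ifs
./configure
make pjdfstest

# Run the suite from a scratch directory on the filesystem under test,
# here assuming the metadata volume is mounted at /mnt/meta
mkdir -p /mnt/meta/pjdfstest-scratch
cd /mnt/meta/pjdfstest-scratch
prove -rv /path/to/pjdfstest/tests   # point at the tests dir in the cloned repo
```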