High memory usage #681
Reference: Deuxfleurs/garage#681
Hello
I deployed garage in a K8s cluster, and for a reason I don't know it consumes relatively high memory:
The disk usage is currently quite low: around ~8 GB used on the data volumes and ~800 MB on the meta volumes.
I'm using the `lmdb` db engine. I didn't configure the `lmdb_map_size` parameter though (kept the default of 1 TB). I don't have many files, only a few hundred.
I guess it is LMDB; if so, how can I keep it smaller? And why does it take 800 MB in the first place?
Worth noting I didn't change any configuration related to it, I kept everything at the defaults.
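For context, a minimal sketch of the relevant garage.toml lines as described above (illustrative, not my full config; only these settings matter here, everything else is at defaults):

```toml
# db engine as mentioned above
db_engine = "lmdb"

# lmdb_map_size is left unset, so the ~1 TB default applies; note that this is the
# size of the memory *map* (reserved address space), not RAM that gets allocated
# lmdb_map_size = "1T"
```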
The garage cluster is used for Thanos & Quickwit, and we have many write & read operations on those buckets, but the dataset is still relatively small.
When I restart the garage instances, the memory usage returns to normal:
$ kubectl top po
~65 MB of that is the Istio sidecars, so garage itself now consumes only 5-10 MB. I wonder why it was stuck at 1 GB.
The LMDB db size drops a little but is still around 500-800 MB (I don't really care much about the size, just for the info).
That is not memory used by the garage process itself, it is memory used by the kernel to cache pages of the LMDB data file. This is normal and expected behavior, and the reason why LMDB is so fast. These memory pages are only a cache managed by the kernel, and the kernel can easily free them when it needs memory for other processes. In your case the kernel decided to keep them in RAM, probably because it had nothing better to do with that memory.
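You can see this for yourself with a rough sketch like the following, run on the K8s node itself (assuming you have shell access there; dropping caches is purely for demonstration and will temporarily cost performance):

```
$ free -m          # the LMDB pages show up under "buff/cache", not as garage's own heap
$ sync && echo 1 | sudo tee /proc/sys/vm/drop_caches
$ free -m          # "buff/cache" shrinks; garage keeps running with its small heap intact
```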
Hi @lx
I've noticed a unique issue in our Kubernetes (K8s) clusters that doesn't occur with other applications we use.
Typically, in K8s, when applications in containers no longer require memory and release it, this is reflected in the pods, showing a return to lower memory usage. However, this isn't happening with certain applications, e.g. garage & LMDB, and a few others where memory leaks have been detected through profiling.
To provide some context, we're managing hundreds of applications across approximately 20 K8s clusters. The issue isn't universal, as many applications do successfully reclaim memory. Here's a brief overview of some applications in our clusters:
This issue has also been observed in bare-metal K8s clusters running on Ubuntu 22.04.3 (v1.26), as well as on GKE 1.27 and EKS versions 1.27 and 1.28, among others.
I believe that LMDB's heap may not be designed to reclaim and free memory once it's no longer needed, or maybe it is garage; I don't know which.
This is the output of `ps aux`:
This is the output of `cat /proc/1/status`:
And these are the top pods at the time I issued the commands above on pod `garage-0`:
:Hello again,
There is nothing I can do because this is normal and expected behavior of the LMDB storage engine. I can assure you that this is not an issue, because even if the memory is not reclaimed immediately, it will be reclaimed as soon as it is needed for something else.
If you don't like this behavior, please consider configuring garage with another metadata storage engine. Sqlite should work quite well.
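For reference, that is a one-line change in garage.toml (sketch below; an existing node's metadata store would need to be migrated or rebuilt, see the documentation for the exact procedure):

```toml
# use the sqlite metadata engine instead of lmdb
db_engine = "sqlite"
```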
These lines indicate that most of the resident set size, i.e. of the memory currently mapped in garage's virtual address space, is file-backed memory. This memory can be freed by the kernel at any time; it is not garage's job to free it explicitly. The anonymous set size, which corresponds to the heap, is only 10 MB, which is normal, so garage is not leaking memory.
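Concretely, these are the fields to compare in the `/proc/1/status` output you posted (a sketch; the interpretation comments are mine, and the exact values will differ on your pod):

```
$ grep -E '^(VmRSS|RssAnon|RssFile|RssShmem)' /proc/1/status
# VmRSS   = total resident set size
# RssAnon = anonymous memory, i.e. the actual heap -- about 10 MB here
# RssFile = file-backed pages (the mapped LMDB data file), reclaimable by the kernel
```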
@lx
Hi,
It seems you're right about the application allowing the kernel to handle its memory usage. However, in the Kubernetes (K8s) ecosystem, if an application does not properly release its memory and inform the underlying kernel, it disrupts Kubernetes' ability to efficiently manage memory across the entire cluster. Consequently, this affects the optimal distribution of containers across the nodes. We've faced challenges with nodes not being scheduled effectively due to the cluster's lack of awareness of the kernel's caching mechanism. This issue is particularly important in orchestration solutions like Kubernetes. I'm considering trying SQLite, although I'm uncertain whether it has been as rigorously tested as LMDB.
Is there a workaround you could use in Kubernetes so that it doesn't take into account the quantity reported as "used by Garage", but instead reserves a fixed quantity of memory and makes all scheduling decisions according to that? On the Deuxfleurs infra, we use Nomad as a scheduler for workloads in a cluster and to my knowledge scheduling decisions are made like this, on the basis of fixed memory reservations (the model allows for over-commit, i.e. reserve 500M for a container, and allow it to use up to 1G).
@lx
Nomad and Kubernetes function in a similar way, with both allowing for the setting of resource requests and limits. In Kubernetes, I've configured the request memory at 500MB and set the burstable limit to 2GB. Additionally, I'm utilizing the Vertical Pod Autoscaler (VPA) which dynamically adjusts these requests. It's acceptable for containers to occasionally exceed their reserved memory, but if they persistently occupy more memory without releasing it, the Kubernetes scheduler (kube-scheduler) interprets the nodes as full and stops scheduling new pods on them. To manage this, I've been restarting the so-called "garage pods" every 24 hours to free up memory, enabling the scheduler to place new pods on these nodes. However, I'm looking for a solution to avoid this daily restart process.
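For reference, a sketch of the resource stanza I'm describing, with the numbers mentioned above (the container name is illustrative):

```yaml
containers:
  - name: garage
    resources:
      requests:
        memory: "500Mi"   # what the scheduler reserves on the node
      limits:
        memory: "2Gi"     # burstable ceiling enforced via the cgroup
```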
@maximilien is our local Kubernetes expert, maybe he has an idea?
AFAIK this is not a garage problem, and neither is it an LMDB problem. As @lx said, "memory usage" in Linux is significantly more complex than the "this app is using 1 GB of memory" falsehood that you're getting out of `kubectl top`. On the topic I highly recommend multiple talks by Chris Down, like "Linux memory management at scale" or "7 years of cgroup v2: the future of Linux resource control" (FOSDEM 2023). Both of them will give you a primer on the root cause of the issue you're having.
Now on the more practical side, I would suggest:
The way LMDB operates cache-wise is that it relies on the kernel and other applications exerting memory pressure to shrink its cache. This ensures that it always makes the most of the available memory for performance. If you put the application in an environment where it is isolated from other workloads and keep giving it more memory as it uses up what it has available, it will effectively grow indefinitely...
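A quick way to check this from inside the container (a sketch assuming cgroup v2, as discussed in the talks above): most of the cgroup's reported usage should be reclaimable `file` pages rather than `anon` heap.

```
$ cat /sys/fs/cgroup/memory.current              # roughly the figure kubectl top is built on
$ grep -E '^(anon|file) ' /sys/fs/cgroup/memory.stat
# a large "file" value is the LMDB page cache the kernel reclaims under memory pressure
```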