Support multiple hard drives per server #218
We have a very specific use case, with only one hard drive per server.
But people often have numerous HDDs, up to ~64, and they do not know how to deploy Garage with such a setup.
That's a question we were asked in one of our past exchanges.
We could ask them to simply create a JBOD volume (if one disk is offline, we must rebuild the whole server) or a RAID device (but that adds another level of redundancy).
But I think it would be simpler, easier to recover, and more performant to let people set multiple paths in Garage's configuration, thus using multiple disks directly.
Example of a new configuration file:
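For illustration, a minimal sketch (the multi-path `data_dir` syntax shown here is hypothetical, not an existing option):

```toml
metadata_dir = "/var/lib/garage/meta"

# Hypothetical syntax: one data path per physical disk
data_dir = [
    "/mnt/hdd1/garage/data",
    "/mnt/hdd2/garage/data",
    "/mnt/hdd3/garage/data",
]
```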
It raises some questions.
Another solution could also be to run one garage instance per disk, which would however make it harder to deploy and maintain.
I'd strongly recommend allowing the OS to handle this, either via block-level (MD, LVM, DRBD, iSCSI, etc.) or file-system level (ZFS, BTRFS, etc.), since those tools have spent years perfecting redundancy and recovery.
That way you simply point Garage at any (block device) mount-point.
I had the same question and did not find the answer in the docs.
The thing is, adding redundancy below Garage is wasteful, since Garage already replicates data itself. And if you distribute data below Garage without redundancy (with LVM, RAID-0 mdadm, ZFS vdev striping, or even hardware RAID-0), then as soon as you lose a single disk, you lose everything. This is not good: you would have to rebalance/rebuild your whole dataset. Also, performance-wise, Garage would have no control over the placement of data, so it would likely under-utilize the hardware.
Running one garage instance per disk looks like a good idea: it will clearly separate failure domains, and will provide good parallelism. However it might challenge the P2P part of garage if too many instances are running?
It might be useful to look at Ceph: if I'm not mistaken, Ceph runs one OSD daemon for each physical disk, so you would have as many OSD daemons running in parallel as your number of disks.
For now, I think we can recommend running one garage instance per disk. This is not perfect, as some (but not all) metadata will be duplicated between instances. On the network side, it should not be a blocker, but it will also involve a small overhead.
Another hacky solution that I may test soon is to manually mount parts of the data directory.
In this directory, Garage stores chunks according to their hash, in a 2-level tree based on the first and second bytes of the hash. So the hash `e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855` will be stored in the folder `./e3/b0/`. It means that in the root folder, you will have 256 folders from `00` to `ff`. If you have 2 disks of the same size, you can mount `00` to `7f` on disk 1 and `80` to `ff` on disk 2 (a rough bind-mount sketch is shown below). In the future, I would like to directly integrate this solution in Garage, with the configuration I presented at the top of this issue.
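As a rough, untested sketch of how that manual split could be done with bind mounts (all paths here are assumptions for illustration):

```sh
#!/bin/sh
# Sketch only: spread Garage's 256 hash-prefix directories across two disks
# with bind mounts. /mnt/disk1, /mnt/disk2 and /var/lib/garage/data are
# assumed paths; a persistent setup would also need matching /etc/fstab entries.
DATA=/var/lib/garage/data
for i in $(seq 0 255); do
    prefix=$(printf '%02x' "$i")
    if [ "$i" -lt 128 ]; then
        disk=/mnt/disk1   # prefixes 00..7f
    else
        disk=/mnt/disk2   # prefixes 80..ff
    fi
    mkdir -p "$disk/garage/$prefix" "$DATA/$prefix"
    mount --bind "$disk/garage/$prefix" "$DATA/$prefix"
done
```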
Other option: make an XFS partition on each drive, and put them together with `mergerfs`. This looks like it would be a near-optimal setup. TODO: write an example mergerfs mount command with the correct parameters.

Hi,
When I was searching for distributed storage for my home server, I was checking whether MinIO would work on a 1 Gbps network and found a link to Garage.
I read the blog article, and read and listened to more material to learn about it.
Garage matches my usage perfectly: storage for Nextcloud, with the goal of offering self-hosted cloud services to my family (Nextcloud, Jitsi, and for myself, services for my /e/OS phone).
In the docs I found an example with one path; in my case I have 6 nodes, each with 4 disks of 1 TB.
I searched the issues for a similar case and found this one :)
@lx, can you please explain how to use the mergerfs / XFS setup?
Great work guys, big thanks.
I've bookmarked the Deuxfleurs website and will follow along.
Basically, format each of your drives as XFS and mount them for instance at `/mnt/hdd1`, `/mnt/hdd2`, ..., `/mnt/hddN`. Then mount everything together at `/mnt/garage-data`, for instance with a line in `/etc/fstab` like the one sketched below:
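(Sketch only; the mergerfs options shown here are illustrative assumptions, not tested recommendations.)

```
# /etc/fstab: pool all the XFS drives into a single mergerfs mount point
/mnt/hdd1:/mnt/hdd2:/mnt/hddN  /mnt/garage-data  fuse.mergerfs  defaults,allow_other,use_ino,category.create=mfs,minfreespace=10G  0  0
```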
Then, point your Garage data directory to `/mnt/garage-data`.

I have never tested this setup; it would be interesting to compare it with the setup with one daemon per HDD. In all cases, you can start by formatting all your HDDs using XFS. I think one-daemon-per-HDD is probably more performant, but harder to manage and also harder to scale, as Garage shouldn't be used to run more than about 100 daemons without special tweaking.
Thanks for the answer.
Does one daemon per HDD mean that I can't manage all the disks as one volume?
In my case I would like to "merge" 24 disks into a data pool with replication 2, roughly ~10 TB usable if I follow the Ceph calculator.
I need to prepare some things to test the setup; today I had an issue with a blk_update_request I/O error. After running smartctl and unplugging/replugging the cable on the motherboard side of the node (and likewise for the disk at the front), the error disappeared :)
For that, there is only one choice: run one garage daemon per disk.
I would strongly advise this for now, or else running one garage instance per disk.
The failure of a non-redundant block device can cause a lot of strange behaviors in the kernel, and it is very likely that you'll have to restart the kernel to recover the application.
What would happen, for example, if one of the disks switched to read-only? This is something that can happen with some SSDs when they reach their endurance limit. That would mean that a random subset of the blocks on that garage node would fail DELETE and WRITE requests...
@lx,
Do you have an example of running one garage daemon per disk, please?
I see that garage.toml specifies `rpc_public_addr`; in the case of multiple daemons, do I need to define different ports?
Thanks
Yes, that's pretty much it: allocate one port per daemon for the RPC port (`rpc_bind_addr` and `rpc_public_port`). If you are interested in monitoring, you should also allocate one port for the admin API on each node (`api_bind_addr` in the `[admin_api]` section). For the S3 API, you can remove the `[s3_api]` section on most nodes and use only one of them to access it (or use a separate gateway node).
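A minimal sketch of what the config of one such per-disk daemon might look like, under the assumptions above (paths, ports, and the exact section/key names are illustrative; check the reference configuration of your Garage version):

```toml
# garage-disk2.toml — sketch for the daemon serving the second disk (illustrative values only)
metadata_dir = "/mnt/hdd2/garage/meta"
data_dir = "/mnt/hdd2/garage/data"

rpc_secret = "<same shared secret on every daemon>"
rpc_bind_addr = "[::]:3911"           # one RPC port per daemon
rpc_public_addr = "192.0.2.10:3911"   # publicly reachable address of this daemon

[admin_api]
api_bind_addr = "[::]:3913"           # one admin/monitoring port per daemon

# No [s3_api] section here: expose the S3 API from only one daemon,
# or from a dedicated gateway node, as suggested above.
```

Each daemon would then be started with its own config file, e.g. something like `garage -c /etc/garage/garage-disk2.toml server`.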