0.8.1: Heavy data directory reading after upgrade #470

Closed
opened 2023-01-09 17:46:25 +00:00 by jpds · 2 comments
Contributor

I upgraded my cluster from `0.7.2` to `0.8.1`, and followed all the steps at: https://garagehq.deuxfleurs.fr/documentation/working-documents/migration-08/

There's currently zero S3/web activity on the cluster and nothing in the queues, but I'm observing a lot of disk reads:

```
$ zpool iostat 1
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
garage       132G   4.4T      5      0  2.33M      0
zroot       7.46G   225G      0      5      0   447K
----------  -----  -----  -----  -----  -----  -----
garage       132G   4.4T      6      0  3.08M      0
zroot       7.46G   225G      0     70      0   575K
----------  -----  -----  -----  -----  -----  -----
garage       132G   4.4T     13      0  2.94M      0
zroot       7.46G   225G      0      5      0   469K
----------  -----  -----  -----  -----  -----  -----
garage       132G   4.4T     13      0  4.59M      0
zroot       7.46G   225G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
garage       132G   4.4T      5      0  3.39M      0
zroot       7.46G   225G      0      5      0   384K
----------  -----  -----  -----  -----  -----  -----
garage       132G   4.4T      6      0  5.01M      0
zroot       7.46G   225G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
garage       132G   4.4T      6      0  5.39M      0
zroot       7.46G   225G      0     78      0  1.02M
----------  -----  -----  -----  -----  -----  -----
garage       132G   4.4T      7      0  4.99M      0
zroot       7.46G   225G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
garage       132G   4.4T      6      0  5.66M      0
zroot       7.46G   225G      0      5      0   456K
----------  -----  -----  -----  -----  -----  -----
garage       132G   4.4T     13      0  6.97M      0
zroot       7.46G   225G      0      0      0  3.98K
----------  -----  -----  -----  -----  -----  -----
garage       132G   4.4T      8      0  3.19M      0
zroot       7.46G   225G      0      4      0   382K
----------  -----  -----  -----  -----  -----  -----
```

If I probe with DTrace for files being opened, it looks like `garage` is crawling its own data directory:

```
$ dtrace -n 'syscall::open*:entry { printf("%s %s", execname, copyinstr(arg0)); }'
  2  82631                       open:entry garage /srv/garage/20/c5
  2  82631                       open:entry garage /srv/garage/20/d2
  3  82631                       open:entry garage /srv/garage/20/ad
  2  82631                       open:entry garage /srv/garage/20/fc
  3  82631                       open:entry garage /srv/garage/20/d8
  3  82631                       open:entry garage /srv/garage/20/7a
...  
  2  82631                       open:entry garage /srv/garage/7f/8c
  0  82631                       open:entry garage /srv/garage/7f/46
  0  82631                       open:entry garage /srv/garage/7f/31
  2  82631                       open:entry garage /srv/garage/7f/b3
  1  82631                       open:entry garage /srv/garage/7f/e4
  3  82631                       open:entry garage /srv/garage/7f/e9
...
  0  82631                       open:entry garage /srv/garage/0a/9d
  0  82631                       open:entry garage /srv/garage/0a/51
  1  82631                       open:entry garage /srv/garage/0a/26
  1  82631                       open:entry garage /srv/garage/0a/ae
  1  82631                       open:entry garage /srv/garage/0a/fb
  0  82631                       open:entry garage /srv/garage/0a/d9
```

Is there something else I can try to debug where this is coming from?

Owner

Probably your node is doing a scrub of the stored data to check for corruption. It does this once a month. It's meant to be a background process that limits its own I/O in order to leave room for interactive requests to be served first. You can check the progress of the scrub using `garage worker list` and `garage worker info`, and you can change its speed using `garage worker set scrub-tranquility` (zero is the fastest; larger values mean a longer interval between iterations and therefore a smaller proportion of I/O time used by the scrub).
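
For reference, a minimal shell sketch of the commands above (the worker id placeholder and the tranquility value are illustrative, not output from this cluster):

```
# List background workers and note the id of the scrub worker
$ garage worker list

# Show details and progress for that worker (id taken from the list above)
$ garage worker info <worker-id>

# Throttle the scrub: 0 is fastest, larger values add more pause between iterations
$ garage worker set scrub-tranquility 4
```
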
Author
Contributor

Very interesting - that was indeed it.

jpds closed this issue 2023-01-09 18:19:05 +00:00