Handle FD starvation correctly #595
It appears that if another process starves a node of all file descriptors, garage reports that a block was saved successfully. However, it will be impossible to actually read the block because it was never written.
If Garage fails to open files due to FD starvation, it should crash and fail to start -- perhaps even sending the block currently in memory to another node before crashing.
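Below is a minimal sketch of the fail-fast behaviour being requested here (this is not something Garage currently does): on startup, try to create and remove a probe file in the data directory and refuse to start if that fails, so FD exhaustion or an unwritable disk is caught before the node accepts writes. The function name and path are hypothetical.

```rust
use std::fs::{self, File};
use std::process;

// Hypothetical startup self-check: try to create and remove a probe file
// in the data directory, and refuse to start if that fails, so that FD
// exhaustion or an unwritable disk is caught before any writes are accepted.
fn check_data_dir_writable(data_dir: &str) -> std::io::Result<()> {
    let probe = format!("{data_dir}/.write-probe");
    File::create(&probe)?; // fails with EMFILE when out of file descriptors
    fs::remove_file(&probe)?;
    Ok(())
}

fn main() {
    // "/var/lib/garage/data" is an assumed path, for illustration only.
    if let Err(e) = check_data_dir_writable("/var/lib/garage/data") {
        eprintln!("data directory is not writable, refusing to start: {e}");
        process::exit(1);
    }
    println!("startup check passed");
}
```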
I don't understand how this is possible. If there is no available file descriptor, the `open` syscall should return an error, and the code always propagates such errors to the caller. Garage should not report that the block was successfully saved in this situation.

What might happen is that the block could not be saved to a single node, but it could be saved on all of the other nodes, and therefore a quorum was achieved and the write succeeded globally (this is an expected scenario).

Are you sure that this is really what is happening? How do you know? Please provide proof of what you are saying: how did you come to this conclusion, how can I reproduce the issue, etc.
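To illustrate the point about error propagation (a minimal sketch, not Garage's actual block-writing code): an `open`/`create` call made while the process is out of file descriptors fails with EMFILE ("Too many open files"), and the `?` operator passes that error up to the caller rather than reporting a successful write. The `write_block` function and path here are made up for the example.

```rust
use std::fs::File;
use std::io::{self, Write};

// Sketch of error propagation: File::create fails with EMFILE when the
// process has exhausted its file descriptors, and `?` returns that error
// to the caller instead of pretending the write succeeded.
fn write_block(path: &str, data: &[u8]) -> io::Result<()> {
    let mut file = File::create(path)?; // returns Err under FD starvation
    file.write_all(data)?;
    file.sync_all()?;
    Ok(())
}

fn main() {
    match write_block("/tmp/example-block", b"hello") {
        Ok(()) => println!("block written"),
        // EMFILE is errno 24 on Linux
        Err(e) if e.raw_os_error() == Some(24) => {
            eprintln!("out of file descriptors: {e}")
        }
        Err(e) => eprintln!("write failed: {e}"),
    }
}
```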
These blocks are gone completely, as far as I can tell. I had a node that was running netdata (https://github.com/netdata/helmchart/issues/372) which somehow consumed all file descriptors on the node. This is about half of the errors; I've been manually doing `garage block purge` one-by-one, since `garage block list errors | awk '{print $1;}' | xargs garage block purge --yes` doesn't seem to work.

This is what the garage logs look like during this time:

And how I guessed that it was causing corruption:
It's worth pointing out that it took days for any serious issues to start happening (we had zero monitoring for this failure case), and in this case, it was containers not coming up that had just been pushed successfully (Harbor is using garage as a backend). So, garage was actually quite resilient in the face of catastrophic error conditions. Perhaps too resilient?

Happy to provide the entire logs for your perusal.
Could you provide us with your `garage.toml` (minus the rpc secret) as well as logs? The logs you provided don't show any block getting written, "just" the inability to save the peer list, which isn't too bad on its own, and the inability to read blocks, which sounds like it's a consequence, not a cause.

As lx said, the scenario you describe shouldn't happen; it's an IO error like any other, and should be reported as is. Do you have traces that the uploads went correctly, and did not receive 4xx/5xx error codes in response? Some metadata is saved in parallel to the data, so if saving some data fails, there can still be some metadata saying the block should exist (which would probably generate this kind of message), but the client would still have been informed of an error.
I'll get those to you.
I think the logs were over the upload limit and they disappeared into the ether.
Can you upload them somewhere else? Or compress them, maybe?
The same errors...

Single node, good new disks (data on 2x16 TB in RAID1, metadata on 2 NVMe SSDs in RAID1), with no inserts for over a week...
```
Garage version: v0.8.2 [features: k2v, sled, lmdb, sqlite, consul-discovery, kubernetes-discovery, metrics, telemetry-otlp, bundled-libs]
Rust compiler version: 1.63.0

Database engine: Sled

Table stats:
  Table      Items  MklItems  MklTodo  GcTodo
  bucket_v2  NC     NC        0        0
  key        NC     NC        0        0
  object     NC     NC        0        0
  version    NC     NC        0        0
  block_ref  NC     NC        0        0

Block manager stats:
  number of RC entries (~= number of blocks): NC
  resync queue length: 50
  blocks with resync errors: 48

If values are missing above (marked as NC), consider adding the --detailed flag (this will be slow).

Storage nodes:
  ID                Hostname  Zone  Capacity  Part.  DataAvail                MetaAvail
  ae06d710efxxxxx5  xxx       xx    1         256    8.1 TB/15.9 TB (50.8%)   91.0 GB/1081.1 GB (8.4%)

Estimated available storage space cluster-wide (might be lower in practice):
  data: 8.1 TB
  metadata: 91.0 GB
```
After --detailed:
After looking at this again, it looks like the issues reported by @withinboredom and @Mako are not the same.

@withinboredom: your `garage block list-errors` shows many blocks in an errored state but with zero references; Garage should normally not try to store those blocks as they are not needed, and the errors should disappear. However, the "next retry time" is a lot of years in the future, so that retry will never happen and the errors will not be cleared. This looks a lot like an integer overflow/underflow error.

@Mako: your `garage block list-errors` shows many blocks with non-zero reference counters. This might either be #644 (non-zero reference counter but blocks are not truly referenced) appearing on a scale unseen before (it usually happens for one block at a time), or an error where actually needed blocks did truly disappear. To know which case we are in, I need the output of `garage block info` for the block hashes in question.
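As an aside, here is a guess at how a "next retry time" can end up years in the future (a sketch of the general failure mode, not Garage's actual resync code): if the retry delay is computed as an unchecked exponential backoff, the arithmetic can overflow or grow without bound after enough consecutive errors, producing an absurd timestamp; saturating arithmetic plus an explicit cap avoids that. All names below are hypothetical.

```rust
use std::time::Duration;

const BASE_DELAY_SECS: u64 = 60; // first retry after 1 minute
const MAX_DELAY: Duration = Duration::from_secs(24 * 3600); // never wait more than a day

// Hypothetical backoff computation: double the delay per consecutive error,
// but with saturating arithmetic and an upper bound so the next retry time
// can never run away to "in a lot of years".
fn retry_delay(consecutive_errors: u32) -> Duration {
    let factor = 2u64.saturating_pow(consecutive_errors);
    let secs = BASE_DELAY_SECS.saturating_mul(factor);
    Duration::from_secs(secs).min(MAX_DELAY)
}

fn main() {
    // With unchecked arithmetic, 60 * 2^60 seconds overflows u64 (or maps to
    // billions of years); the capped version stays at one day.
    for errors in [0, 5, 20, 60] {
        println!("{errors} consecutive errors -> retry in {:?}", retry_delay(errors));
    }
}
```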
Hi, it seems like I've encountered the same error, but a little bit different, considering I have two nodes and the replication factor is set to 2.

which is not correct, as the amount of consumed FDs per node is ~2k and the system max of FDs is in the billions..
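One thing worth noting here (general Linux behaviour, not anything specific to Garage): "Too many open files" (EMFILE) is raised against the per-process limit, RLIMIT_NOFILE, which often defaults to 1024 or 65536, not against the system-wide fs.file-max. A minimal Linux-only sketch for comparing the descriptors a process holds against its own soft limit:

```rust
use std::fs;

// Compare the descriptors this process currently holds against its own
// per-process soft limit (RLIMIT_NOFILE), which is what EMFILE is checked
// against -- not the system-wide fs.file-max.
fn main() -> std::io::Result<()> {
    // Count the descriptors currently open in this process.
    let open_fds = fs::read_dir("/proc/self/fd")?.count();

    // Extract the "Max open files" soft limit from /proc/self/limits.
    let limits = fs::read_to_string("/proc/self/limits")?;
    let soft_limit = limits
        .lines()
        .find(|l| l.starts_with("Max open files"))
        .and_then(|l| l.split_whitespace().nth(3).map(str::to_string))
        .unwrap_or_else(|| "unknown".to_string());

    println!("open fds: {open_fds}, per-process soft limit: {soft_limit}");
    Ok(())
}
```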
@kot-o-pes I don't think Garage is inventing this error; could you try to strace your process to see where it is coming from?
I think this issue has failed to conclusively pinpoint a specific issue in Garage, so I will close it here for inactivity and lack of focus. For debugging running clusters, we are available to answer questions on the Matrix channel. If an actual issue with the handling of file descriptors can be demonstrated using appropriate tools such as strace, feel free to open a new issue.