Multi-char delimiters, was: CockroachDB backup fails with 400 bad request #692

Closed
opened 2024-01-19 14:24:20 +00:00 by dpape · 5 comments

Hello,

I have switched from Ceph to Garage because I like the architecture and the simplicity of administration. Thank you for writing this piece of software.

I have now connected a CockroachDB cluster and am trying to set up backups. This gives me an error, and I see the following in the logs:

```
GET /platform.db.backup/cockroach.bak/2024/01/19-141131.96/BACKUP_MANIFEST
INFO garage_api::generic_server: Response: error 404 Not Found, Key not found
INFO garage_api::generic_server: 10.133.58.244 (via 10.133.250.43:58142) GET /platform.db.backup/cockroach.bak/2024/01/19-141131.96/BACKUP-LOCK-935864244736524289
INFO garage_api::generic_server: Response: error 404 Not Found, Key not found
INFO garage_api::generic_server: 10.133.58.244 (via 10.133.250.43:58152) GET /platform.db.backup/cockroach.bak/2024/01/19-141131.96/BACKUP_MANIFEST
INFO garage_api::generic_server: Response: error 404 Not Found, Key not found
INFO garage_api::generic_server: 10.133.58.244 (via 10.133.250.43:58168) GET /platform.db.backup?delimiter=data%2F&prefix=cockroach.bak%2F2024%2F01%2F19-141131.96
INFO garage_api::generic_server: Response: error 400 Bad Request, Bad request: Failed to parse query parameter
```

This returns the following error in the CockroachDB console:

```
ERROR: checking for BACKUP-LOCK file: failed to list s3 bucket: InvalidRequest: Bad request: Failed to parse query parameter
```

I'm not very familiar with the S3 API itself and hope to find some help here.

Kind regards,
Daan

Owner

It looks like CockroachDB is making non-standard use of the S3 API: it sends a multi-character value for the `delimiter` parameter of the ListObjects API call. According to the S3 documentation, the delimiter is supposed to be a single character, and Garage is implemented to support only single-character delimiters. However, since CockroachDB sends multi-character delimiters, they must be supported by AWS S3, so this is arguably a compatibility bug in Garage.

In order, we need to:

  1. Confirm that multi-character delimiters work on AWS S3, and determine their exact behavior

  2. Implement the same thing in Garage
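To make the semantics concrete, here is a minimal Python model of how ListObjects grouping would generalize to an arbitrary-length delimiter. This is a hypothetical sketch for discussion, not Garage's actual Rust implementation: a key whose remainder after the prefix contains the full delimiter string is rolled up into CommonPrefixes, cut just after the first occurrence of that string.

```python
def list_objects(keys, prefix="", delimiter=""):
    """Model of ListObjects grouping with an arbitrary-length delimiter.

    Keys whose remainder (after the prefix) contains the full delimiter
    string are rolled up into CommonPrefixes; all other keys go into
    Contents. Returns (contents, common_prefixes).
    """
    contents, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter:
            idx = rest.find(delimiter)
            if idx != -1:
                # Cut just after the first full occurrence of the delimiter.
                common_prefixes.add(prefix + rest[: idx + len(delimiter)])
                continue
        contents.append(key)
    return contents, sorted(common_prefixes)
```

Under this model, keys `a/b/c/d` and `a/c/b/e` with delimiter `/` yield the single common prefix `a/`, while delimiter `b/` yields `a/b/` and `a/c/b/`.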

lx added the S3 Compatibility label 2024-01-22 10:39:44 +00:00
Author

Hi @lx,

Thank you so much for your very fast reply and insights. I have searched for information on this multi-character delimiter, and it is described in this CockroachDB issue: https://github.com/cockroachdb/cockroach/issues/100123

It seems that the S3 documentation is not clear (https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html):

  • Description in text: 'A delimiter is a character that you use to group keys.'
  • Description in XML: `<Delimiter>string</Delimiter>`

I don't know the exact behavior or how I can test it, but it seems that the above issue and CockroachDB have validated that AWS S3 actually accepts strings as delimiters.

If I can assist in any way, please let me know.

Owner

I was wondering if there might be a "gotcha" in S3, for example: multi-character delimiters being accepted but only the last character actually being taken into account. To make sure that the full delimiter string is actually used, we would ideally need to create a test bucket on AWS, insert some objects with various names and paths, and then check the result of ListObjects with multi-char delimiters. For instance, if we have two objects called `a/b/c/d` and `a/c/b/e`, listing the entire bucket with delimiter `/` would give us just `a/` as a result in CommonPrefixes, but with a delimiter of `b/` we would have two results in CommonPrefixes: `a/b/` and `a/c/b/`. If you'd like to run these tests against AWS S3 and/or Minio, that would be of great help. You can use the `s3api` subcommand of `awscli` v2 to directly run requests against an S3 server from the command line ([documentation](https://docs.aws.amazon.com/cli/latest/reference/s3api/#cli-aws-s3api)).

Anyway, I'm tagging this for v1.0 since, once we get the behavior confirmed, this shouldn't be too hard to implement (I think we already support multi-byte delimiters for Unicode characters, so supporting arbitrary-length strings won't be a problem).

lx added this to the v1.0 milestone 2024-01-22 11:15:23 +00:00
Author

Hi @lx,

I have tried making the requests against AWS. I first created the object structure:

```
aws s3api list-objects-v2 --bucket 'b76dd182-36b1-4522-b323-2fa6ef63ad33'
{
    "Contents": [
        {
            "Key": "a/",
            "LastModified": "2024-01-22T14:01:45+00:00",
            "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
            "Size": 0,
            "StorageClass": "STANDARD"
        },
        {
            "Key": "a/b/",
            "LastModified": "2024-01-22T14:01:52+00:00",
            "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
            "Size": 0,
            "StorageClass": "STANDARD"
        },
        {
            "Key": "a/b/c/",
            "LastModified": "2024-01-22T14:02:02+00:00",
            "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
            "Size": 0,
            "StorageClass": "STANDARD"
        },
        {
            "Key": "a/b/c/d",
            "LastModified": "2024-01-22T14:02:45+00:00",
            "ETag": "\"da2951b491bda922a2a0da759548ee32\"",
            "Size": 56388,
            "StorageClass": "STANDARD"
        },
        {
            "Key": "a/c/",
            "LastModified": "2024-01-22T14:02:58+00:00",
            "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
            "Size": 0,
            "StorageClass": "STANDARD"
        },
        {
            "Key": "a/c/b/",
            "LastModified": "2024-01-22T14:03:03+00:00",
            "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
            "Size": 0,
            "StorageClass": "STANDARD"
        },
        {
            "Key": "a/c/b/e",
            "LastModified": "2024-01-22T14:03:14+00:00",
            "ETag": "\"29a6bf90b2b18f46968503f7d9a4b597\"",
            "Size": 25750,
            "StorageClass": "STANDARD"
        }
    ],
    "RequestCharged": null
}
```

And then I ran your requests:

```
aws s3api list-objects-v2 --bucket 'b76dd182-36b1-4522-b323-2fa6ef63ad33' --delimiter '/'
{
    "CommonPrefixes": [
        {
            "Prefix": "a/"
        }
    ],
    "RequestCharged": null
}
```

```
aws s3api list-objects-v2 --bucket 'b76dd182-36b1-4522-b323-2fa6ef63ad33' --delimiter 'b/'
{
    "Contents": [
        {
            "Key": "a/",
            "LastModified": "2024-01-22T14:01:45+00:00",
            "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
            "Size": 0,
            "StorageClass": "STANDARD"
        },
        {
            "Key": "a/c/",
            "LastModified": "2024-01-22T14:02:58+00:00",
            "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
            "Size": 0,
            "StorageClass": "STANDARD"
        }
    ],
    "CommonPrefixes": [
        {
            "Prefix": "a/b/"
        },
        {
            "Prefix": "a/c/b/"
        }
    ],
    "RequestCharged": null
}
```

It thus seems that your understanding of how it should work is correct. I hope this helps.
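As a sanity check, the rule "roll each key up at the first occurrence of the full delimiter string" reproduces the AWS output above exactly. This is a small hypothetical Python model written for this issue, not code from Garage or AWS:

```python
# The seven keys from the test bucket above.
keys = ["a/", "a/b/", "a/b/c/", "a/b/c/d", "a/c/", "a/c/b/", "a/c/b/e"]

def split_by_delimiter(keys, delimiter):
    """Partition keys into (Contents, CommonPrefixes) for a given delimiter."""
    contents, prefixes = [], []
    for key in keys:
        idx = key.find(delimiter)
        if idx == -1:
            contents.append(key)  # no delimiter: listed as a plain object
        else:
            p = key[: idx + len(delimiter)]  # rolled up as a common prefix
            if p not in prefixes:
                prefixes.append(p)
    return contents, prefixes

contents, prefixes = split_by_delimiter(keys, "b/")
print(contents)   # ['a/', 'a/c/']       -- matches the AWS Contents
print(prefixes)   # ['a/b/', 'a/c/b/']   -- matches the AWS CommonPrefixes
```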

Kind regards,
Daan

Owner

Thank you for testing, I'll keep you updated on the implementation.

lx changed title from CocroachDB backup failes with 400 bad request to Multi-char delimiters, was: CocroachDB backup failes with 400 bad request 2024-01-24 08:21:05 +00:00
lx closed this issue 2024-02-09 13:38:18 +00:00
Reference: Deuxfleurs/garage#692