Fix XML encoding #62

Closed
opened 2021-04-27 21:42:22 +00:00 by lx · 4 comments
Owner

Decoding of XML in S3 requests is done using a proper library, roxmltree.

Encoding however is not, it is currently produced in an ad-hoc manner which is prone to many bug. We should fix this.

  • survey the rust ecosystem and find a xml writing library that's simple to use and not bloated
  • replace all xml generation codes in api/ with proper code using that

This issue has the "Bug" tag because it probably causes issues, like #59 for instance

Decoding of XML in S3 requests is done using a proper library, `roxmltree`. Encoding however is not, it is currently produced in an ad-hoc manner which is prone to many bug. We should fix this. - [ ] survey the rust ecosystem and find a xml writing library that's simple to use and not bloated - [ ] replace all xml generation codes in api/ with proper code using that This issue has the "Bug" tag because it probably causes issues, like #59 for instance
lx added the
Newcomer
Bug
S3 Compatibility
labels 2021-04-27 21:42:49 +00:00
Owner

Could we use serde_xml_rs?

  • We already use it elsewhere in the project
  • We can serialize and deserialize content
  • It maps data in/from a Rust datastructure
Could we use [serde_xml_rs](https://docs.rs/serde-xml-rs/0.4.1/serde_xml_rs/)? - We already use it elsewhere in the project - We can serialize and deserialize content - It maps data in/from a Rust datastructure
Author
Owner

@quentin yes serde_xml_rs might be an option, however I'm not sure it easily supports things such as adding a <?xml version="1.0" encoding="UTF-8"?> header at the beginning, or custom xmlns attributes. Also the variables in Rust structs are typically named in snake_case but XML tag names in S3's format usually use CamelCase. So if we use Serde that probably means crating custom structs for encoding, adding a whole bunch of attributes so that it does what we want (which might be badly documented or just plain impossible), and then translating our data into these structs before serializing. My intuition is that we can probably have something much more straightforward with a programmatic interface, i.e. a simple XML builder object with methods such as: add header, open tag, add attribute, add text value, etc.

Here are two examples of crates in this spirit:

There is also this crates that includes both serialization and deserialization in a similar spirit, and could maybe used to do both and thus replace our curent dependency on roxmltree for parsing:

Of course serde_xml_rs is still an option if we can make it work.

If it were me I'd probably go with just xmlwriter because we can trivially replace the current XML serialization with that as it works the same, just ensuring that the produced XML has the right structure and is encoded correctly.

@quentin yes serde_xml_rs might be an option, however I'm not sure it easily supports things such as adding a `<?xml version="1.0" encoding="UTF-8"?>` header at the beginning, or custom `xmlns` attributes. Also the variables in Rust structs are typically named in `snake_case` but XML tag names in S3's format usually use `CamelCase`. So if we use Serde that probably means crating custom structs for encoding, adding a whole bunch of attributes so that it does what we want (which might be badly documented or just plain impossible), and then translating our data into these structs before serializing. My intuition is that we can probably have something much more straightforward with a programmatic interface, i.e. a simple XML builder object with methods such as: add header, open tag, add attribute, add text value, etc. Here are two examples of crates in this spirit: - <https://docs.rs/xmlwriter/0.1.0/xmlwriter/> - <https://docs.rs/simple_xml_serialize/0.3.0/simple_xml_serialize/> There is also this crates that includes both serialization and deserialization in a similar spirit, and could maybe used to do both and thus replace our curent dependency on `roxmltree` for parsing: - <https://docs.rs/sxd-document/0.3.2/sxd_document/> Of course serde_xml_rs is still an option if we can make it work. If it were me I'd probably go with just `xmlwriter` because we can trivially replace the current XML serialization with that as it works the same, just ensuring that the produced XML has the right structure and is encoded correctly.
Owner

I tried serde_xml_rs and it seems to miss many many features plus it has been reported being slow.

Currently, I have a proof of concept running with serde + quick_xml here: https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/feature/s3/list-buckets/src/api/s3_bucket.rs#L77-L111

It addresses the issues you mentionned.

The advantage I see by using serde is that, even if we are a bit more verbose, we write code that is declarative and not imperative which make it easier to read it in my opinion.

I tried `serde_xml_rs` and it seems to miss many many features plus it has been reported being slow. Currently, I have a proof of concept running with serde + quick_xml here: https://git.deuxfleurs.fr/Deuxfleurs/garage/src/branch/feature/s3/list-buckets/src/api/s3_bucket.rs#L77-L111 It addresses the issues you mentionned. The advantage I see by using serde is that, even if we are a bit more verbose, we write code that is declarative and not imperative which make it easier to read it in my opinion.
Author
Owner

Fixed by #74

Fixed by #74
lx closed this issue 2021-05-06 20:48:05 +00:00
Sign in to join this conversation.
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: Deuxfleurs/garage#62
No description provided.