Allow anonymous read access to buckets to enable website hosting #6

Closed
opened 2020-10-21 12:25:38 +00:00 by quentin · 6 comments
Owner

To enable website hosting, we should start by allowing anonymous read access to the bucket, either by creating a specific/fake API key or by adding an option to buckets.
I will try to be more precise after reading the code and seeing how AWS does it.

Owner

Suggested architecture: open a third endpoint (HTTP server) only for anonymous/public access to buckets configured to serve as static websites. This would make it possible to clearly distinguish the semantics of the S3 API (read/write/list files, etc., authenticated) from those of public website access (read-only, no auth).

Author
Owner

AWS seems to use an independent endpoint for websites, similar to your recommendation:
https://docs.aws.amazon.com/fr_fr/AmazonS3/latest/dev/WebsiteEndpoints.html

Next, I will look at how website management is implemented in S3.

Author
Owner

It might be possible to implement this as an S3 API (not necessarily the option we want).
Here are the relevant endpoints:

  • DeleteBucketWebsite: https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteBucketWebsite.html
  • GetBucketWebsite: https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetBucketWebsite.html
  • PutBucketWebsite: https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketWebsite.html

Some other endpoints, not mandatory, but that could be of interest later:

  • PutBucketCors: https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketCors.html
  • GetBucketCors: https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetBucketCors.html
  • DeleteBucketCors: https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteBucketCors.html

If we choose this solution, creating a website would be as simple as running:

```bash
s3cmd ws-create s3://example-bucket
```

where `example-bucket` is the bucket.
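
For reference, the equivalent call through the S3 API itself (the aws CLI wrapper around PutBucketWebsite) would look like this; `index.html` and `error.html` are just illustrative document names:

```bash
# Equivalent PutBucketWebsite call via the aws CLI; the JSON structure
# (IndexDocument/ErrorDocument) is the standard website configuration.
aws s3api put-bucket-website --bucket example-bucket \
  --website-configuration '{"IndexDocument": {"Suffix": "index.html"}, "ErrorDocument": {"Key": "error.html"}}'

# Read back the configuration to check it was applied.
aws s3api get-bucket-website --bucket example-bucket
```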

We could then expose websites on a specific port, as suggested by lx; that port would be bound to a specific domain in our reverse proxy. Not so simple however...

If we publish buckets as site.deuxfleurs.fr/example-bucket, we might be exposed to security risks if people use cookies or similar mechanisms that are bound to a domain name.

If we publish buckets as example-bucket.site.deuxfleurs.fr, we could use wildcards at the proxy and ACME/Let's Encrypt level. We would need to check that this can be done in practice.
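
For what it's worth, Let's Encrypt only issues wildcard certificates through the DNS-01 challenge, so the proxy (or an external tool) needs to be able to create records in our DNS zone. As a quick manual test, for example with certbot:

```bash
# Wildcard certificates require the DNS-01 challenge; --manual prompts
# for the TXT record to create in the zone.
certbot certonly --manual --preferred-challenges dns -d '*.site.deuxfleurs.fr'
```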

Finally, if we want to be as generic as possible and support arbitrary websites like example.com, we should use the Consul Catalog feature of Traefik: when PutBucketWebsite is called, an entry is inserted in Consul KV to inform Traefik of the website. In particular, two URLs must be registered: example.com and example.com.site.deuxfleurs.fr. Similarly, for a bucket named example-bucket, both example-bucket and example-bucket.site.deuxfleurs.fr must be registered. In the second case, we can see that Traefik will never be able to register the domain name example-bucket. We must be careful with that and check that it does not exhaust Let's Encrypt rate limits.
Alternatively, we could check, before adding a domain name, that its DNS entry is example.com CNAME site.deuxfleurs.fr.
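
As a sketch, assuming Traefik v2's Consul KV provider and a hypothetical garage-web service behind the website port, the entries written on PutBucketWebsite could look like:

```bash
# Hypothetical Consul KV entries following Traefik v2's KV provider
# layout; router/service names and the port are illustrative.
consul kv put traefik/http/routers/example-com/rule \
  'Host(`example.com`) || Host(`example.com.site.deuxfleurs.fr`)'
consul kv put traefik/http/routers/example-com/service garage-web
consul kv put traefik/http/routers/example-com/tls/certresolver letsencrypt
consul kv put traefik/http/services/garage-web/loadbalancer/servers/0/url \
  'http://127.0.0.1:7879'
```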


Not related to this issue, but here are some other endpoints we could implement:

  • ListBuckets (https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListBuckets.html) and the corresponding Get and Put commands, if we want to enable a key to manage its own buckets
  • PutBucketAcl (https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketAcl.html) and the corresponding Get and Delete commands, if we want to enable a key to manage ACLs on buckets

Not sure it is a good idea.

Author
Owner

So my minimal proposition to start:

  • Implement the S3 commands {Put,Get,Delete}BucketWebsite
  • Add a new listening port (7879?) that will handle website traffic
  • Add keys to configure Traefik in Consul (if configured to do so)
    • Possibly check the bucket's CNAME to prevent useless failures on the Let's Encrypt API (see the sketch below)
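
The CNAME check could be as simple as this (a sketch, assuming the convention that website domains must point at site.deuxfleurs.fr):

```bash
# Refuse to register a domain with Traefik/Let's Encrypt unless its
# CNAME already points at us (note dig's trailing dot in the answer).
if [ "$(dig +short CNAME example.com)" = "site.deuxfleurs.fr." ]; then
  echo "CNAME OK, safe to register example.com"
fi
```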

What do you think of this plan, LX? If you validate it, I will open a [WIP] PR to track my progress :-)

Owner

In my opinion, a minimal proposition would not even contain the implementation of the {Put,Get,Delete}BucketWebsite endpoints, and would only allow configuration using the command line interface. This is currently what we do to create buckets and configure access keys (we don't have PutBucket or the like). This requires manual intervention for the configuration of every new website; however, given the small number of websites hosted on Deuxfleurs, this is probably an acceptable cost to begin with.

Exposing API endpoints that allow the user to create or configure buckets should be a separate issue. We probably need more thought as to what permission model we want to implement before we do that.

So my plan for a minimal implementation would only be:

  • add a bit in the bucket table that says whether that bucket is exposed as a website
  • add a CLI command to set that bit
  • create an endpoint for public website access for buckets that have that bit enabled, using the Host: header to read from the correct bucket, if we decide that our convention is that the bucket name must match the domain name of the website

Alternatively, a 1-indirection-layer option exists: use a separate table to store website configuration, so that website host names do not need to match bucket names, and several website host names can be served by the same bucket.
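
With that minimal plan, anonymous access would look something like this (hypothetical node address, using the website port suggested above), assuming the convention that the bucket is named after the website's domain:

```bash
# The Host header selects the bucket; no S3 signature involved, the
# endpoint is read-only and unauthenticated.
curl -H 'Host: example.com' http://garage-node:7879/index.html
```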

Author
Owner

Agreed, let's start without implementing the S3 API. To stay as close as possible to S3, I will start by not adding the 1-indirection-layer. We will see later if we need it, but having the same website listening on multiple domain names is bad practice in terms of SEO (redirecting to a main domain name is preferred).

lx closed this issue 2021-01-15 16:49:51 +00:00
Reference: Deuxfleurs/garage#6