New model for buckets #172
No reviewers
Labels
No labels
action
check-aws
action
discussion-needed
action
for-external-contributors
action
for-newcomers
action
more-info-needed
action
need-funding
action
triage-required
kind
correctness
kind
ideas
kind
improvement
kind
performance
kind
testing
kind
usability
kind
wrong-behavior
prio
critical
prio
low
scope
admin-api
scope
background-healing
scope
build
scope
documentation
scope
k8s
scope
layout
scope
metadata
scope
ops
scope
rpc
scope
s3-api
scope
security
scope
telemetry
No milestone
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: Deuxfleurs/garage#172
Loading…
Reference in a new issue
No description provided.
Delete branch "new-buckets"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
This should allow to implement CreateBucket and DeleteBucket, as well as garage-specific features such as aliasing and renaming bucket, as well as local aliases where keys have their own namespace of buckets they see.
bucket alias
andbucket unalias
For next PRs:
allow_create_bucket
flagallow_owner
)f2edb31ee1
toc3e8309ee5
c3e8309ee5
to66e1e78d19
66e1e78d19
to63a515a9f9
69886ef1a7
toca54ff40c6
3e75fc9b6b
toea5cddd30b
WIP: New model for bucketsto New model for bucketsfirst round of review (up to model sub-crate, I did not get the time to go further).
I think
try_migrate
on tables should take a hint about the version. At the moment we only have a single one, in the future we might have multiple, and could easily parse the wrong one.@quentin could you run your benchmark scripts on this vs main. I wonder how/if the extra indection impact first byte latency
@ -106,2 +109,4 @@
let endpoint = Endpoint::from_request(&req, bucket.map(ToOwned::to_owned))?;
let bucket_name = match endpoint.authorization_type() {
there is a
.get_bucket()
onEndpoint
which can be used to do the same thing more concisely@ -0,0 +8,4 @@
pub struct BucketHelper<'a>(pub(crate) &'a Garage);
#[allow(clippy::ptr_arg)]
I think this link could be one line lower to only cover the "offending" function
@ -0,0 +18,4 @@
.ok()
.map(|by| Uuid::try_from(&by))
.flatten();
if let Some(bucket_id) = hexbucket {
can't this pose problem if someone create a bucket which name is hex of the right size?
I added a check of the validity of bucket names. AWS bucket names are max 63 characters long, so no risk of it being a hex uuid.
could you add a comment saying that?
@ -0,0 +37,4 @@
}
}
#[allow(clippy::ptr_arg)]
I don't think this lint is required
@ -50,0 +32,4 @@
/// (under all of its global alias names)
pub website_access: crdt::Lww<bool>,
/// The website configuration XML document
pub website_config: crdt::Lww<Option<ByteBuf>>,
I'd argue only
website_config
should exist, with aSome(_)
generated automatically when migrating from 0.5 with website enabled.I also think this should probably contain a
WebsiteConfiguration
(or some flattened form of it), to not require parsing XML on each web request, however doing so does have downsides if we add things to this struct in the future.Concerning the first point (removing
website_access
and storing only the config as an option), I think yes that's probably better because it's closer to how Garage works currently. I was hesitant because in AWS the permissions and the website config seem to be handled separately, but Garage has its own logic and doesn't implement AWS's ACLs, so yes we can simplify here for now and come back to it later when/if we work on ACLs.Concerning storing a WebsiteConfiguration instead of a ByteBuf, there are upsides and downsides.
Upsides:
Downsides:
Alternatives:
web/
module for a nominal duratio of a few seconds to avoid parsing at every requesttable/
module to keep a cache of deserialized versions of stuff stored in the tableIn other words, storing a struct here 1/ has disadvantages in terms of keeping a clean architectures, and 2/ looks to me a bit like a case of premature optimization (we can have a separate reflexion later concerning how we approach performance on Garage tables, which is a whole topic on its own with multiple aspects such as caching, minimizing representation sizes, splitting stuff in separate tables, etc).
@ -0,0 +33,4 @@
}
if !other.allow_write {
self.allow_write = false;
}
allow_owner is not merged. Is it sementically read-only hence don't needs to (if so, please document it), or just forgotten?
Just forgotten, thx
@ -31,0 +33,4 @@
T: Eq,
{
fn merge(&mut self, other: &Self) {
if self != other {
I find this implementation surprising. I'd expect
Some(true)
andSome(false)
to be merged toSome(true)
, notNone
.So I haven't written a doc comment for this yet, but the idea of this impl is that we can make a CRDT of any type T that doesn't implement CRDT, by declaring that different values merge to None. We probably should use this more often in fact, instead of trying to merge things when we know we don't want to merge them (which is what the AutoCrdt trait is used for most of the time). This cases arises very often, for example with a
Lww
or aLwwMap
: the value type has to be a CRDT so that we have a rule for what to do when timestamps aren't enough to disambiguate (in a distributed system, anything can happen!), and with AutoCrdt the rule is to make an arbitrary (but determinstic) choice between the two. When using an option instead with this impl, ambiguity cases are explicitely stored as None, which allows us to detect the ambiguity and handle it in the way we want. This truly depends on the semantics of the application: thecrdt
module is just a toolbox of things that can be taken when needed when building models. In the precise case where we are using it here, i.e. for storing the website configuration, the logic is that if we have two website configurations and we don't know which one is the correct one, we can just disable website access until an admin gives a new configuration (we try to have a safe behavior instead of an unpredictable one -- but anyways this is an extreme edge case as it mostly should never happen that the timestamps are the same).I understand now, but I'm not sure it's well fit to some place it's used.
For website configuration, if two updates happen at the same time, I think it makes more sense to consider an arbitrary one happened an instant before the other, and got overwritten, than to merge both into the state "website disabled"
@ -6,1 +9,3 @@
use crate::key_table::PermissionSet;
use crate::permission::BucketKeyPerm;
pub const DEFAULT_WEBSITE_CONFIGURATION: &[u8] = b""; // TODO (an XML WebsiteConfiguration document per the AWS spec)
this is the equivalent to garage default config in 0.5.0
I'm still not convinced this should be XML, not because of performances reasons, but because I don't think we should store raw S3 payloads in general (and I generaly don't like XML)
@ -38,0 +50,4 @@
.authorized_buckets
.items()
.iter()
.filter(|(_, perms)| perms.allow_read || perms.allow_write)
would it make sense to be bucket owner without read nor write permission?
@ -38,0 +67,4 @@
if let Some(alias_ent) = alias_ent {
if let Some(alias_p) = alias_ent.state.get().as_option() {
if alias_p.bucket_id == *bucket_id {
aliases.insert(alias_ent.name().to_string(), *bucket_id);
this is a lot of indentation. It may be possible to remove a few layers by using Option::and_then, and doing
param.aliases.items().filter(|(_,_,active)| active)
cf5fe9080f
todc1971e427
f74ce3f0b6
tofdbf5bc235
301b7b928a
to70c84d8d44
a431e18736
tob5b2e47754
b5b2e47754
toc359c18403
39c4edb63a
to3bbf54d0f4
3bbf54d0f4
tob60c0bf6c6
b60c0bf6c6
tob65c3d87cf
af314c40f5
todf35feba18