add doc comments #53
Reference: Deuxfleurs/garage#53
Add doc comments to the code
Feel free to correct: some doc comments may be inexact or missing, because I did not understand everything that was happening.
Edit:
I was not able to comment everything, as I did not understand every part of the code.
Some remarks:
- `garage_rpc::Ring::new`, for instance, is too long for me to understand. Enabling some linters in CI to check for such complex code would be nice (example: `cargo clippy` with `cognitive_complexity` on). In general, having linters for code quality in CI helps a lot with long-term maintainability.
- `garage_table::schema::TableSchema`
ada2027ae1 to 88705a009e
Changed title from "WIP: add doc comments" to "add doc comments"
Thank you trinity for the large amount of work on this PR. I have made a few comments at places where it seemed that there were misunderstandings of the code.
I'll say that globally the practice of adding a doc comment on everything seems abusive to me, as many such comments are strictly equivalent in information content (if not inferior) to the function name + type signature, and they have the inconvenience of reducing the information density of the code, which I don't like (it's like a useless distraction taking up space on my screen). I prefer it when comments are reserved for cases where they bring significant insight to the code (which is the case for some comments in this PR). However, if everyone thinks it's better to doc-comment everything, I have no strong opposition to it being done. Let's just make sure these comments accurately reflect the high-level thinking behind the code and don't bring in any misconceptions and misunderstandings.
@ -23,3 +29,4 @@
NotFound,
// Category: bad request
/// The request used an invalid path
"the request contained an invalid UTF-8 sequence in its path or in other parameters" would be more accurate
@ -52,6 +65,7 @@ impl From<roxmltree::Error> for Error {
}
impl Error {
/// Convert an error into an Http status code
Rather: "get the HTTP status code that best represents the meaning of the error for the client"
@ -3,2 +5,3 @@
pub mod error;
mod error;
pub use error::Error;
Doesn't this prevent us from using `OkOrBadRequest` and `OkOrInternalError`? (haven't read what's below yet)
It prevents use from other crates, but not from this one. It does not break anything, but does prevent using those traits in web in the future, for instance.
Alright, we'll make things
pub
again if/when we need them.@ -1,4 +1,6 @@
#![deny(missing_crate_level_docs, missing_docs)]
#![recursion_limit = "1024"]
//! Garage CLI, used to interact with a running Garage instance, and to launch a Garage
"and to launch a Garage instance" ?
@ -142,3 +157,4 @@
}
/// Write a block to disk
pub async fn write_block(&self, hash: &Hash, data: &[u8]) -> Result<Message, Error> {
should probably not be `pub`, same for `read_block` and `need_block`
@ -230,2 +250,4 @@
}
/// Decrement the number of times a block is used
// when the counter reaches 0, it seems it's not put to resync, which I assume puts it to gc?
resync does both: if the rc is >0 and the block is not present, ask it to peers. on the contrary if rc=0 and block is present, delete it locally (after checking that no peer needs it)
I understand resync both gets missing blocks and removes unnecessary blocks. My problem is that, if I understand the code correctly, when the counter goes down from 1 to 0, the value for new_rc is Some(0) (actually a Vec of 8 0u8), which is not None, so the block is not put to resync and doesn't get deleted, unless some background thread regularly scans the table, which sounds inefficient.
`block_decref` actually deletes the value in the rc when it reaches 0, so it is indeed None and not Some(0), unless I'm mistaken? Maybe you're not looking at the last version of the code?
Nevermind, I somehow read `old_v >= 1` instead of `old_v > 1`.
@ -14,3 +15,4 @@
pub block: Hash,
// Sort key
// why a version on a hashed (probably immutable) piece of data?
version is the id of a Version object in the version table (which stores the different versions of objects in the buckets), not a version number for the BlockRef. I.e. `BlockRef.version` is the version id of an object that contains this block.
@ -17,3 +19,4 @@
pub version: UUID,
// Keep track of deleted status
/// Is that block deleted
-> is the Version (a version of an object in a bucket) that contains this block deleted
@ -60,15 +67,21 @@ impl BucketParams {
}
impl Bucket {
/// Create a new bucket
This formulation is ambiguous/misleading, as the bucket is not really created until the `Bucket` is committed to the database.
Suggestion: "Initializes a new instance of the Bucket struct"
@ -67,2 +75,4 @@
}
}
/// Query if bucket is deleted
This formulation is ambiguous/misleading, as `is_deleted` does not do a database query, it just checks if this struct that was returned earlier from the db has the deleted flag.
Suggestion: "returns true if this represents a deleted bucket"
@ -102,1 +115,4 @@
fn updated(&self, _old: Option<Self::E>, _new: Option<Self::E>) {
// nothing to do when updated
}
did you remove the default empty implementation in the trait? was it a bad practice to have it?
I would not necessarily call it bad practice.
It conveyed to me that "doing nothing" is a good default behavior, and while doing nothing might be a valid behavior, I think wanting to do nothing should be rather explicit.
(I answered below)
@ -20,3 +21,2 @@
// Authorized keys
/// Buckets in which the key is authorized. Empty if `Key` is deleted
pub authorized_buckets: crdt::LWWMap<String, PermissionSet>,
// CRDT interaction: deleted implies authorized_buckets is empty
this comment needs to be kept somewhere
@ -6,3 +7,3 @@
#[derive(PartialEq, Clone, Debug, Serialize, Deserialize)]
pub struct Key {
// Primary key
/// The id of the key (immutable)
please keep in the comment that this acts as the primary key
@ -17,3 +18,3 @@
pub bucket: String,
// Sort key
/// The key at which the object is stored in its bucket
please keep in the comment that these two are used respectively as partition key (not primary key, my bad) and sort key, it's an important detail in understanding how Garage works internally
@ -20,3 +21,3 @@
pub key: String,
// Data
/// The list of known versions of the object
doesn't keep all known versions, the comment is misleading
I'm not sure what it stores then
@ -50,2 +53,4 @@
}
}
/// Get a list of all versions known for `Object`
s/known/currently stored
@ -67,2 +77,4 @@
Uploading(ObjectVersionHeaders),
/// The version is fully received
Complete(ObjectVersionData),
/// The version was never fully received
it might never be received and can stay in the Uploading state forever if CompleteMultipartUpload/AbortMultipartUpload is never called (and it doesn't matter)
@ -93,2 +105,4 @@
/// Data about an object version
#[derive(PartialEq, Eq, PartialOrd, Ord, Clone, Debug, Serialize, Deserialize)]
pub enum ObjectVersionData {
/// The version is deleted
the version represents a deletion of the object, i.e. at this version number the object doesn't exist (404)
@ -131,2 +159,4 @@
}
}
/// Is the object version available (received and not deleted)
is the object version available as an existing object (received and not a delete marker)
@ -13,3 +14,3 @@
#[derive(PartialEq, Clone, Debug, Serialize, Deserialize)]
pub struct Version {
// Primary key
/// UUID of the version
used as a partition key
@ -43,7 +49,9 @@ impl Version {
#[derive(PartialEq, Eq, Clone, Copy, Debug, Serialize, Deserialize)]
pub struct VersionBlockKey {
/// Number of the part, starting at 1
not necessarily starting at 1, the client decides the part numbers and we don't care (for instance they can send parts 0, 1, 2, 14, 22, 37 and 42, and that's a fine object as long as these parts match the part list that is sent in the CompleteMultipartUpload call)
Ok, I've read the S3 doc on multipart upload and I hate it.
The next comment appears to be false too: it's the offset in the part, not in the whole file.
Yes, we don't know the offset in the whole file when a part is being uploaded because parts get uploaded independently from one another.
@ -1,3 +1,4 @@
/// Module containing structs related to membership management
//!
@ -78,28 +95,39 @@ pub struct System {
rpc_client: Arc<RpcClient<Message>>,
pub(crate) status: watch::Receiver<Arc<Status>>,
/// The ring, viewed by this node
not "viewed by this node", all nodes should have equal rings (if not it's a bug)
@ -88,2 +108,4 @@
#[derive(Debug, Clone)]
pub struct Status {
/// Mapping of each node id to its known status
// considering its sorted regularly, maybe it should be a btreeset?
when is it sorted?
it's sorted in recalculate_hash. Actually the node list is short, so sorting it is pretty much free
That's my reasoning as well.
@ -90,1 +110,4 @@
/// Mapping of each node id to its known status
// considering its sorted regularly, maybe it should be a btreeset?
pub nodes: HashMap<UUID, Arc<StatusEntry>>,
/// Hash of this entry
hash of the whole set of currently known nodes. This hash is sent in regular ping messages so nodes can detect when they have different views of the cluster. They then exchange their peer lists kind of in an anti-entropy process.
@ -96,2 +121,4 @@
pub addr: SocketAddr,
/// Last time this node was seen
pub last_seen: u64,
/// Number of consecutive pings sent without reply to this node
Not sure, but this counts failed requests of any kind, not just ping messages. It is incremented in rpc_client.rs
Hum, it might double-count pings, as it counts once in rpc_client, and once in this file in ping_nodes
Thanks for noticing that, I'll have to check.
Ok so the count in `rpc_client` does not happen when sending a ping message in `ping_nodes` in `membership.rs`, because we are calling `RpcAddrClient::call` and not `RpcClient::call` (i.e. we are calling the node using its IP address and not its node identifier). The increment in `rpc_client` is done in `RpcClient::call` because it needs to know the node ID. So there is no redundancy between the two.
@ -341,3 +373,3 @@
addr,
sys.rpc_client
.by_addr()
.get_addr()
shouldn't be renamed to `get_addr` (see below)
@ -1,3 +1,4 @@
//! Module containing types related to computing nodes which should receive a copy of data blocks
and metadata entries
@ -23,2 +27,4 @@
/// The maximum number of time an object might get replicated
pub const MAX_REPLICATION: usize = 3;
/// The versionned configurations of all nodes known in the network
Not really, a node can be known in that it answers ping messages, and not be configured if it is idle or draining pending removal by admins.
would "available" be correct instead of known?
Just put a TODO in the comment and I'll write it later
Suggestion:
The user-defined configuration of the cluster's nodes
@ -180,3 +197,4 @@
top >> (16 - PARTITION_BITS)
}
/// Get the list of partitions and
and the first hash (of a partition key) that would fall in this partition
@ -193,6 +211,7 @@ impl Ring {
ret
}
/// Walk the ring to find the n servers in which data should be replicated
TODO rename this function as it doesn't walk the ring anymore but just returns the nodes at the corresponding location. This is vocabulary from the previous implementation that used datacenter-aware ring walking and led to data imbalance.
@ -201,12 +220,15 @@ impl Ring {
let top = u16::from_be_bytes(from.as_slice()[0..2].try_into().unwrap());
let partition_idx = (top >> (16 - PARTITION_BITS)) as usize;
// TODO why computing two time in the same way and asserting?
just to make sure I don't mess up. Should probably be factorized in a single location.
@ -45,3 +52,4 @@
self.rs_timeout = timeout;
self
}
/// Set if requests can be dropped after quorum has been reached
as a general rule: true for read requests, false for write requests
@ -92,2 +105,3 @@
pub fn by_addr(&self) -> &RpcAddrClient<M> {
/// Get the server address this client connect to
pub fn get_addr(&self) -> &RpcAddrClient<M> {
Please don't call this `get_addr`, it's not what it does! It does not return a SocketAddr or something like that. It is used to get the underlying RpcAddrClient, which allows for RPC by specifying the target node's SocketAddr instead of their node IDs. Also, the comment is wrong.
@ -150,2 +167,4 @@
}
/// Make a RPC call to multiple servers, returning either a Vec of responses, or an error if
/// strategy could not be respected due to to many errors
too many*
@ -208,6 +227,7 @@ impl<M: RpcMessage + 'static> RpcClient<M> {
}
}
/// Endpoint to which send RPC
No. This is an RPC client which contains only the necessary logic to call RPC's to nodes by specifying their SocketAddr, and not by specifying their node IDs. It is used as an underlying layer for RpcClient.
@ -14,2 +16,4 @@
fn write_nodes(&self, hash: &Hash) -> Vec<UUID>;
/// Responses needed to consider a write succesfull
fn write_quorum(&self) -> usize;
// this feels like its write_nodes().len() - write_quorum()
Most of the time it is, but not always for fully replicated tables
@ -4,7 +4,9 @@ use garage_util::data::*;
use crate::crdt::CRDT;
/// Trait for partitionnable data
It's not a trait for the data, but only for a field of the data that is used to partition it.
@ -20,7 +22,9 @@ impl PartitionKey for Hash {
}
}
/// Trait for sortable data
Same as above, it's only a trait of one of the fields of the data
@ -64,3 +77,3 @@
// due to CRDT logic. Typically errors in propagation of info should be logged
// to stderr.
fn updated(&self, _old: Option<Self::E>, _new: Option<Self::E>) {}
fn updated(&self, old: Option<Self::E>, new: Option<Self::E>);
Why remove empty implementation?
As explained above, I feel like a default implementation should be what we want in most cases. Of 5 implementations of this trait, only 2 actually use an empty `updated`, so being explicit about it seems to me like a better option.
Idk, to me the reasoning was that `updated()` is implemented when we need to add logic to do something when an entry changes, but if we don't need to add anything we shouldn't have to write an empty `updated()` handler. I don't see how having an empty default implementation can cause much confusion, as it does literally nothing.
@ -16,3 +20,3 @@
NotFound,
// Category: bad request
/// The client requested a malformed path
see garage_api/error.rs
4d83c5c0ff to c8906f200b
Make sure you check my answers on the previous review as well.
@ -387,2 +418,4 @@
} else if let Some(id) = id_option {
if let Some(st) = status.nodes.get_mut(id) {
// TODO this might double-increment the value as the counter is already
// incremented for any kind of failure in rpc_client
Actually no, because the increment in `rpc_client.rs` is done by the `RpcClient` and not the `RpcAddrClient`, but here we are using the `RpcAddrClient`. (we could put this in a comment to make it clearer)