forked from Deuxfleurs/garage
Reorder reference manual section, move metrics list to there
parent 56384677fa
commit 44f8b1d71a
5 changed files with 289 additions and 278 deletions
|
@ -52,280 +52,6 @@ or make your own.
|
|||
We detail below the list of exposed metrics and their meaning.
|
||||
|
||||
|
||||
|
||||
## List of exported metrics
|
||||
|
||||
### Garage system metrics
|
||||
|
||||
#### `garage_build_info` (counter)
|
||||
|
||||
Exposes the Garage version number running on a node.
|
||||
|
||||
```
|
||||
garage_build_info{version="1.0"} 1
|
||||
```
|
||||
|
||||
#### `garage_replication_factor` (counter)
|
||||
|
||||
Exposes the Garage replication factor configured on the node
|
||||
|
||||
```
|
||||
garage_replication_factor 3
|
||||
```
|
||||
|
||||
### Metrics of the API endpoints
|
||||
|
||||
#### `api_admin_request_counter` (counter)
|
||||
|
||||
Counts the number of requests to a given endpoint of the administration API. Example:
|
||||
|
||||
```
|
||||
api_admin_request_counter{api_endpoint="Metrics"} 127041
|
||||
```
|
||||
|
||||
#### `api_admin_request_duration` (histogram)
|
||||
|
||||
Evaluates the duration of API calls to the various administration API endpoints. Example:
|
||||
|
||||
```
|
||||
api_admin_request_duration_bucket{api_endpoint="Metrics",le="0.5"} 127041
|
||||
api_admin_request_duration_sum{api_endpoint="Metrics"} 605.250344830999
|
||||
api_admin_request_duration_count{api_endpoint="Metrics"} 127041
|
||||
```
|
||||
|
||||
#### `api_s3_request_counter` (counter)
|
||||
|
||||
Counts the number of requests to a given endpoint of the S3 API. Example:
|
||||
|
||||
```
|
||||
api_s3_request_counter{api_endpoint="CreateMultipartUpload"} 1
|
||||
```
|
||||
|
||||
#### `api_s3_error_counter` (counter)
|
||||
|
||||
Counts the number of requests to a given endpoint of the S3 API that returned an error. Example:
|
||||
|
||||
```
|
||||
api_s3_error_counter{api_endpoint="GetObject",status_code="404"} 39
|
||||
```
|
||||
|
||||
#### `api_s3_request_duration` (histogram)
|
||||
|
||||
Evaluates the duration of API calls to the various S3 API endpoints. Example:
|
||||
|
||||
```
|
||||
api_s3_request_duration_bucket{api_endpoint="CreateMultipartUpload",le="0.5"} 1
|
||||
api_s3_request_duration_sum{api_endpoint="CreateMultipartUpload"} 0.046340762
|
||||
api_s3_request_duration_count{api_endpoint="CreateMultipartUpload"} 1
|
||||
```
|
||||
|
||||
#### `api_k2v_request_counter` (counter), `api_k2v_error_counter` (counter), `api_k2v_request_duration` (histogram)
|
||||
|
||||
Same as for S3, for the K2V API.
|
||||
|
||||
|
||||
### Metrics of the Web endpoint
|
||||
|
||||
|
||||
#### `web_request_counter` (counter)
|
||||
|
||||
Number of requests to the web endpoint
|
||||
|
||||
```
|
||||
web_request_counter{method="GET"} 80
|
||||
```
|
||||
|
||||
#### `web_request_duration` (histogram)
|
||||
|
||||
Duration of requests to the web endpoint
|
||||
|
||||
```
|
||||
web_request_duration_bucket{method="GET",le="0.5"} 80
|
||||
web_request_duration_sum{method="GET"} 1.0528433229999998
|
||||
web_request_duration_count{method="GET"} 80
|
||||
```
|
||||
|
||||
#### `web_error_counter` (counter)
|
||||
|
||||
Number of requests to the web endpoint resulting in errors
|
||||
|
||||
```
|
||||
web_error_counter{method="GET",status_code="404 Not Found"} 64
|
||||
```
|
||||
|
||||
|
||||
### Metrics of the data block manager
|
||||
|
||||
#### `block_bytes_read`, `block_bytes_written` (counter)
|
||||
|
||||
Number of bytes read from/written to disk in the data storage directory.
|
||||
|
||||
```
|
||||
block_bytes_read 120586322022
|
||||
block_bytes_written 3386618077
|
||||
```
|
||||
|
||||
#### `block_compression_level` (counter)
|
||||
|
||||
Exposes the block compression level configured for the Garage node.
|
||||
|
||||
```
|
||||
block_compression_level 3
|
||||
```
|
||||
|
||||
#### `block_read_duration`, `block_write_duration` (histograms)
|
||||
|
||||
Evaluates the duration of the reading/writing of individual data blocks in the data storage directory.
|
||||
|
||||
```
|
||||
block_read_duration_bucket{le="0.5"} 169229
|
||||
block_read_duration_sum 2761.6902550310056
|
||||
block_read_duration_count 169240
|
||||
block_write_duration_bucket{le="0.5"} 3559
|
||||
block_write_duration_sum 195.59170078500006
|
||||
block_write_duration_count 3571
|
||||
```
|
||||
|
||||
#### `block_delete_counter` (counter)
|
||||
|
||||
Counts the number of data blocks that have been deleted from storage.
|
||||
|
||||
```
|
||||
block_delete_counter 122
|
||||
```
|
||||
|
||||
#### `block_resync_counter` (counter), `block_resync_duration` (histogram)
|
||||
|
||||
Counts the number of resync operations the node has executed, and evaluates their duration.
|
||||
|
||||
```
|
||||
block_resync_counter 308897
|
||||
block_resync_duration_bucket{le="0.5"} 308892
|
||||
block_resync_duration_sum 139.64204196100016
|
||||
block_resync_duration_count 308897
|
||||
```
|
||||
|
||||
#### `block_resync_queue_length` (gauge)
|
||||
|
||||
The number of block hashes currently queued for a resync.
|
||||
It is normal for this value to be nonzero for long periods of time.
|
||||
|
||||
```
|
||||
block_resync_queue_length 0
|
||||
```
|
||||
|
||||
#### `block_resync_errored_blocks` (gauge)
|
||||
|
||||
The number of block hashes that we were unable to resync last time we tried.
|
||||
**THIS SHOULD BE ZERO, OR FALL BACK TO ZERO RAPIDLY, IN A HEALTHY CLUSTER.**
|
||||
Persistent nonzero values indicate that some data is likely to be lost.
|
||||
|
||||
```
|
||||
block_resync_errored_blocks 0
|
||||
```
|
||||
|
||||
|
||||
### Metrics related to RPCs (remote procedure calls) between nodes
|
||||
|
||||
#### `rpc_request_counter` (counter)
|
||||
|
||||
Number of RPC requests emitted
|
||||
|
||||
```
|
||||
rpc_request_counter{from="<this node>",rpc_endpoint="garage_block/manager.rs/Rpc",to="<remote node>"} 176
|
||||
```
|
||||
|
||||
#### `rpc_netapp_error_counter` (counter)
|
||||
|
||||
Number of communication errors (errors in the Netapp library, generally due to disconnected nodes)
|
||||
|
||||
```
|
||||
rpc_netapp_error_counter{from="<this node>",rpc_endpoint="garage_block/manager.rs/Rpc",to="<remote node>"} 354
|
||||
```
|
||||
|
||||
#### `rpc_timeout_counter` (counter)
|
||||
|
||||
Number of RPC timeouts, should be close to zero in a healthy cluster.
|
||||
|
||||
```
|
||||
rpc_timeout_counter{from="<this node>",rpc_endpoint="garage_rpc/membership.rs/SystemRpc",to="<remote node>"} 1
|
||||
```
|
||||
|
||||
#### `rpc_duration` (histogram)
|
||||
|
||||
The duration of internal RPC calls between Garage nodes.
|
||||
|
||||
```
|
||||
rpc_duration_bucket{from="<this node>",rpc_endpoint="garage_block/manager.rs/Rpc",to="<remote node>",le="0.5"} 166
|
||||
rpc_duration_sum{from="<this node>",rpc_endpoint="garage_block/manager.rs/Rpc",to="<remote node>"} 35.172253716
|
||||
rpc_duration_count{from="<this node>",rpc_endpoint="garage_block/manager.rs/Rpc",to="<remote node>"} 174
|
||||
```
|
||||
|
||||
|
||||
### Metrics of the metadata table manager
|
||||
|
||||
#### `table_gc_todo_queue_length` (gauge)
|
||||
|
||||
Table garbage collector TODO queue length
|
||||
|
||||
```
|
||||
table_gc_todo_queue_length{table_name="block_ref"} 0
|
||||
```
|
||||
|
||||
#### `table_get_request_counter` (counter), `table_get_request_duration` (histogram)
|
||||
|
||||
Number of get/get_range requests internally made on each table, and their duration.
|
||||
|
||||
```
|
||||
table_get_request_counter{table_name="bucket_alias"} 315
|
||||
table_get_request_duration_bucket{table_name="bucket_alias",le="0.5"} 315
|
||||
table_get_request_duration_sum{table_name="bucket_alias"} 0.048509778000000024
|
||||
table_get_request_duration_count{table_name="bucket_alias"} 315
|
||||
```
|
||||
|
||||
|
||||
#### `table_put_request_counter` (counter), `table_put_request_duration` (histogram)
|
||||
|
||||
Number of insert/insert_many requests internally made on this table, and their duration
|
||||
|
||||
```
|
||||
table_put_request_counter{table_name="block_ref"} 677
|
||||
table_put_request_duration_bucket{table_name="block_ref",le="0.5"} 677
|
||||
table_put_request_duration_sum{table_name="block_ref"} 61.617528636
|
||||
table_put_request_duration_count{table_name="block_ref"} 677
|
||||
```
|
||||
|
||||
#### `table_internal_delete_counter` (counter)
|
||||
|
||||
Number of value deletions in the tree (due to GC or repartitioning)
|
||||
|
||||
```
|
||||
table_internal_delete_counter{table_name="block_ref"} 2296
|
||||
```
|
||||
|
||||
#### `table_internal_update_counter` (counter)
|
||||
|
||||
Number of value updates where the value actually changes (includes creation of new key and update of existing key)
|
||||
|
||||
```
|
||||
table_internal_update_counter{table_name="block_ref"} 5996
|
||||
```
|
||||
|
||||
#### `table_merkle_updater_todo_queue_length` (gauge)
|
||||
|
||||
Merkle tree updater TODO queue length (should fall to zero rapidly)
|
||||
|
||||
```
|
||||
table_merkle_updater_todo_queue_length{table_name="block_ref"} 0
|
||||
```
|
||||
|
||||
#### `table_sync_items_received`, `table_sync_items_sent` (counters)
|
||||
|
||||
Number of data items sent to/received from other nodes during resync procedures
|
||||
|
||||
```
|
||||
table_sync_items_received{from="<remote node>",table_name="bucket_v2"} 3
|
||||
table_sync_items_sent{table_name="block_ref",to="<remote node>"} 2
|
||||
```
|
||||
|
||||
|
||||
See our [dedicated page](@/documentation/reference-manual/monitoring.md) in the Reference manual section.
|
||||
|
|
|
@ -1,6 +1,6 @@
|
|||
+++
|
||||
title = "Administration API"
|
||||
weight = 60
|
||||
weight = 40
|
||||
+++
|
||||
|
||||
The Garage administration API is accessible through a dedicated server whose
|
||||
|
|
|
@ -1,6 +1,6 @@
|
|||
+++
|
||||
title = "K2V"
|
||||
weight = 70
|
||||
weight = 100
|
||||
+++
|
||||
|
||||
Starting with version 0.7.2, Garage introduces an optional feature, K2V,
|
||||
|
|
285
doc/book/reference-manual/monitoring.md
Normal file
|
@ -0,0 +1,285 @@
|
|||
|
||||
+++
|
||||
title = "Monitoring"
|
||||
weight = 60
|
||||
+++
|
||||
|
||||
|
||||
For information on setting up monitoring, see our [dedicated page](@/documentation/cookbook/monitoring.md) in the Cookbook section.
|
||||
|
||||
## List of exported metrics
|
||||
|
||||
### Garage system metrics
|
||||
|
||||
#### `garage_build_info` (counter)
|
||||
|
||||
Exposes the Garage version number running on a node.
|
||||
|
||||
```
|
||||
garage_build_info{version="1.0"} 1
|
||||
```
|
||||
|
||||
#### `garage_replication_factor` (counter)
|
||||
|
||||
Exposes the Garage replication factor configured on the node
|
||||
|
||||
```
|
||||
garage_replication_factor 3
|
||||
```
|
||||
|
||||
### Metrics of the API endpoints
|
||||
|
||||
#### `api_admin_request_counter` (counter)
|
||||
|
||||
Counts the number of requests to a given endpoint of the administration API. Example:
|
||||
|
||||
```
|
||||
api_admin_request_counter{api_endpoint="Metrics"} 127041
|
||||
```
|
||||
|
||||
#### `api_admin_request_duration` (histogram)
|
||||
|
||||
Evaluates the duration of API calls to the various administration API endpoints. Example:
|
||||
|
||||
```
|
||||
api_admin_request_duration_bucket{api_endpoint="Metrics",le="0.5"} 127041
|
||||
api_admin_request_duration_sum{api_endpoint="Metrics"} 605.250344830999
|
||||
api_admin_request_duration_count{api_endpoint="Metrics"} 127041
|
||||
```
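
As a rough illustration of how the `_sum` and `_count` series of a histogram relate, the mean request duration is simply their ratio. The sketch below hard-codes the sample values from the example above; the helper name `mean_duration` is hypothetical and not part of Garage.

```python
def mean_duration(duration_sum: float, duration_count: int) -> float:
    """Return the mean observed duration, or 0.0 when nothing was observed."""
    if duration_count == 0:
        return 0.0
    return duration_sum / duration_count


# Sample values copied from the `api_admin_request_duration` example above.
mean = mean_duration(605.250344830999, 127041)
print(f"mean admin API request duration: {mean:.6f}")
```

In practice one would usually compute this over a time window from the change in both series, rather than from the totals accumulated since the node started.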
|
||||
|
||||
#### `api_s3_request_counter` (counter)
|
||||
|
||||
Counts the number of requests to a given endpoint of the S3 API. Example:
|
||||
|
||||
```
|
||||
api_s3_request_counter{api_endpoint="CreateMultipartUpload"} 1
|
||||
```
|
||||
|
||||
#### `api_s3_error_counter` (counter)
|
||||
|
||||
Counts the number of requests to a given endpoint of the S3 API that returned an error. Example:
|
||||
|
||||
```
|
||||
api_s3_error_counter{api_endpoint="GetObject",status_code="404"} 39
|
||||
```
|
||||
|
||||
#### `api_s3_request_duration` (histogram)
|
||||
|
||||
Evaluates the duration of API calls to the various S3 API endpoints. Example:
|
||||
|
||||
```
|
||||
api_s3_request_duration_bucket{api_endpoint="CreateMultipartUpload",le="0.5"} 1
|
||||
api_s3_request_duration_sum{api_endpoint="CreateMultipartUpload"} 0.046340762
|
||||
api_s3_request_duration_count{api_endpoint="CreateMultipartUpload"} 1
|
||||
```
|
||||
|
||||
#### `api_k2v_request_counter` (counter), `api_k2v_error_counter` (counter), `api_k2v_request_duration` (histogram)
|
||||
|
||||
Same as for S3, for the K2V API.
|
||||
|
||||
|
||||
### Metrics of the Web endpoint
|
||||
|
||||
|
||||
#### `web_request_counter` (counter)
|
||||
|
||||
Number of requests to the web endpoint
|
||||
|
||||
```
|
||||
web_request_counter{method="GET"} 80
|
||||
```
|
||||
|
||||
#### `web_request_duration` (histogram)
|
||||
|
||||
Duration of requests to the web endpoint
|
||||
|
||||
```
|
||||
web_request_duration_bucket{method="GET",le="0.5"} 80
|
||||
web_request_duration_sum{method="GET"} 1.0528433229999998
|
||||
web_request_duration_count{method="GET"} 80
|
||||
```
|
||||
|
||||
#### `web_error_counter` (counter)
|
||||
|
||||
Number of requests to the web endpoint resulting in errors
|
||||
|
||||
```
|
||||
web_error_counter{method="GET",status_code="404 Not Found"} 64
|
||||
```
|
||||
|
||||
|
||||
### Metrics of the data block manager
|
||||
|
||||
#### `block_bytes_read`, `block_bytes_written` (counter)
|
||||
|
||||
Number of bytes read from/written to disk in the data storage directory.
|
||||
|
||||
```
|
||||
block_bytes_read 120586322022
|
||||
block_bytes_written 3386618077
|
||||
```
|
||||
|
||||
#### `block_compression_level` (counter)
|
||||
|
||||
Exposes the block compression level configured for the Garage node.
|
||||
|
||||
```
|
||||
block_compression_level 3
|
||||
```
|
||||
|
||||
#### `block_read_duration`, `block_write_duration` (histograms)
|
||||
|
||||
Evaluates the duration of the reading/writing of individual data blocks in the data storage directory.
|
||||
|
||||
```
|
||||
block_read_duration_bucket{le="0.5"} 169229
|
||||
block_read_duration_sum 2761.6902550310056
|
||||
block_read_duration_count 169240
|
||||
block_write_duration_bucket{le="0.5"} 3559
|
||||
block_write_duration_sum 195.59170078500006
|
||||
block_write_duration_count 3571
|
||||
```
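
Histogram buckets in the Prometheus exposition format are cumulative, so the `le="0.5"` bucket counts every observation at or below that bound. A minimal sketch, using only the sample values above, of how to derive the share of block reads and writes that fell outside that bucket (the function name `fraction_above` is hypothetical):

```python
def fraction_above(bucket_count: int, total_count: int) -> float:
    """Share of observations that did not fit in the given cumulative bucket."""
    if total_count == 0:
        return 0.0
    return (total_count - bucket_count) / total_count


# Values copied from the block_read_duration / block_write_duration example.
slow_reads = fraction_above(169229, 169240)
slow_writes = fraction_above(3559, 3571)
print(f"reads above the 0.5 bucket:  {slow_reads:.4%}")
print(f"writes above the 0.5 bucket: {slow_writes:.4%}")
```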
|
||||
|
||||
#### `block_delete_counter` (counter)
|
||||
|
||||
Counts the number of data blocks that have been deleted from storage.
|
||||
|
||||
```
|
||||
block_delete_counter 122
|
||||
```
|
||||
|
||||
#### `block_resync_counter` (counter), `block_resync_duration` (histogram)
|
||||
|
||||
Counts the number of resync operations the node has executed, and evaluates their duration.
|
||||
|
||||
```
|
||||
block_resync_counter 308897
|
||||
block_resync_duration_bucket{le="0.5"} 308892
|
||||
block_resync_duration_sum 139.64204196100016
|
||||
block_resync_duration_count 308897
|
||||
```
|
||||
|
||||
#### `block_resync_queue_length` (gauge)
|
||||
|
||||
The number of block hashes currently queued for a resync.
|
||||
It is normal for this value to be nonzero for long periods of time.
|
||||
|
||||
```
|
||||
block_resync_queue_length 0
|
||||
```
|
||||
|
||||
#### `block_resync_errored_blocks` (gauge)
|
||||
|
||||
The number of block hashes that we were unable to resync last time we tried.
|
||||
**THIS SHOULD BE ZERO, OR FALL BACK TO ZERO RAPIDLY, IN A HEALTHY CLUSTER.**
|
||||
Persistent nonzero values indicate that some data is likely to be lost.
|
||||
|
||||
```
|
||||
block_resync_errored_blocks 0
|
||||
```
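
Since a persistently nonzero value here is a warning sign, a monitoring script may want to read this gauge directly from the metrics output. The following is a sketch only: it assumes the metrics text has already been fetched, the `sample` string is made up for illustration, and `read_gauge` is a hypothetical helper that handles only unlabelled samples.

```python
import re

# Matches an unlabelled sample line such as "block_resync_errored_blocks 0".
SAMPLE_RE = re.compile(r"^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\s+(?P<value>\S+)$")


def read_gauge(metrics_text: str, name: str):
    """Return the value of the first unlabelled sample called `name`, or None."""
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match and match.group("name") == name:
            return float(match.group("value"))
    return None


# Made-up metrics text, for illustration only.
sample = """\
# TYPE block_resync_queue_length gauge
block_resync_queue_length 0
block_resync_errored_blocks 0
"""

errored = read_gauge(sample, "block_resync_errored_blocks")
if errored is None:
    print("metric not found")
elif errored > 0:
    print(f"warning: {errored:.0f} blocks could not be resynced")
else:
    print("no errored blocks")
```

A real alert should look at whether the value stays above zero over time, since a brief nonzero value that falls back to zero is expected.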
|
||||
|
||||
|
||||
### Metrics related to RPCs (remote procedure calls) between nodes
|
||||
|
||||
#### `rpc_request_counter` (counter)
|
||||
|
||||
Number of RPC requests emitted
|
||||
|
||||
```
|
||||
rpc_request_counter{from="<this node>",rpc_endpoint="garage_block/manager.rs/Rpc",to="<remote node>"} 176
|
||||
```
|
||||
|
||||
#### `rpc_netapp_error_counter` (counter)
|
||||
|
||||
Number of communication errors (errors in the Netapp library, generally due to disconnected nodes)
|
||||
|
||||
```
|
||||
rpc_netapp_error_counter{from="<this node>",rpc_endpoint="garage_block/manager.rs/Rpc",to="<remote node>"} 354
|
||||
```
|
||||
|
||||
#### `rpc_timeout_counter` (counter)
|
||||
|
||||
Number of RPC timeouts, should be close to zero in a healthy cluster.
|
||||
|
||||
```
|
||||
rpc_timeout_counter{from="<this node>",rpc_endpoint="garage_rpc/membership.rs/SystemRpc",to="<remote node>"} 1
|
||||
```
|
||||
|
||||
#### `rpc_duration` (histogram)
|
||||
|
||||
The duration of internal RPC calls between Garage nodes.
|
||||
|
||||
```
|
||||
rpc_duration_bucket{from="<this node>",rpc_endpoint="garage_block/manager.rs/Rpc",to="<remote node>",le="0.5"} 166
|
||||
rpc_duration_sum{from="<this node>",rpc_endpoint="garage_block/manager.rs/Rpc",to="<remote node>"} 35.172253716
|
||||
rpc_duration_count{from="<this node>",rpc_endpoint="garage_block/manager.rs/Rpc",to="<remote node>"} 174
|
||||
```
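
The RPC metrics carry `from`, `to` and `rpc_endpoint` labels, so any script consuming them has to parse the label set. Below is a simplified sketch of such a parser applied to two of the sample lines above; `parse_sample` is a hypothetical helper, and it deliberately ignores escaped quotes inside label values.

```python
import re

# name{label="value",...} value
LINE_RE = re.compile(r'^(?P<name>\w+)\{(?P<labels>.*)\}\s+(?P<value>\S+)$')
# Simplification: label values are assumed not to contain escaped quotes.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line: str):
    """Split one labelled sample line into (name, labels, value)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a labelled sample: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels")))
    return match.group("name"), labels, float(match.group("value"))


# Two lines copied from the rpc_duration example above.
lines = [
    'rpc_duration_sum{from="<this node>",rpc_endpoint="garage_block/manager.rs/Rpc",to="<remote node>"} 35.172253716',
    'rpc_duration_count{from="<this node>",rpc_endpoint="garage_block/manager.rs/Rpc",to="<remote node>"} 174',
]

samples = {}
for line in lines:
    name, labels, value = parse_sample(line)
    samples[name] = (labels, value)

labels, total = samples["rpc_duration_sum"]
_, count = samples["rpc_duration_count"]
mean_rpc = total / count
print(f'{labels["rpc_endpoint"]} -> {labels["to"]}: mean duration {mean_rpc:.4f}')
```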
|
||||
|
||||
|
||||
### Metrics of the metadata table manager
|
||||
|
||||
#### `table_gc_todo_queue_length` (gauge)
|
||||
|
||||
Table garbage collector TODO queue length
|
||||
|
||||
```
|
||||
table_gc_todo_queue_length{table_name="block_ref"} 0
|
||||
```
|
||||
|
||||
#### `table_get_request_counter` (counter), `table_get_request_duration` (histogram)
|
||||
|
||||
Number of get/get_range requests internally made on each table, and their duration.
|
||||
|
||||
```
|
||||
table_get_request_counter{table_name="bucket_alias"} 315
|
||||
table_get_request_duration_bucket{table_name="bucket_alias",le="0.5"} 315
|
||||
table_get_request_duration_sum{table_name="bucket_alias"} 0.048509778000000024
|
||||
table_get_request_duration_count{table_name="bucket_alias"} 315
|
||||
```
|
||||
|
||||
|
||||
#### `table_put_request_counter` (counter), `table_put_request_duration` (histogram)
|
||||
|
||||
Number of insert/insert_many requests internally made on this table, and their duration
|
||||
|
||||
```
|
||||
table_put_request_counter{table_name="block_ref"} 677
|
||||
table_put_request_duration_bucket{table_name="block_ref",le="0.5"} 677
|
||||
table_put_request_duration_sum{table_name="block_ref"} 61.617528636
|
||||
table_put_request_duration_count{table_name="block_ref"} 677
|
||||
```
|
||||
|
||||
#### `table_internal_delete_counter` (counter)
|
||||
|
||||
Number of value deletions in the tree (due to GC or repartitioning)
|
||||
|
||||
```
|
||||
table_internal_delete_counter{table_name="block_ref"} 2296
|
||||
```
|
||||
|
||||
#### `table_internal_update_counter` (counter)
|
||||
|
||||
Number of value updates where the value actually changes (includes creation of new key and update of existing key)
|
||||
|
||||
```
|
||||
table_internal_update_counter{table_name="block_ref"} 5996
|
||||
```
|
||||
|
||||
#### `table_merkle_updater_todo_queue_length` (gauge)
|
||||
|
||||
Merkle tree updater TODO queue length (should fall to zero rapidly)
|
||||
|
||||
```
|
||||
table_merkle_updater_todo_queue_length{table_name="block_ref"} 0
|
||||
```
|
||||
|
||||
#### `table_sync_items_received`, `table_sync_items_sent` (counters)
|
||||
|
||||
Number of data items sent to/received from other nodes during resync procedures
|
||||
|
||||
```
|
||||
table_sync_items_received{from="<remote node>",table_name="bucket_v2"} 3
|
||||
table_sync_items_sent{table_name="block_ref",to="<remote node>"} 2
|
||||
```
|
||||
|
||||
|
|
@ -1,6 +1,6 @@
|
|||
+++
|
||||
title = "S3 Compatibility status"
|
||||
weight = 40
|
||||
weight = 70
|
||||
+++
|
||||
|
||||
## DISCLAIMER
|
||||
|
|