ADR-006: Metrics And Logs
Status
Done
Context
We want to collect business and infrastructure metrics from both observer and int3face node binaries.
Implementation
- Implement log aggregation.
- Add metrics for the number of incoming and outgoing transactions.
- Add metrics for total vault supply.
- Add metrics for total Cosmos-represented asset supply.
- Calculate and monitor the ratio between vault and Cosmos asset supplies.
- Monitor incoming and outgoing transaction volume.
- Monitor node status, including health status and current height.
- Implement TSS metrics.
Use this template as a reference.
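The vault-to-Cosmos supply ratio from the list above can be sketched as follows. This is only an illustration: the function names, the sample amounts, and the expectation that the ratio stays near 1 (one vault unit per Cosmos-represented unit) are assumptions, not something this ADR specifies.

```python
from decimal import Decimal
from typing import Optional


def supply_ratio(vault_supply: Decimal, cosmos_supply: Decimal) -> Optional[Decimal]:
    """Return vault supply divided by Cosmos-represented supply.

    Returns None when the Cosmos-side supply is zero, because the ratio
    is undefined in that case.
    """
    if cosmos_supply == 0:
        return None
    return vault_supply / cosmos_supply


def ratio_deviates(
    vault_supply: Decimal,
    cosmos_supply: Decimal,
    tolerance: Decimal = Decimal("0.001"),
) -> bool:
    """Report whether the ratio is further from 1 than `tolerance`.

    The target of 1 and the default tolerance are assumptions made for
    this sketch; an undefined ratio is treated as a deviation.
    """
    ratio = supply_ratio(vault_supply, cosmos_supply)
    if ratio is None:
        return True
    return abs(ratio - Decimal(1)) > tolerance


if __name__ == "__main__":
    # Hypothetical amounts, in the asset's smallest unit.
    print(supply_ratio(Decimal("1000000"), Decimal("1000000")))
    print(ratio_deviates(Decimal("990000"), Decimal("1000000")))
```

Decimal is used instead of float so that large token amounts are not rounded before the comparison.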
Steps to implement
- Determine how to scrape metrics from nodes: 71.
- Set up metrics storage: 72.
- Research methods to expose and modify Cosmos-based metrics: 73.
- Add business and infrastructure metrics for the Int3face node: 74.
- Add business and infrastructure metrics for the Observer node: 75.
- Deploy the metrics infrastructure to Hetzner: 76.
- Set up a metrics dashboard using the reference template: 77.
Solution
- Implemented monitoring based on Prometheus and Grafana.
- Prometheus and Grafana are deployed by docker-compose.
- The following also run as daemons:
  - node-exporter to collect node metrics.
  - cosmos-exporter to collect Cosmos metrics.
- Monitoring repository: int3face-monitoring, which contains:
  - Prometheus & Grafana configs
  - Deployment scripts
  - Docker-compose file
  - Dashboards
- Grafana dashboard (Hetzner): int3face-monitoring
  - Login: admin
  - Password: {please ask a team member}
- Prometheus (Hetzner): prometheus-ui
Architecture:
Metrics overview
Observer Node Metrics
Configuration block in observer.toml file:
[monitoring]
# The flag to enable the metrics server.
enabled = true
# The port to expose the metrics on.
port = 27727
Metrics list:
Metric name | Metric type | Labels | Description |
---|---|---|---|
observer_transfer_success_total | Counter | [from_chain, to_chain] | Number of successful transfers |
observer_transfer_failed_total | Counter | [from_chain, to_chain] | Number of failed transfers |
observer_transfer_duration_seconds | Histogram | [from_chain, to_chain] | Time spent processing a transfer |
observer_transfers_queue_size | Gauge | [chain_id] | Size of transfers queue |
observer_transferred_amount_total | Counter | [from_chain, to_chain] | Amount of transferred assets |
observer_tss_sign_success_total | Counter | [] | Number of successful TSS signs |
observer_tss_sign_failed_total | Counter | [] | Number of failed TSS signs |
observer_tss_sign_duration_seconds | Histogram | [] | Time spent on TSS signing |
observer_keygen_processing_success_total | Counter | [] | Number of successful key generation processes |
observer_keygen_processing_failed_total | Counter | [] | Number of failed key generation processes |
observer_keygen_processing_duration_seconds | Histogram | [] | Time spent on key generation |
observer_vault_migration_duration_seconds | Histogram | [] | Time spent on vault migration |
observer_chain_client_health | Gauge | [chain_id] | Chain client health status |
observer_chain_height | Gauge | [chain_id] | Chain height |
observer_chain_last_observed_height | Gauge | [chain_id] | Last observed chain height |
observer_total_supply | Gauge | [chain_id, asset_id, vault_address] | Total supply of assets |
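To show how the metrics in the table combine, here is a minimal sketch that parses a few lines in the Prometheus text exposition format and derives two values: the transfer failure ratio and the gap between chain height and last observed height. The sample values and the label values (`bitcoin`, `int3face`) are made up for illustration, and the parser is deliberately simplified (it skips comment lines and does not handle optional timestamps). In the deployed stack the same quantities would more likely be computed in Prometheus or Grafana.

```python
import re

# Hypothetical scrape output; metric and label names follow the table above.
SAMPLE = """\
# TYPE observer_transfer_success_total counter
observer_transfer_success_total{from_chain="bitcoin",to_chain="int3face"} 97
observer_transfer_failed_total{from_chain="bitcoin",to_chain="int3face"} 3
observer_chain_height{chain_id="bitcoin"} 840010
observer_chain_last_observed_height{chain_id="bitcoin"} 840004
"""

LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Map (metric name, sorted label pairs) to the sample value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # e.g. lines carrying a timestamp, not handled here
        name, raw_labels, value = match.groups()
        labels = tuple(sorted(LABEL_RE.findall(raw_labels or "")))
        samples[(name, labels)] = float(value)
    return samples


def failure_ratio(samples, from_chain, to_chain):
    """Failed transfers divided by all transfers, or None if there are none."""
    labels = (("from_chain", from_chain), ("to_chain", to_chain))
    ok = samples.get(("observer_transfer_success_total", labels), 0.0)
    failed = samples.get(("observer_transfer_failed_total", labels), 0.0)
    total = ok + failed
    return failed / total if total else None


def observation_lag(samples, chain_id):
    """Blocks between the chain height and the last observed height."""
    labels = (("chain_id", chain_id),)
    return (
        samples[("observer_chain_height", labels)]
        - samples[("observer_chain_last_observed_height", labels)]
    )


if __name__ == "__main__":
    parsed = parse_samples(SAMPLE)
    print(failure_ratio(parsed, "bitcoin", "int3face"))
    print(observation_lag(parsed, "bitcoin"))
```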
Int3face Node Metrics
Configuration block in config/config.toml file:
#######################################################
### Instrumentation Configuration Options ###
#######################################################
[instrumentation]
# When true, Prometheus metrics are served under /metrics on
# PrometheusListenAddr.
# Check out the documentation for the list of available metrics.
prometheus = true
# Address to listen for Prometheus collector(s) connections
prometheus_listen_addr = ":26660"
# Maximum number of simultaneous connections.
# If you want to accept a larger number than the default, make sure
# you increase your OS limits.
# 0 - unlimited.
max_open_connections = 3
# Instrumentation namespace
namespace = "cometbft"
- Consensus metrics: link to metrics
- Cosmos-exporter metrics (all metrics provided by cosmos-exporter have one of the following prefixes):
  - cosmos_validator_* - metrics related to a single validator
  - cosmos_validators_* - metrics related to a validator set
  - cosmos_wallet_* - metrics related to a single wallet
Node Exporter Metrics
- Node exporter metrics: link to metrics