Monitoring PostgreSQL clusters, nodes, and databases

In Managed Databases PostgreSQL you can monitor the state of the cluster.

To assess the overall health of the cluster, check its status.

For more detailed analysis, some metrics are available as graphs in the control panel:

cluster node metrics;
database metrics;
connection pooler metrics.

The full set of available metrics can be exported in Prometheus format.

The time in the control panel corresponds to the time set on your device and is independent of the region where the cluster is hosted.

For example, you created a cluster in Tashkent, in the uz-1 pool. The device you use to access the control panel is set to the Moscow time zone. The time on the metric charts is shown in Moscow time.
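
The effect can be illustrated with a minimal sketch. It assumes fixed offsets (Moscow at UTC+3, Tashkent at UTC+5, neither with daylight saving time) and a made-up sample timestamp; the function name `as_seen_on_device` is hypothetical and not part of the control panel.

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration: fixed offsets, Moscow is UTC+3 and
# Tashkent is UTC+5; neither observes daylight saving time.
MOSCOW = timezone(timedelta(hours=3), "MSK")
TASHKENT = timezone(timedelta(hours=5), "UZT")


def as_seen_on_device(sample_time_utc: datetime, device_tz: timezone) -> datetime:
    """Return the same instant expressed in the device's time zone."""
    return sample_time_utc.astimezone(device_tz)


# A made-up metric timestamp recorded in UTC.
sample = datetime(2024, 1, 15, 9, 0, tzinfo=timezone.utc)

print(as_seen_on_device(sample, MOSCOW).strftime("%H:%M %Z"))    # 12:00 MSK
print(as_seen_on_device(sample, TASHKENT).strftime("%H:%M %Z"))  # 14:00 UZT
```

A chart opened on the Moscow device would label this sample 12:00, even though the local time in the cluster's region is 14:00.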

View cluster status

In the Dashboard, on the top menu, click Products and select Managed Databases.
Open the Active tab.

In the cluster row, look at the status.

ACTIVE	The cluster is available
CREATING	The cluster is being created
UPDATING	Changes are being applied to the cluster
RESIZING	The cluster is being scaled
ERROR	An error occurred. Create a ticket
DISK FULL	The disk is full and the cluster is read-only. To make the cluster read-write again, free up disk space or scale the cluster and select a configuration with a larger disk size
DEGRADED	Some nodes in the cluster are unavailable
DELETING	The cluster is being deleted

View the status of cluster nodes

In the Dashboard, on the top menu, click Products and select Managed Databases.
Open the Active tab.
Open the cluster page → Monitoring tab.
In the Cluster Monitoring block, click Cluster Nodes.
Select the nodes whose metrics you want to view.
Review the available cluster node metrics.

Cluster node metrics in the control panel

Memory	Memory utilization excluding cache and operating system buffers in percent or gigabytes
vCPU	Percentage of the cluster node's cores in use
CPU iowait	Percentage of CPU time spent waiting for I/O
Disk	Used disk space in percent or gigabytes. It takes into account the part of disk space reserved for service needs and not available for hosting databases. For more information about reserving disk space, see Using disk space in a PostgreSQL cluster in this manual ...
Load Average	The average system load over a period of time. Shows how many processes are being handled by the node's cores. The metric is shown as three values: for one minute, five minutes, and 15 minutes. These values should not exceed the number of cores on the node
OOM	Number of processes that ended with an `Out of Memory` error due to lack of RAM
Disk load	Data read and write speed in KB/s or number of read and write operations per second
Network load	The number of bits or packets sent and received over the network interface
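
The Load Average rule above (values should not exceed the number of cores on the node) can be sketched as a simple check. The function name, the window labels, and the sample values are hypothetical and only illustrate the comparison.

```python
def overloaded_windows(load_averages: dict[str, float], cores: int) -> list[str]:
    """Return the averaging windows whose load exceeds the node's core count."""
    return [window for window, value in load_averages.items() if value > cores]


# Made-up load average values for a node with 4 cores.
loads = {"1m": 3.2, "5m": 4.6, "15m": 2.1}

print(overloaded_windows(loads, cores=4))  # ['5m']
```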

View the status of the databases

In the Dashboard, on the top menu, click Products and select Managed Databases.
Open the Active tab.
Open the cluster page → Monitoring tab.
In the Cluster Monitoring block, click Databases.
Select the nodes whose metrics you want to view.
Review the available database metrics.

Database metrics in the control panel

Statistics file size	Total size of the statistics file in kilobytes
Cache hit	Percentage of the data requested by queries that was read from the cache: the ratio of `blks_hit` to the sum of `blks_hit` and `blks_read`
Row operations	The number of rows affected by operations in the selected database per second: `tup_deleted` - number of rows deleted by operations per second; `tup_fetched` - number of rows fetched by operations per second; `tup_inserted` - number of rows inserted by operations per second; `tup_returned` - number of rows returned by operations per second; `tup_updated` - number of rows updated by operations per second
Locks	Number of locks in each cluster database
Deadlocks	Number of deadlocks in each cluster database
Transactions	Number of transactions per second in each cluster database
Connections	Number of connections to each cluster database and total number of connections to all databases
Temporary file size	Total size of temporary files in kilobytes
Size of WAL files	Total size of WAL files in megabytes
Execution time of the longest query	Execution time of the longest query in each database of the cluster for a period of time
Database size	Total size of the selected database in megabytes
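
As an illustration of the Cache hit formula above, here is a minimal sketch that computes the percentage from `blks_hit` and `blks_read`. The function name is hypothetical, and returning 0 when no blocks have been requested is an assumption rather than documented behavior.

```python
def cache_hit_percent(blks_hit: int, blks_read: int) -> float:
    """Cache hit ratio in percent: blks_hit / (blks_hit + blks_read) * 100."""
    total = blks_hit + blks_read
    if total == 0:
        # Assumption: report 0 when there were no block requests at all.
        return 0.0
    return 100.0 * blks_hit / total


# Made-up counters: 9,900 blocks served from cache, 100 read from disk.
print(cache_hit_percent(9_900, 100))  # 99.0
```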

View the status of the connection pooler

In the Dashboard, on the top menu, click Products and select Managed Databases.
Open the Active tab.
Open the cluster page → Monitoring tab.
In the Cluster Monitoring block, click Connection Pooler.
Select the nodes whose metrics you want to view.
Review the available connection pooler metrics.

Connection pooler metrics in the control panel

Maximum waiting time of the client in the queue	Maximum waiting time of the client in the queue in the selected database in seconds
Waiting time for a response from the server	Time to wait for a response from a node in the selected database in seconds
Active connections to the server	Number of server connections associated with clients in the selected database
Client connections to the pool	The number of client connections to the pool in the selected database: `pools_client_active_connections` - number of client connections that are associated with server connections or are idle without requests; `pools_client_waiting_connections` - number of client connections where a request has been sent but no connection to the node has yet been established

Export metrics in Prometheus format

Historical data for clusters is not available: metrics can be requested only in real time. The full list of metrics supported in Managed Databases, with descriptions, is given in the Metrics in Prometheus format table.

Get a token.
Get the metrics in Prometheus format.

1. Get a token

The token gives access to the metrics of all project clusters in a single pool.

In the Dashboard, on the top menu, click Products and select Managed Databases.
Open the Active tab.
Open the cluster page → Monitoring tab.
In the Tokens for Prometheus block, click Create token. The token will be generated automatically.
Copy the token. To do this, click the token row.

2. Get metrics in Prometheus format

Configuration file
CLI

Add to the Prometheus configuration file:
```
scrape_configs:
  - job_name: get-metrics-from-dbaas
    scrape_interval: 1m
    static_configs:
      - targets:
        - '<domain>'
    scheme: https
    authorization:
      type: Bearer
      credentials: <token>
```
Specify:
- <domain> - domain of the Managed Databases API. This is the API URL without https:// and /v1, for example ru-3.dbaas.selcloud.ru. The URL depends on the region and pool; you can find it in the URL list;
- <token> - the token you copied in step 5 of Get a token.
In your browser, open the Prometheus targets page to check that metrics are being collected:
```
http://<ip_address>:9090/targets
```
Specify <ip_address> - the IP address of the host where Prometheus is installed.
Configure monitoring and alerts for database clusters yourself.

Open the CLI.
To get the metrics, send a request:
```
curl -L "https://<domain>/metrics" -H "Authorization: Bearer <token>"
```
Specify:
- <domain> - domain of the Managed Databases API. This is the API URL without https:// and /v1, for example ru-3.dbaas.selcloud.ru. The URL depends on the region and pool; you can find it in the URL list;
- <token> - the token you copied in step 5 of Get a token.
The response contains the available metrics in Prometheus format.
Configure monitoring and alerts for database clusters yourself.
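
The same request as the curl command above can be sketched in Python with the standard library. This is an illustrative sketch, not an official client: the function names `build_metrics_request` and `fetch_metrics` and the 10-second timeout are assumptions. Only the `/metrics` path and the `Authorization: Bearer` header come from the command above.

```python
import urllib.request


def build_metrics_request(domain: str, token: str) -> urllib.request.Request:
    """Build a GET request for https://<domain>/metrics with a Bearer token."""
    return urllib.request.Request(
        f"https://{domain}/metrics",
        headers={"Authorization": f"Bearer {token}"},
    )


def fetch_metrics(domain: str, token: str, timeout: float = 10.0) -> str:
    """Send the request and return the Prometheus-format response body as text."""
    request = build_metrics_request(domain, token)
    # urlopen follows HTTP redirects by default.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

To use it, call `fetch_metrics("<domain>", "<token>")` with the same values as in the curl command.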

Metrics in Prometheus format

Metrics in Prometheus format are provided for all clusters. A specific cluster can be found by the database cluster identifier in the ds_id label.
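
Selecting one cluster by its `ds_id` label could look like the sketch below. The parser is deliberately simplified (it does not handle escaped quotes inside label values), and the payload, identifiers, and function name are made up for illustration.

```python
import re

# Simplified patterns for the Prometheus text format: "name{labels} value".
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def samples_for_cluster(text: str, ds_id: str) -> list[tuple[str, dict, float]]:
    """Return (metric name, labels, value) for samples whose ds_id label matches."""
    result = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        if labels.get("ds_id") == ds_id:
            result.append((match.group("name"), labels, float(match.group("value"))))
    return result


# Hypothetical payload; real identifiers and label sets will differ.
EXAMPLE = """# made-up response fragment
dbaas_cpu{ds_id="aaa-111",ds_name="orders"} 12.5
dbaas_cpu{ds_id="bbb-222",ds_name="billing"} 48.0
dbaas_role{ds_id="aaa-111"} 1
"""

for name, labels, value in samples_for_cluster(EXAMPLE, "aaa-111"):
    print(name, value)
```

For production use, a maintained parser for the Prometheus text format is a safer choice than regular expressions.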

Infrastructure level metrics
Application level metrics

dbaas_memory_percent	Memory utilization excluding cache and operating system buffers (RAM) in percent
dbaas_memory_bytes	Occupied memory excluding cache and operating system buffers (RAM) in bytes
dbaas_oom_count	Number of processes that ended with an `Out of Memory` error due to lack of RAM
dbaas_cpu	vCPU utilization on database cluster nodes in percent
dbaas_cpu_iowait	I/O waiting time in percent
dbaas_disk_percent	Occupied disk space in percent. It takes into account the part of disk space reserved for service needs and not available for database placement. For more information about reserving disk space, see Using disk space in a PostgreSQL cluster.
dbaas_disk_bytes	Occupied disk space in bytes. It takes into account the part of disk space reserved for service needs and not available for database placement. For more information about reserving disk space, see Using disk space in a PostgreSQL cluster.
dbaas_disk_read_iops	Number of read operations per second
dbaas_disk_write_iops	Number of write operations per second
dbaas_disk_read_bytes	Disk read speed in bytes per second
dbaas_disk_write_bytes	Data write speed to disk in bytes per second
dbaas_node_load1	The average system load over one minute. Shows how many processes are being handled by the node's cores
dbaas_node_load5	The average system load over five minutes. Shows how many processes are being handled by the node's cores
dbaas_node_load15	The average system load over 15 minutes. Shows how many processes are being handled by the node's cores
dbaas_network_receive_bytes	Number of bytes received through the network interface
dbaas_network_transmit_bytes	Number of bytes sent through the network interface
dbaas_network_receive_packets	Number of packets received through the network interface per second
dbaas_network_transmit_packets	Number of packets sent through the network interface per second
dbaas_role	Role of the node: `0` - role unknown; `1` - master; `2` - replica
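
When processing `dbaas_role` samples in your own tooling, the numeric values can be mapped back to role names. A minimal sketch follows: the mapping comes from the description above, while the function name and the fallback to "unknown" for unexpected values are assumptions.

```python
# Mapping taken from the dbaas_role description: 0, 1 and 2.
ROLE_NAMES = {0: "unknown", 1: "master", 2: "replica"}


def role_name(sample_value: float) -> str:
    """Translate a dbaas_role sample value into a readable role name."""
    # Assumption: any value outside the documented set is treated as unknown.
    return ROLE_NAMES.get(int(sample_value), "unknown")


print(role_name(1.0))  # master
```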

dbaas_connections	The number of active connections to the PostgreSQL process. For example, you can use labels: `ds_name` - name of the database cluster; `datname` - database name.
dbaas_total_connections	Total number of established connections to the PostgreSQL process
dbaas_max_tx_duration	Execution time of the longest query in seconds
dbaas_xact_commit_rollback	The number of transactions per second in each database in the cluster. For example, you can use labels: `ds_name` - name of the database cluster; `datname` - database name.
dbaas_tup_deleted	Number of rows per second removed by queries in the database
dbaas_tup_fetched	Number of rows per second retrieved by queries in the database
dbaas_tup_inserted	Number of rows per second inserted by queries in the database
dbaas_tup_returned	Number of rows per second returned by queries in the database
dbaas_tup_updated	Number of rows per second changed by queries in the database
dbaas_xact_commit	Number of committed transactions per second in the database
dbaas_xact_rollback	Number of rolled-back transactions per second in the database
dbaas_cache_hit_ratio	Percentage of the data requested by queries that was read from the cache: the ratio of `blks_hit` to the sum of `blks_hit` and `blks_read`
dbaas_deadlocks	The number of deadlocks per second in each database. For example, you can use labels: `ds_name` - name of the database cluster; `datname` - database name.
dbaas_locks	The number of locks per second in each cluster database. For example, you can use labels: `ds_name` - name of the database cluster; `datname` - database name.
dbaas_pg_pgss_query_texts_size_bytes	Size of the file with statistics from `pg_stat_statements` in bytes
dbaas_pg_total_wals_size_bytes	Size of the directory with WAL files in bytes
dbaas_pg_tmp_size_bytes	Size of temporary PostgreSQL files in bytes
dbaas_databases_size_bytes	The total size of the database in bytes. For example, you can use the label `datname` - database name
dbaas_pg_trx_max_age	Number of transactions executed since the last freeze by a VACUUM FREEZE or autovacuum operation
dbaas_pg_trx_percent_before_vacuum_freeze	Indicates how close the age of the oldest transaction in the database is to the threshold after which PostgreSQL forcibly starts a VACUUM FREEZE operation. The threshold is defined by the `autovacuum_freeze_max_age` parameter
dbaas_pg_pg_trx_percent_before_wraparound_risk	Indicates how close the age of the oldest transaction in the database is to the threshold after which transaction ID overflow (wraparound) is possible
dbaas_pgbouncer_pools_client_maxwait_seconds	Maximum waiting time of the client in the queue in seconds
dbaas_pgbouncer_pools_client_waiting_connections	Number of client connections where a request has been sent but no connection to the node has yet been established
dbaas_pgbouncer_stats_client_wait_seconds_total	Time to wait for a response from a node in microseconds
dbaas_pgbouncer_pools_client_active_connections	The number of client connections that are associated with server connections or are idle without requests. For example, you can use labels: `ds_id` - identifier of the database cluster; `data_base` - database name.
dbaas_pgbouncer_pools_server_active_connections	Number of server connections associated with clients
dbaas_pg_replication_slot_active	Replication slot status: `0` - the slot is not used. The slot has no consumer, the data is not transferred to the receiving database, accumulates in the replication slot and occupies additional disk space; `1` - slot is in use. The slot has a consumer, the data is transmitted to the receiving database
dbaas_pg_replication_slot_lag	The size of accumulated WAL files in megabytes. Indicates how much transactional information the receiving database needs to process to catch up with the source database
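
The two replication slot metrics are most useful together: a slot with status `0` and a growing lag is accumulating WAL on disk. The sketch below flags such slots. The record layout, the `slot_name` key, the 1024 MB limit, and the function name are hypothetical, and the values are made up.

```python
def unused_slots(samples: list[dict], max_lag_mb: float) -> list[str]:
    """Return names of slots that have no consumer and exceed the lag limit."""
    return [
        s["slot_name"]
        for s in samples
        if s["active"] == 0 and s["lag_mb"] > max_lag_mb
    ]


# Made-up values standing in for dbaas_pg_replication_slot_active
# and dbaas_pg_replication_slot_lag samples grouped per slot.
samples = [
    {"slot_name": "analytics", "active": 0, "lag_mb": 2048.0},
    {"slot_name": "replica_1", "active": 1, "lag_mb": 16.0},
    {"slot_name": "old_test", "active": 0, "lag_mb": 1.0},
]

print(unused_slots(samples, max_lag_mb=1024))  # ['analytics']
```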