Monitoring a Kafka cluster

In Kafka Managed Databases, you can track the status of the cluster.

To evaluate the overall status of the cluster, view its status.

For a more detailed analysis, you can:

  • view cluster node status;
  • export metrics in Prometheus format.

The time in the control panel corresponds to the time set on your device and does not depend on the region where the cluster is hosted.

Note

For example, suppose you created a cluster in Tashkent, in the uz-1 pool, and the device you used to log in to the control panel is set to the Moscow time zone. The time on the metrics charts will then be displayed in the Moscow time zone.
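The effect described in the note can be illustrated with a short sketch. It assumes fixed offsets of UTC+3 for Moscow and UTC+5 for Tashkent. The function name, the sample timestamp, and the output format are hypothetical and are not taken from the control panel.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets used for illustration: Moscow is UTC+3, Tashkent is UTC+5.
MOSCOW = timezone(timedelta(hours=3), "MSK")
TASHKENT = timezone(timedelta(hours=5), "UZT")


def chart_time(sample_utc: datetime, device_tz: timezone) -> str:
    """Format one instant as a device in the given time zone would display it."""
    return sample_utc.astimezone(device_tz).strftime("%Y-%m-%d %H:%M %Z")


# A hypothetical metric sample recorded at 09:00 UTC.
sample = datetime(2024, 1, 15, 9, 0, tzinfo=timezone.utc)

print(chart_time(sample, MOSCOW))    # what a device set to Moscow time shows
print(chart_time(sample, TASHKENT))  # local time in the cluster's region
```

Both lines describe the same instant. Only the device's time zone decides which of them appears on the chart.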

View cluster status

  1. In the control panel, in the top menu, click Products and select Managed Databases.

  2. Open the Active tab.

  3. View the status in the cluster row.

    ACTIVE: Cluster is available
    CREATING: Cluster is being created
    UPDATING: Changes are being applied to the cluster
    RESIZING: Cluster is scaling
    ERROR: An error occurred; create a ticket
    DISK FULL: The disk is full, and the cluster is in read-only mode. To restore read and write mode, free up disk space or scale the cluster to a configuration with a larger disk
    DEGRADED: Some cluster nodes are unavailable
    DELETING: Cluster is being deleted
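If you track these statuses in your own tooling, a lookup table like the one below may be convenient. This is a minimal sketch: the status strings are copied from the control panel list above, and whether any API returns them in exactly this form is an assumption. The choice of which statuses need attention is a judgment made for this example.

```python
# Statuses as listed in the control panel; the exact strings an API
# might return are an assumption.
STATUS_DESCRIPTIONS = {
    "ACTIVE": "Cluster is available",
    "CREATING": "Cluster is being created",
    "UPDATING": "Changes are being applied to the cluster",
    "RESIZING": "Cluster is scaling",
    "ERROR": "An error occurred, create a ticket",
    "DISK FULL": "Disk is full, cluster is in read-only mode",
    "DEGRADED": "Some cluster nodes are unavailable",
    "DELETING": "Cluster is being deleted",
}

# Statuses that call for operator action (chosen for this sketch).
NEEDS_ATTENTION = {"ERROR", "DISK FULL", "DEGRADED"}


def describe(status: str) -> str:
    """Return a one-line summary for a status, flagging those that need action."""
    key = status.strip().upper()
    text = STATUS_DESCRIPTIONS.get(key, "Unknown status")
    flag = " [action needed]" if key in NEEDS_ATTENTION else ""
    return f"{key}: {text}{flag}"


print(describe("ACTIVE"))
print(describe("DISK FULL"))
```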

View cluster node status

  1. In the control panel, in the top menu, click Products and select Managed Databases.
  2. Open the Active tab.
  3. Open the cluster page → Monitoring tab.
  4. In the Cluster monitoring block, view the available cluster node metrics.

Cluster node metrics in the control panel

Memory: Memory used, excluding operating system cache and buffers, in percent or gigabytes
vCPU: Percentage of cluster node core utilization
CPU iowait: Percentage of time the processor spent waiting for I/O operations
Disk: Occupied disk space, in percent or gigabytes. This takes into account the portion of disk space reserved for service needs and unavailable for database placement. For more information about reserving disk space, see the instructions Using disk space in a Kafka cluster
Load Average: Average system load over a period of time. It shows how many processes are being handled by the cluster node's cores. The indicator is presented as three values: for one minute, five minutes, and 15 minutes. These values should not exceed the number of cores on the node
OOM: Number of processes terminated with an Out of Memory error due to lack of RAM
Disk load: Read and write speed in KB/s, or number of read/write operations per second
Network load: Number of bits or packets sent and received via the network interface
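The Load Average rule (values should not exceed the number of cores on the node) can be expressed as a simple check. This is a minimal sketch: the function name and the sample values are hypothetical, and the core count has to come from your cluster configuration.

```python
def overloaded_windows(load1: float, load5: float, load15: float, cores: int) -> list:
    """Return the load-average windows whose value exceeds the node's core count."""
    if cores <= 0:
        raise ValueError("cores must be a positive integer")
    windows = {"1m": load1, "5m": load5, "15m": load15}
    return [name for name, value in windows.items() if value > cores]


# Hypothetical readings for a node with 4 cores.
print(overloaded_windows(5.2, 3.9, 2.1, cores=4))
```

A spike in the one-minute window alone is often transient. Sustained values above the core count in the five-minute and 15-minute windows are a stronger signal that the node is overloaded.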

Export metrics in Prometheus format

Historical information for clusters is not available: metrics are requested only in real time. The list of all metrics supported in Managed Databases and their descriptions can be viewed in the Metrics in Prometheus format table.

  1. Get a token.
  2. Get metrics in Prometheus format.

1. Get a token

The token provides access to metrics for all clusters in a project in a single pool.

  1. In the control panel, in the top menu, click Products and select Managed Databases.

  2. Open the Active tab.

  3. Open the cluster page → Monitoring tab.

  4. In the Tokens for Prometheus block, click Create token. The token will be generated automatically.

  5. Copy the token. To do this, click the copy button in the token row.

2. Get metrics in Prometheus format

  1. Add to the Prometheus configuration file:

    scrape_configs:
      - job_name: get-metrics-from-dbaas
        scrape_interval: 1m
        static_configs:
          - targets:
              - '<domain>'
        scheme: https
        authorization:
          type: Bearer
          credentials: <token>

    Specify:

    • <domain>: the Managed Databases API domain, that is, the API URL without https:// and /v1, for example ru-3.dbaas.selcloud.ru. The URL depends on the region and pool, which can be viewed in the list of URLs;
    • <token>: the token you copied in step 5 of Get a token.
  2. Open the page in your browser where Prometheus-format metrics will be available:

    http://<ip_address>:9090/targets

    Specify <ip_address>: the IP address of the host where Prometheus is installed.

  3. Set up monitoring and alerts for database clusters yourself.
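To check a token without running Prometheus, you can send the same kind of request by hand. The sketch below assumes the metrics are served at the /metrics path, which is the Prometheus default when metrics_path is not set in the scrape job. The function names are hypothetical, and the domain and token are the placeholders from the configuration above.

```python
import urllib.request


def build_metrics_request(domain: str, token: str) -> urllib.request.Request:
    """Build an HTTPS request equivalent to the scrape job's request."""
    # Assumption: /metrics is used because the job does not set metrics_path.
    url = f"https://{domain}/metrics"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_metrics(domain: str, token: str, timeout: float = 10.0) -> str:
    """Fetch the current metrics as Prometheus text (performs a network call)."""
    request = build_metrics_request(domain, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the request does not touch the network; the values are placeholders.
request = build_metrics_request("<domain>", "<token>")
print(request.full_url)
```

Calling fetch_metrics with a real domain and token returns the current values only, because historical data is not available.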

Metrics in Prometheus format

Metrics in Prometheus format are provided for all clusters. A specific cluster can be found by the database cluster ID in the ds_id label.

dbaas_memory_percent: Memory used (RAM), excluding operating system cache and buffers, in percent
dbaas_memory_bytes: Memory used (RAM), excluding operating system cache and buffers, in bytes
dbaas_oom_count: Number of processes terminated with an Out of Memory error due to lack of RAM
dbaas_cpu: vCPU usage on database cluster nodes, in percent
dbaas_cpu_iowait: I/O wait time, in percent
dbaas_disk_percent: Occupied disk space, in percent. This takes into account the portion of disk space reserved for service needs and unavailable for database placement. For more information about reserving disk space, see the instructions Using disk space in a Kafka cluster
dbaas_disk_bytes: Occupied disk space, in bytes. This takes into account the portion of disk space reserved for service needs and unavailable for database placement. For more information about reserving disk space, see the instructions Using disk space in a Kafka cluster
dbaas_disk_read_iops: Number of read operations per second
dbaas_disk_write_iops: Number of write operations per second
dbaas_disk_read_bytes: Disk read speed, in bytes per second
dbaas_disk_write_bytes: Disk write speed, in bytes per second
dbaas_node_load1: Average system load over one minute. Shows how many processes are being handled by cluster cores
dbaas_node_load5: Average system load over five minutes. Shows how many processes are being handled by cluster cores
dbaas_node_load15: Average system load over 15 minutes. Shows how many processes are being handled by cluster cores
dbaas_network_receive_bytes: Number of bytes received via the network interface
dbaas_network_transmit_bytes: Number of bytes sent via the network interface
dbaas_network_receive_packets: Number of packets received via the network interface per second
dbaas_network_transmit_packets: Number of packets sent via the network interface per second
dbaas_role: Node role:

  • 0: role unknown;
  • 1: master;
  • 2: replica
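Because one response contains metrics for all clusters, a script usually needs to filter samples by the ds_id label and decode dbaas_role. The sketch below parses the common single-line form of the Prometheus text format; it is not a complete parser. The sample text, the cluster IDs, and the function names are hypothetical, and real responses may carry additional labels.

```python
import re
from enum import IntEnum

# Matches "name{labels} value" or "name value"; a simplified grammar.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


class NodeRole(IntEnum):
    """Values of the dbaas_role metric as listed in the table above."""
    UNKNOWN = 0
    MASTER = 1
    REPLICA = 2


def parse_samples(text):
    """Yield (name, labels, value) per sample line, skipping comments and blanks."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match:
            name, raw_labels, value = match.groups()
            labels = dict(LABEL_RE.findall(raw_labels or ""))
            yield name, labels, float(value)


def cluster_samples(text, ds_id):
    """Keep only the samples whose ds_id label equals the given cluster ID."""
    return [s for s in parse_samples(text) if s[1].get("ds_id") == ds_id]


# Hypothetical response fragment; real output may include more labels.
EXAMPLE = """\
# HELP dbaas_cpu vCPU usage, in percent
dbaas_cpu{ds_id="cluster-a"} 37.5
dbaas_cpu{ds_id="cluster-b"} 81.0
dbaas_role{ds_id="cluster-a"} 1
"""

for name, labels, value in cluster_samples(EXAMPLE, "cluster-a"):
    if name == "dbaas_role":
        print(name, NodeRole(int(value)).name)
    else:
        print(name, value)
```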