Disk space usage in a PostgreSQL TimescaleDB cluster

The following data PostgreSQL TimescaleDB is stored on cluster disks:

temporary files;
data files;
transaction logs (WAL);
logical replication slots;
DBMS logs;
system files required for the operation of PostgreSQL TimescaleDB.

Part of the disk space is reserved for service needs:

for the file system — 4% of the disk volume;
for the operating system, service components, and logs — 8 GB.

The reserved portion of the disk space is not available to host databases. Take this into account when selecting a configuration lineup.

You can track disk usage using disk usage notifications and metrics. For more information about metrics, see Monitoring cluster, nodes, and databases PostgreSQL TimescaleDB.

As the amount of used disk space grows, you can check:

the size of temporary files;
consumers of replication slots;
dead tuples.

When cluster disk usage reaches 95% or more, the cluster will enter DISK_FULL status and will be read-only. This is necessary to prevent the cluster from locking up or becoming corrupted due to a lack of free space. To make the cluster read-write, clear the disk or rescale the cluster and select a configuration with a larger disk size than the previous one.

Disk space notifications

Disk usage notifications are sent to the email of the Account Owner and users who are subscribed to the Services and Products notification category. Notifications are sent when disk usage reaches 80% and 95%.

View the size of temporary files

Temporary files can be used for sorting, hashing, and temporary storage of query results. To view the total size of temporary files in a database, use an SQL query against the pg_stat_database view:

SELECT datname, temp_files AS "Temporary files", temp_bytes AS "Size of temporary files"
FROM   pg_stat_database;

Output example:

datname | temp_files   | temp_bytes
--------+--------------+----------------
mydb    |      2       |    16384
postgres|      1       |    8192

Where:

datname — database name;
temp_files — the number of temporary files in this database;
temp_bytes — the size of temporary files in bytes.

The temp_files and temp_bytes fields account for all temporary files since the cluster was created. The data is reset only after a recovery from a backup or an emergency shutdown. Use these field values to monitor changes in the total size of temporary files.

The size of temporary tables created by a specific query can be obtained using the EXPLAIN ANALYZE command.

Check logical replication slot consumers

Logical replication slots are used for continuous data replication from one database to another. Logical replication slots must always have a consumer. If there is no consumer, file size will grow.

Read more about managing logical replication slots in the Subscription article of the PostgreSQL documentation.

To check if logical replication slots have consumers, run an SQL query against the pg_replication_slots view:

SELECT slot_name, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(),restart_lsn)) AS replicationSlotLag,active
FROM pg_replication_slots;

Output example:

slot_name        | replicationslotlag | active
-----------------+--------------------+--------
myslot1          | 129 GB             | f
myslot2          | 704 MB             | t
myslot3          | 624 MB             | t

Where:

slot_name — logical replication slot name;
replicationslotlag — the size of WAL files that will not be automatically removed at checkpoints and that logical replication slot consumers may use;
active — a boolean value indicating whether the logical replication slot is in use:
- f — the slot has no consumer;
- t — the slot has a consumer.

If you are using PostgreSQL 13 or later, you can limit the maximum size of stored WAL files using the max_slot_wal_keep_size parameter. Note that when using this parameter, transaction logs may be deleted before the consumer reads changes from the logical replication slot.

Dead tuples

When you update (UPDATE) or delete (DELETE) rows in a table, the tuples are not actually removed from the disk; instead, new versions of them are created. The old versions of the tuples are called "dead tuples." This versioning is required to implement the MVCC (Multi-Version Concurrency Control) process. Even though rows are modified in one transaction, other active transactions may continue to see the old version of the rows. Dead tuples can only be deleted when all active transactions are closed.

Dead tuples will also be created on a replica if you use logical replication.

Check dead tuples

If there are many dead tuples, they may occupy a significant amount of disk space. To check the number of dead tuples, you can use the extension PostgreSQL TimescaleDB pgstattuple or the pg_stat_all_tables view.

Example SQL query for the pg_stat_all_tables view:

SELECT * FROM pg_stat_all_tables WHERE relname='test';

Output example:

-[ RECORD 1 ]----------+------------------------------
relid                  | 16395
schemaname             | public
relname                | test
seq_scan               | 3
seq_tup_read           | 5280041
idx_scan               |
idx_tup_fetch          |
n_tup_ins              | 2000000
n_tup_upd              | 0
n_tup_del              | 3639911
n_tup_hot_upd          | 0
n_live_tup             | 1635941
n_dead_tup             | 1999952
n_mod_since_analyze    | 3999952
last_vacuum            |
last_autovacuum        | 2023-02-16 04:49:52.399546+00
last_analyze           | 2023-02-09 09:44:56.208889+00
last_autoanalyze       | 2023-02-16 04:50:22.581935+00
vacuum_count           | 0
autovacuum_count       | 1
analyze_count          | 1
autoanalyze_count      | 1

Here n_dead_tup is the number of dead tuples.

Comparison of methods for removing dead tuples

To remove dead tuples, you can use the standard VACUUM, VACUUM FULL commands or repack tables and indexes using the pg_repack extension.

	VACUUM	VACUUM FULL	pg_repack
Clears the table of dead tuples	✓	✓	✓
Reduces file size	✗	✓	✓
Completely locks the table	✗	✓	Only for two short periods at the beginning and end of the extension's operation

Read more about VACUUM and VACUUM FULL commands in the Routine Vacuuming article of the PostgreSQL documentation.

Read more about the pg_repack extension and its functions in the pg_repack documentation.

Repack tables and indexes using pg_repack

warning

Run pg_repack during periods of minimum cluster load, as pg_repack creates additional load. Read more about monitoring cluster health in the Monitoring cluster, nodes, and databases PostgreSQL TimescaleDB.

To work with the pg_repack extension, the client of the same name is used. The client is installed on the host from which you connect to the PostgreSQL TimescaleDB cluster.

Add the pg_repack extension to the database.
Determine the version of the pg_repack extension:

2.1. Connect to the cluster. When connecting, specify the name of the database to which you added the pg_repack extension.

2.2. Determine the extension version:
```
SELECT
  extname AS extension,
  extversion AS version,
  extnamespace::regnamespace AS schema
FROM pg_extension
WHERE extname = 'pg_repack';
```
Download and install the pg_repack client of the same version as the extension.
Ensure there is enough free space on the disk. A full table repack requires free disk space approximately twice the size of the tables and indexes being processed. For example, if the total size of the tables and indexes being processed is 1 GB, an additional 2 GB of disk space will be required.
Ensure that the table being processed has a primary key (PRIMARY KEY) or a unique index (UNIQUE INDEX).
Repack tables and indexes in the database:
```
pg_repack -k -h <host> -p <port> \
  -U <user> \
  -d <database_name> \
  -t <table_name> \
  -i <index_name>
```
Specify:
- <host> — the DNS address of the node;
- <port> — connection port;
- <user> — the database user name;
- <database_name> — the database name;
- optional: -t <table_name>, where <table_name> is the table name. Use this parameter if you need to repack a specific table. To repack multiple tables, specify the required number of -t parameters — one for each table;
- optional: -i <index_name>, where <index_name> is the index name. Use this parameter if you need to repack a specific index. To repack multiple indexes, specify the required number of -i parameters — one for each index. If an index belongs to a table you have already specified via the -t parameter, you do not need to specify it separately via the -i parameter — the index will be automatically repacked along with the table.
If you do not specify the -t and -i parameters, pg_repack will repack all tables and indexes in the specified database.

Clear disk

For your information

We do not recommend using the DELETE FROM table WHERE ... query to clear the disk. This query may create large selections on large tables and place them on the disk. The remaining free space on the disk could run out completely, which will lead to issues with PostgreSQL TimescaleDB and the need to manually restore its operation.

Open a transaction with transaction_read_only = no and delete unnecessary data using one of the following queries:

DROP TABLE — completely deletes the table: data, structure, indexes, constraints (constraints), triggers.
```
BEGIN;
SET transaction_read_only = no;
DROP TABLE table_name;
COMMIT;
```
TRUNCATE TABLE — removes all rows from a table. It works faster than DELETE.
```
BEGIN;
SET transaction_read_only = no;
TRUNCATE TABLE table_name;
COMMIT;
```
DELETE — deletes rows matching the WHERE condition.