F.33. pg_archive#

F.33. pg_archive

F.33. pg_archive #

pg_archive is an extension to automatically archive historical data from partitioned tables by converting partitions to storage methods that are more suitable for massive amounts of cold, read-only or read-mostly data compared to regular PostgreSQL tables. Currently, the only supported archival storage is the columnar access method maintained by the Hydra project, but other methods may be added in the future.

The code is loosely based on the excellent pg_partman project.

Automatic maintenance to archive partitions based on user-provided policies is implemented with a background worker that periodically executes the pg_archive_run_maintenance_proc() stored procedure. The procedure examines partitioned tables managed by pg_archive and converts partitions containing data older than a time interval specified with per-table policies. pg_archive_run_maintenance_proc() can also be invoked manually by user when an unscheduled maintenance is required.

F.33.1. Installation #

Requirements:

F.33.1.1. Building From Source #

In the root source directory run:

make install

F.33.1.2. Setup #

The background worker must be loaded on database start by adding the library to shared_preload_libraries in postgresql.conf

shared_preload_libraries = 'pg_archive_bgw'       # (change requires a restart)

Other GUCs that must be configured before using the extension are interval which is a time interval in seconds between maintenance runs by the background worker, dbnamewhich a CSV lines of database(s) for maintenance and role which is a role to be used by the worker: Here’s an example:

pg_archive_bgw.interval = 3600
pg_archive_bgw.role = 'alex'
pg_archive_bgw.dbname = 'mydb'

The role must have execute permissions on pg_archive_run_maintenance_proc().

Installing the extension after starting the server with the necessary GUCs configured:

CREATE SCHEMA archive;
CREATE EXTENSION pg_archive SCHEMA archive;

F.33.2. Upgrade #

ALTER EXTENSION pg_archive UPDATE TO '<latest version>';

F.33.3. Usage #

Partitioned tables can be configured to be managed by pg_archive with the pg_archive_manage() function.

This and the following examples in this section assume pg_archive is installed in the archive schema and the tables are in the monitoring schema:

SELECT archive.pg_archive_manage('monitoring.metrics', 'metric_timestamp', '1 year');
 pg_archive_manage
-------------------
 t
(1 row)

The above call adds a record into the archive.pg_archive_config table that will later be used by pg_archive_run_maintenance_proc() to archive and optionally detach partitions with data older than 1 year in the metrics table in the monitoring schema based on the metrics_timestamp value.

When a table becomes managed by pg_archive, its partitions will be examined and possibly archived on the next scheduled call to pg_archive_run_maintenance_proc() by the background worker, which is configured to run ever 3600 seconds by default. Alternatively, you may call pg_archive_run_maintenance_proc() manually:

CALL archive.pg_archive_run_maintenance_proc();

F.33.3.1. Examples #

First, create extension. It requires pg_columnar installed too, so use CASCADE.

CREATE EXTENSION pg_archive CASCADE;

After, you may want to enable background worker to archive tables in background. This requires configuration file changes.

shared_preload_libraries = 'pg_archive_bgw'
pg_archive_bgw.dbname = 'postgres'
pg_archive_bgw.role = 'postgres'
pg_archive_bgw.interval = 3600

Changes:

  • shared_preload_libraries - background worker must be loaded at server startup to be registered

  • pg_archive_bgw.dbname - set to postgres as default database

  • role and interval - these are default values, but you may want to change them

    • run as postgres user (in production create separate role for background worker, not superuser)

    • run every hour (3600 seconds)

Further examples follow. All operations done in postgres database and public schema. After each example, given table must be registered by pg_archive for further archiving. But if you want maintenance to run immediate, run procedure:

CALL pg_archive_run_maintenance_proc();

Or, you may want to force convert partition. This can be done with another procedure (note: it accepts partition table name - not parent):

CALL pg_archive_convert_partition_proc('public.partition_tbl_name');

To check, that tables archived you may run this query:

SELECT c.relname, am.amname
FROM pg_class c
JOIN pg_am am ON c.relam = am.oid
WHERE c.relname LIKE 'tbl_name%';

'tbl_name%' in query must be substituted with real table name. Suppose, that partitioned tables has name as parent table plus suffix at the end.

Feel free to change partition bounds or max age (parameter to pg_archive_manage) to check that tables are converting.

Table for temperature measurements partitioned by timestamp of measurement. It is used primarily for analytics, so archive it often - set max age to 1 day.

CREATE TABLE measurements(
    temp numeric NOT NULL,
    ts timestamp NOT NULL
) PARTITION BY RANGE(ts);
CREATE TABLE measurements_20240901
    PARTITION OF measurements
    FOR VALUES FROM ('2024-09-01') TO ('2024-09-02');
CREATE TABLE measurements_20240902
    PARTITION OF measurements
    FOR VALUES FROM ('2024-09-02') TO ('2024-09-03');
CREATE TABLE measurements_20240903
    PARTITION OF measurements
    FOR VALUES FROM ('2024-09-03') TO ('2024-09-04');
SELECT pg_archive_manage('public.measurements', 'ts', '1 day');

Store customers’ bookings in partitioned table. Bookings can be altered any time, so partitions split by months.

CREATE TABLE bookings(
    customer_id int NOT NULL,
    book_date date NOT NULL
) PARTITION BY RANGE(book_date);
CREATE TABLE bookings_20240901
    PARTITION OF bookings
    FOR VALUES FROM ('2024-09-01') TO ('2024-09-02');
CREATE TABLE bookings_20240902
    PARTITION OF bookings
    FOR VALUES FROM ('2024-09-02') TO ('2024-09-03');
CREATE TABLE bookings_20240903
    PARTITION OF bookings
    FOR VALUES FROM ('2024-09-03') TO ('2024-09-04');
SELECT pg_archive_manage('public.bookings', 'book_date', '1 year');

Use partitioned table to store logs of microservices. Timestamps of logs provided as UNIX timestamp (in seconds).

CREATE TABLE logs(
    message text NOT NULL,
    service int NOT NULL,
    ts numeric NOT NULL
) PARTITION BY RANGE(ts);
CREATE TABLE logs_202409
    PARTITION OF logs
    FOR VALUES FROM (1725148800) TO (1727740800);
CREATE TABLE logs_202410
    PARTITION OF logs
    FOR VALUES FROM (1727740800) TO (1730419200);
CREATE TABLE logs_202411
    PARTITION OF logs
    FOR VALUES FROM (1730419200) TO (1733011200);
SELECT pg_archive_manage('public.logs', 'ts', '1 month', p_epoch := 'seconds');

F.33.4. Functions, Procedures And Tables #

Below is a description of the available functions, their arguments and the columns in the pg_archive_config table:

F.33.4.1. pg_archive_manage() #

CREATE FUNCTION @extschema@.pg_archive_manage(
    p_parent_table text
    , p_control text
    , p_max_age text
    , p_type text DEFAULT 'range'
    , p_epoch text DEFAULT 'none'
    , p_automatic_maintenance boolean DEFAULT true
)

The function registers a partition table for management by pg_archive.

  • p_parent_table

    • an existing partitioned table. MUST be schema-qualified, even if in the public schema

  • p_control

    • name of a column that the partitioning will be based on. Must be of either a time- or integer-based type

  • p_max_age

    • either a time interval in the PostgreSQL INTERVAL type format, or an integer value, depending on partitioning type. Regardless of the partitioning type used, this value must be specified as text.

  • p_type

    • type of partitioning used by the table. Currently only range and list are supported. See the p_max_age parameter for special conditions related to type.

  • p_epoch

    • tells pg_archive that the control column is an integer type, but actually represents and epoch time value. Valid values for this option are: 'seconds', 'milliseconds', 'nanoseconds', and 'none'. The default is 'none'. In addition to a normal index on the control column, be sure you create a functional, time-based index on the control column (to_timestamp(controlcolumn)) as well so this works efficiently.

  • p_automatic_maintenance

    • indicates whether maintenance is managed automatically when pg_archive_run_maintenance_proc() is called without a table parameter or by the background worker process. Default is true. When set to false, pg_archive_run_maintenance_proc() can still be called on an individual partition set by passing it as a parameter to the function.

F.33.4.2. pg_archive_convert_partition() #

CREATE PROCEDURE @extschema@.pg_archive_convert_partition_proc(
    p_partition text
    , p_access_method text DEFAULT 'columnar'
    , p_keep_indexes boolean DEFAULT true
    , p_reattach boolean DEFAULT false
)

The function may be useful when an individual partition needs be archived manually . Note that you need to call pg_archive_manage() for the corresponding parent table before its partitioned can be passed to pg_archive_convert_partition().

  • p_partition

    • partition specification in the schema.name format.

  • p_access_method

    • access method to use for the archived table. Currently columnar (the default) is the only supported method.

  • p_keep_indexes

    • boolean value indicating whether indexes should be preserved in the archived table

    • default is true

  • p_reattach

    • boolean value indicating whether the archived table should be attached back to the parent table

    • default is true

F.33.4.3. pg_archive_config #

CREATE TABLE @extschema@.pg_archive_config (
    parent_table text NOT NULL
    , control text NOT NULL
    , partition_type text NOT NULL
    , automatic_maintenance boolean NOT NULL DEFAULT true
    , max_age text NOT NULL
    , new_schema text
    , keep_indexes boolean NOT NULL DEFAULT true
    , reattach boolean NOT NULL DEFAULT true
    , access_method text NOT NULL DEFAULT 'columnar'
    , epoch text NOT NULL DEFAULT 'none'
    , maintenance_order int
    , keep_publication boolean NOT NULL DEFAULT false
    , maintenance_last_run timestamptz
    , CONSTRAINT pg_archive_config_parent_table_pkey PRIMARY KEY (parent_table)
);
  • parent_table

    • parent table of a partition set to manage

  • control

    • column used as the control for partition constraints. Must be a time or integer based column.

  • partition_type

    • type of partitioning. Must be one of the types mentioned above in the pg_archive_manage() documentation.

  • automatic_maintenance

    • flag indicating whether maintenance is managed automatically when pg_archive_run_maintenance_proc() is called without a table parameter or by the background worker process.

    • Default is true.

    • When set to false, pg_archive_run_maintenance_proc() can still be called on an in individual partition set by passing it as a parameter to the function.

  • max_age

    • text type value that determines how old the data in a child partition can be before the partition is archived.

  • new_schema

    • schema to move archived partitions to as part of the archiving process. NULL (the default) means partitions do not change their schema after archiving.

  • keep_indexes

    • boolean value indicating whether indexes should be dropped for archived tables.

    • Default is true. Setting it to false means pg_archive will not attempt to recreate any indexes on the archived table. Note that if the table is configured to reattach archived partitions (see reattach below), PostgreSQL may automatically recreate indexes inherited from the parent table. See PostgreSQL documentation for ALTER TABLE ATTACH PARTITION for more information.

  • reattach

    • boolean value indicating whether partitions should stay attached to the parent table after conversion. Setting it to ‘false’ means that partitions will continue to exist as standalone tables after archiving, i.e. will be detached from the parent table.

    • default is true

  • access_method

    • access method to use to convert partitions during archiving

    • the default and the only currently supported value is columnar

  • epoch

    • indicates that the table is partitioned by time by an integer epoch value instead of a timestamp. See the description of the corresponding argument in the pg_archive_manage() function for more info.

    • default is 'none'.

  • maintenance_order

    • integer value that determines the order that maintenance will run the partition sets. Will run sets in increasing numerical order.

    • default value is NULL. All partition sets set to NULL will run after partition sets with a value defined. NULL partition sets run in an indeterminate order.

    • for sub-partitioned sets, the child tables by default inherit the order of their parents. Child parent tables will run in logical order when their parent table’s maintenance is run if left to the default value.

  • keep_publication

    • determines whether to drop archived partitions from any publications that they may be a member of.

    • default is false meaning that by archived partitions are removed from their publications.

  • maintenance_last_run

    • timestamp of the last successful run of maintenance for this partition set. Can be useful as a monitoring metric to ensure partition maintenance is running properly.

F.33.5. Caveats #

As part of the archiving procedure, pg_archive must copy partition data into a new table and then exchange the existing partition with the new archived version (unless reattach is configured to false for that table). Since ALTER TABLE ... EXCHANGE PARITION is currently unsupported by PostgreSQL, it boils down to either DROP TABLE or ALTER TABLE DETACH PARTITION followed by ALTER TABLE ATTACH PARTITION.

Both DROP TABLE for a partition and DETACH PARTITION take an ACCESS EXCLUSIVE lock on the parent table which is problematic for busy tables. Even though PostgreSQL 14 introduced DETACH PARTITION CONCURRENTLY to solve that particular problem, that functionality is unavailable to extensions, because ALTER TABLE DETACH PARTITION CONCURRENTLY must be a top-level command and cannot be used in stored procedures. See this pgsql-general discussion for more information.

As a result, pg_archive currently uses DROP TABLE to detach a partition as the only viable option. We are aware of the issues it may cause for busy, high-churn tables, and are looking for possible workarounds.