F.38. pg_archive — automatically archive historical data from partitioned tables#
F.38. pg_archive — automatically archive historical data from partitioned tables #
F.38.1. Overview #
pg_archive is an extension to automatically archive historical data from partitioned tables by converting partitions to storage methods that are more suitable for massive amounts of cold, read-only or read-mostly data compared to regular PostgreSQL tables. Currently, the only supported archival storage is the columnar access method maintained by the Hydra project, but other methods may be added in the future.
The code is loosely based on the excellent pg_partman project.
Automatic maintenance to archive partitions based on user-provided
policies is implemented with a background worker that periodically
executes the pg_archive_run_maintenance_proc()
stored procedure. The procedure examines partitioned tables managed
by pg_archive and converts partitions containing data older than a
time interval specified with per-table policies.
pg_archive_run_maintenance_proc() can also be
invoked manually by user when an unscheduled maintenance is
required.
F.38.2. Installation #
Requirements:
It requires
pg_columnar (hydra) installed too, so use
CASCADE.
CREATE EXTENSION pg_archive CASCADE;
F.38.2.1. Setup #
The background worker must be loaded on database start by adding
the library to shared_preload_libraries in
postgresql.conf
shared_preload_libraries = 'pg_archive_bgw' # (change requires a restart)
Other GUCs that must be configured before using the extension
are interval which is a time interval in
seconds between maintenance runs by the background worker,
dbnamewhich a CSV lines of database(s) for
maintenance and role which is a role to be
used by the worker: Here’s an example:
pg_archive_bgw.interval = 3600 pg_archive_bgw.role = 'alex' pg_archive_bgw.dbname = 'mydb'
The role must have execute permissions on
pg_archive_run_maintenance_proc().
Installing the extension after starting the server with the necessary GUCs configured:
CREATE SCHEMA archive; CREATE EXTENSION pg_archive SCHEMA archive;
F.38.3. Upgrade #
ALTER EXTENSION pg_archive UPDATE TO '<latest version>';
F.38.4. Usage #
Partitioned tables can be configured to be managed by pg_archive
with the pg_archive_manage() function.
This and the following examples in this section assume pg_archive
is installed in the archive schema and the
tables are in the monitoring schema:
SELECT archive.pg_archive_manage('monitoring.metrics', 'metric_timestamp', '1 year');
pg_archive_manage
-------------------
t
(1 row)
The above call adds a record into the
archive.pg_archive_config table that will later
be used by pg_archive_run_maintenance_proc() to
archive and optionally detach partitions with data older than 1
year in the metrics table in the
monitoring schema based on the
metrics_timestamp value.
When a table becomes managed by pg_archive, its partitions will be
examined and possibly archived on the next scheduled call to
pg_archive_run_maintenance_proc() by the
background worker, which is configured to run ever 3600 seconds by
default. Alternatively, you may call
pg_archive_run_maintenance_proc() manually:
CALL archive.pg_archive_run_maintenance_proc();
F.38.4.1. Examples #
First, create extension. It requires
pg_columnar installed too, so use
CASCADE.
CREATE EXTENSION pg_archive CASCADE;
After, you may want to enable background worker to archive tables in background. This requires configuration file changes.
shared_preload_libraries = 'pg_archive_bgw' pg_archive_bgw.dbname = 'postgres' pg_archive_bgw.role = 'postgres' pg_archive_bgw.interval = 3600
Changes:
shared_preload_libraries- background worker must be loaded at server startup to be registeredpg_archive_bgw.dbname- set topostgresas default databaseroleandinterval- these are default values, but you may want to change themrun as
postgresuser (in production create separate role for background worker, not superuser)run every hour (3600 seconds)
Further examples follow. All operations done in
postgres database and
public schema. After each example, given
table must be registered by pg_archive for further archiving.
But if you want maintenance to run immediate, run procedure:
CALL pg_archive_run_maintenance_proc();
Or, you may want to force convert partition. This can be done with another procedure (note: it accepts partition table name - not parent):
CALL pg_archive_convert_partition_proc('public.partition_tbl_name');
To check, that tables archived you may run this query:
SELECT c.relname, am.amname FROM pg_class c JOIN pg_am am ON c.relam = am.oid WHERE c.relname LIKE 'tbl_name%';
Note
'tbl_name%' in query must be substituted
with real table name. Suppose, that partitioned tables has
name as parent table plus suffix at the end.
Feel free to change partition bounds or max age (parameter to
pg_archive_manage) to check that tables are
converting.
Table for temperature measurements partitioned by timestamp of measurement. It is used primarily for analytics, so archive it often - set max age to 1 day.
CREATE TABLE measurements(
temp numeric NOT NULL,
ts timestamp NOT NULL
) PARTITION BY RANGE(ts);
CREATE TABLE measurements_20240901
PARTITION OF measurements
FOR VALUES FROM ('2024-09-01') TO ('2024-09-02');
CREATE TABLE measurements_20240902
PARTITION OF measurements
FOR VALUES FROM ('2024-09-02') TO ('2024-09-03');
CREATE TABLE measurements_20240903
PARTITION OF measurements
FOR VALUES FROM ('2024-09-03') TO ('2024-09-04');
SELECT pg_archive_manage('public.measurements', 'ts', '1 day');
Store customers’ bookings in partitioned table. Bookings can be altered any time, so partitions split by months.
CREATE TABLE bookings(
customer_id int NOT NULL,
book_date date NOT NULL
) PARTITION BY RANGE(book_date);
CREATE TABLE bookings_20240901
PARTITION OF bookings
FOR VALUES FROM ('2024-09-01') TO ('2024-09-02');
CREATE TABLE bookings_20240902
PARTITION OF bookings
FOR VALUES FROM ('2024-09-02') TO ('2024-09-03');
CREATE TABLE bookings_20240903
PARTITION OF bookings
FOR VALUES FROM ('2024-09-03') TO ('2024-09-04');
SELECT pg_archive_manage('public.bookings', 'book_date', '1 year');
Use partitioned table to store logs of microservices. Timestamps of logs provided as UNIX timestamp (in seconds).
CREATE TABLE logs(
message text NOT NULL,
service int NOT NULL,
ts numeric NOT NULL
) PARTITION BY RANGE(ts);
CREATE TABLE logs_202409
PARTITION OF logs
FOR VALUES FROM (1725148800) TO (1727740800);
CREATE TABLE logs_202410
PARTITION OF logs
FOR VALUES FROM (1727740800) TO (1730419200);
CREATE TABLE logs_202411
PARTITION OF logs
FOR VALUES FROM (1730419200) TO (1733011200);
SELECT pg_archive_manage('public.logs', 'ts', '1 month', p_epoch := 'seconds');
F.38.5. Functions, Procedures And Tables #
Below is a description of the available functions, their arguments
and the columns in the pg_archive_config table:
F.38.5.1. pg_archive_manage() #
CREATE FUNCTION @extschema@.pg_archive_manage(
p_parent_table text
, p_control text
, p_max_age text
, p_type text DEFAULT 'range'
, p_epoch text DEFAULT 'none'
, p_automatic_maintenance boolean DEFAULT true
)
The function registers a partition table for management by pg_archive.
p_parent_tablean existing partitioned table. Must be schema-qualified, even if in the public schema
p_controlname of a column that the partitioning will be based on. Must be of either a time- or integer-based type
p_max_ageeither a time interval in the PostgreSQL
INTERVALtype format, or an integer value, depending on partitioning type. Regardless of the partitioning type used, this value must be specified as text.
p_typetype of partitioning used by the table. Currently only range and list are supported. See the
p_max_ageparameter for special conditions related to type.
p_epochtells
pg_archivethat the control column is an integer type, but actually represents and epoch time value. Valid values for this option are:'seconds','milliseconds','nanoseconds', and'none'. The default is'none'. In addition to a normal index on the control column, be sure you create a functional, time-based index on the control column(to_timestamp(controlcolumn))as well so this works efficiently.
p_automatic_maintenanceindicates whether maintenance is managed automatically when
pg_archive_run_maintenance_proc()is called without a table parameter or by the background worker process. Default istrue. When set tofalse,pg_archive_run_maintenance_proc()can still be called on an individual partition set by passing it as a parameter to the function.
F.38.5.2. pg_archive_convert_partition() #
CREATE PROCEDURE @extschema@.pg_archive_convert_partition_proc(
p_partition text
, p_access_method text DEFAULT 'columnar'
, p_keep_indexes boolean DEFAULT true
, p_reattach boolean DEFAULT false
)
The function may be useful when an individual partition needs be
archived manually . Note that you need to call
pg_archive_manage() for the corresponding
parent table before its partitioned can be passed to
pg_archive_convert_partition().
p_partitionpartition specification in the
schema.nameformat.
p_access_methodaccess method to use for the archived table. Currently
columnar(the default) is the only supported method.
p_keep_indexesboolean value indicating whether indexes should be preserved in the archived table
default is
true
p_reattachboolean value indicating whether the archived table should be attached back to the parent table
default is
true
F.38.5.3. pg_archive_config #
CREATE TABLE @extschema@.pg_archive_config (
parent_table text NOT NULL
, control text NOT NULL
, partition_type text NOT NULL
, automatic_maintenance boolean NOT NULL DEFAULT true
, max_age text NOT NULL
, new_schema text
, keep_indexes boolean NOT NULL DEFAULT true
, reattach boolean NOT NULL DEFAULT true
, access_method text NOT NULL DEFAULT 'columnar'
, epoch text NOT NULL DEFAULT 'none'
, maintenance_order int
, keep_publication boolean NOT NULL DEFAULT false
, maintenance_last_run timestamptz
, CONSTRAINT pg_archive_config_parent_table_pkey PRIMARY KEY (parent_table)
);
parent_tableparent table of a partition set to manage
controlcolumn used as the control for partition constraints. Must be a time or integer based column.
partition_typetype of partitioning. Must be one of the types mentioned above in the
pg_archive_manage()documentation.
automatic_maintenanceflag indicating whether maintenance is managed automatically when
pg_archive_run_maintenance_proc()is called without a table parameter or by the background worker process.Default is
true.When set to
false,pg_archive_run_maintenance_proc()can still be called on an in individual partition set by passing it as a parameter to the function.
max_agetext type value that determines how old the data in a child partition can be before the partition is archived.
new_schemaschema to move archived partitions to as part of the archiving process.
NULL(the default) means partitions do not change their schema after archiving.
keep_indexesboolean value indicating whether indexes should be dropped for archived tables.
Default is
true. Setting it tofalsemeans pg_archive will not attempt to recreate any indexes on the archived table. Note that if the table is configured to reattach archived partitions (seereattachbelow), PostgreSQL may automatically recreate indexes inherited from the parent table. See PostgreSQL documentation forALTER TABLE ATTACH PARTITIONfor more information.
reattachboolean value indicating whether partitions should stay attached to the parent table after conversion. Setting it to ‘false’ means that partitions will continue to exist as standalone tables after archiving, i.e. will be detached from the parent table.
default is
true
access_methodaccess method to use to convert partitions during archiving
the default and the only currently supported value is
columnar
epochindicates that the table is partitioned by time by an integer epoch value instead of a timestamp. See the description of the corresponding argument in the
pg_archive_manage()function for more info.default is
'none'.
maintenance_orderinteger value that determines the order that maintenance will run the partition sets. Will run sets in increasing numerical order.
default value is
NULL. All partition sets set to NULL will run after partition sets with a value defined.NULLpartition sets run in an indeterminate order.for sub-partitioned sets, the child tables by default inherit the order of their parents. Child parent tables will run in logical order when their parent table’s maintenance is run if left to the default value.
keep_publicationdetermines whether to drop archived partitions from any publications that they may be a member of.
default is
falsemeaning that by archived partitions are removed from their publications.
maintenance_last_runtimestamp of the last successful run of maintenance for this partition set. Can be useful as a monitoring metric to ensure partition maintenance is running properly.
F.38.6. Caveats #
As part of the archiving procedure, pg_archive must copy partition
data into a new table and then exchange the existing partition
with the new archived version (unless reattach
is configured to false for that table). Since
ALTER TABLE ... EXCHANGE PARITION is currently
unsupported by PostgreSQL, it boils down to either
DROP TABLE or
ALTER TABLE DETACH PARTITION followed by
ALTER TABLE ATTACH PARTITION.
Both DROP TABLE for a partition and
DETACH PARTITION take an
ACCESS EXCLUSIVE lock on the parent table which
is problematic for busy tables. Even though PostgreSQL 14
introduced DETACH PARTITION CONCURRENTLY to
solve that particular problem, that functionality is unavailable
to extensions, because
ALTER TABLE DETACH PARTITION CONCURRENTLY must
be a top-level command and cannot be used in stored procedures.
As a result, pg_archive currently uses
DROP TABLE to detach a partition as the only
viable option. We are aware of the issues it may cause for busy,
high-churn tables, and are looking for possible workarounds.