F.33. pg_archive#
F.33. pg_archive #
pg_archive is an extension to automatically archive historical data from partitioned tables by converting partitions to storage methods that are more suitable for massive amounts of cold, read-only or read-mostly data compared to regular PostgreSQL tables. Currently, the only supported archival storage is the columnar access method maintained by the Hydra project, but other methods may be added in the future.
The code is loosely based on the excellent pg_partman project.
Automatic maintenance to archive partitions based on user-provided
policies is implemented with a background worker that periodically
executes the pg_archive_run_maintenance_proc()
stored procedure. The procedure examines partitioned tables managed
by pg_archive and converts partitions containing data older than a
time interval specified with per-table policies.
pg_archive_run_maintenance_proc()
can also be
invoked manually by user when an unscheduled maintenance is
required.
F.33.1. Installation #
Requirements:
F.33.1.2. Setup #
The background worker must be loaded on database start by adding
the library to shared_preload_libraries
in
postgresql.conf
shared_preload_libraries = 'pg_archive_bgw' # (change requires a restart)
Other GUCs that must be configured before using the extension
are interval
which is a time interval in
seconds between maintenance runs by the background worker,
dbname
which a CSV lines of database(s) for
maintenance and role
which is a role to be
used by the worker: Here’s an example:
pg_archive_bgw.interval = 3600 pg_archive_bgw.role = 'alex' pg_archive_bgw.dbname = 'mydb'
The role must have execute permissions on
pg_archive_run_maintenance_proc()
.
Installing the extension after starting the server with the necessary GUCs configured:
CREATE SCHEMA archive; CREATE EXTENSION pg_archive SCHEMA archive;
F.33.2. Upgrade #
ALTER EXTENSION pg_archive UPDATE TO '<latest version>';
F.33.3. Usage #
Partitioned tables can be configured to be managed by pg_archive
with the pg_archive_manage()
function.
This and the following examples in this section assume pg_archive
is installed in the archive
schema and the
tables are in the monitoring
schema:
SELECT archive.pg_archive_manage('monitoring.metrics', 'metric_timestamp', '1 year'); pg_archive_manage ------------------- t (1 row)
The above call adds a record into the
archive.pg_archive_config
table that will later
be used by pg_archive_run_maintenance_proc()
to
archive and optionally detach partitions with data older than 1
year in the metrics
table in the
monitoring
schema based on the
metrics_timestamp
value.
When a table becomes managed by pg_archive, its partitions will be
examined and possibly archived on the next scheduled call to
pg_archive_run_maintenance_proc()
by the
background worker, which is configured to run ever 3600 seconds by
default. Alternatively, you may call
pg_archive_run_maintenance_proc()
manually:
CALL archive.pg_archive_run_maintenance_proc();
F.33.3.1. Examples #
First, create extension. It requires
pg_columnar
installed too, so use
CASCADE
.
CREATE EXTENSION pg_archive CASCADE;
After, you may want to enable background worker to archive tables in background. This requires configuration file changes.
shared_preload_libraries = 'pg_archive_bgw' pg_archive_bgw.dbname = 'postgres' pg_archive_bgw.role = 'postgres' pg_archive_bgw.interval = 3600
Changes:
shared_preload_libraries
- background worker must be loaded at server startup to be registeredpg_archive_bgw.dbname
- set topostgres
as default databaserole
andinterval
- these are default values, but you may want to change themrun as
postgres
user (in production create separate role for background worker, not superuser)run every hour (3600 seconds)
Further examples follow. All operations done in
postgres
database and
public
schema. After each example, given
table must be registered by pg_archive for further archiving.
But if you want maintenance to run immediate, run procedure:
CALL pg_archive_run_maintenance_proc();
Or, you may want to force convert partition. This can be done with another procedure (note: it accepts partition table name - not parent):
CALL pg_archive_convert_partition_proc('public.partition_tbl_name');
To check, that tables archived you may run this query:
SELECT c.relname, am.amname FROM pg_class c JOIN pg_am am ON c.relam = am.oid WHERE c.relname LIKE 'tbl_name%';
'tbl_name%'
in query must be substituted with real table name. Suppose, that partitioned tables has name as parent table plus suffix at the end.
Feel free to change partition bounds or max age (parameter to
pg_archive_manage
) to check that tables are
converting.
Table for temperature measurements partitioned by timestamp of measurement. It is used primarily for analytics, so archive it often - set max age to 1 day.
CREATE TABLE measurements( temp numeric NOT NULL, ts timestamp NOT NULL ) PARTITION BY RANGE(ts); CREATE TABLE measurements_20240901 PARTITION OF measurements FOR VALUES FROM ('2024-09-01') TO ('2024-09-02'); CREATE TABLE measurements_20240902 PARTITION OF measurements FOR VALUES FROM ('2024-09-02') TO ('2024-09-03'); CREATE TABLE measurements_20240903 PARTITION OF measurements FOR VALUES FROM ('2024-09-03') TO ('2024-09-04'); SELECT pg_archive_manage('public.measurements', 'ts', '1 day');
Store customers’ bookings in partitioned table. Bookings can be altered any time, so partitions split by months.
CREATE TABLE bookings( customer_id int NOT NULL, book_date date NOT NULL ) PARTITION BY RANGE(book_date); CREATE TABLE bookings_20240901 PARTITION OF bookings FOR VALUES FROM ('2024-09-01') TO ('2024-09-02'); CREATE TABLE bookings_20240902 PARTITION OF bookings FOR VALUES FROM ('2024-09-02') TO ('2024-09-03'); CREATE TABLE bookings_20240903 PARTITION OF bookings FOR VALUES FROM ('2024-09-03') TO ('2024-09-04'); SELECT pg_archive_manage('public.bookings', 'book_date', '1 year');
Use partitioned table to store logs of microservices. Timestamps of logs provided as UNIX timestamp (in seconds).
CREATE TABLE logs( message text NOT NULL, service int NOT NULL, ts numeric NOT NULL ) PARTITION BY RANGE(ts); CREATE TABLE logs_202409 PARTITION OF logs FOR VALUES FROM (1725148800) TO (1727740800); CREATE TABLE logs_202410 PARTITION OF logs FOR VALUES FROM (1727740800) TO (1730419200); CREATE TABLE logs_202411 PARTITION OF logs FOR VALUES FROM (1730419200) TO (1733011200); SELECT pg_archive_manage('public.logs', 'ts', '1 month', p_epoch := 'seconds');
F.33.4. Functions, Procedures And Tables #
Below is a description of the available functions, their arguments
and the columns in the pg_archive_config
table:
F.33.4.1. pg_archive_manage()
#
CREATE FUNCTION @extschema@.pg_archive_manage( p_parent_table text , p_control text , p_max_age text , p_type text DEFAULT 'range' , p_epoch text DEFAULT 'none' , p_automatic_maintenance boolean DEFAULT true )
The function registers a partition table for management by pg_archive.
p_parent_table
an existing partitioned table. MUST be schema-qualified, even if in the public schema
p_control
name of a column that the partitioning will be based on. Must be of either a time- or integer-based type
p_max_age
either a time interval in the PostgreSQL
INTERVAL
type format, or an integer value, depending on partitioning type. Regardless of the partitioning type used, this value must be specified as text.
p_type
type of partitioning used by the table. Currently only range and list are supported. See the
p_max_age
parameter for special conditions related to type.
p_epoch
tells
pg_archive
that the control column is an integer type, but actually represents and epoch time value. Valid values for this option are:'seconds'
,'milliseconds'
,'nanoseconds'
, and'none'
. The default is'none'
. In addition to a normal index on the control column, be sure you create a functional, time-based index on the control column(to_timestamp(controlcolumn))
as well so this works efficiently.
p_automatic_maintenance
indicates whether maintenance is managed automatically when
pg_archive_run_maintenance_proc()
is called without a table parameter or by the background worker process. Default istrue
. When set tofalse
,pg_archive_run_maintenance_proc()
can still be called on an individual partition set by passing it as a parameter to the function.
F.33.4.2. pg_archive_convert_partition()
#
CREATE PROCEDURE @extschema@.pg_archive_convert_partition_proc( p_partition text , p_access_method text DEFAULT 'columnar' , p_keep_indexes boolean DEFAULT true , p_reattach boolean DEFAULT false )
The function may be useful when an individual partition needs be
archived manually . Note that you need to call
pg_archive_manage()
for the corresponding
parent table before its partitioned can be passed to
pg_archive_convert_partition()
.
p_partition
partition specification in the
schema.name
format.
p_access_method
access method to use for the archived table. Currently
columnar
(the default) is the only supported method.
p_keep_indexes
boolean value indicating whether indexes should be preserved in the archived table
default is
true
p_reattach
boolean value indicating whether the archived table should be attached back to the parent table
default is
true
F.33.4.3. pg_archive_config
#
CREATE TABLE @extschema@.pg_archive_config ( parent_table text NOT NULL , control text NOT NULL , partition_type text NOT NULL , automatic_maintenance boolean NOT NULL DEFAULT true , max_age text NOT NULL , new_schema text , keep_indexes boolean NOT NULL DEFAULT true , reattach boolean NOT NULL DEFAULT true , access_method text NOT NULL DEFAULT 'columnar' , epoch text NOT NULL DEFAULT 'none' , maintenance_order int , keep_publication boolean NOT NULL DEFAULT false , maintenance_last_run timestamptz , CONSTRAINT pg_archive_config_parent_table_pkey PRIMARY KEY (parent_table) );
parent_table
parent table of a partition set to manage
control
column used as the control for partition constraints. Must be a time or integer based column.
partition_type
type of partitioning. Must be one of the types mentioned above in the
pg_archive_manage()
documentation.
automatic_maintenance
flag indicating whether maintenance is managed automatically when
pg_archive_run_maintenance_proc()
is called without a table parameter or by the background worker process.Default is
true
.When set to
false
,pg_archive_run_maintenance_proc()
can still be called on an in individual partition set by passing it as a parameter to the function.
max_age
text type value that determines how old the data in a child partition can be before the partition is archived.
new_schema
schema to move archived partitions to as part of the archiving process.
NULL
(the default) means partitions do not change their schema after archiving.
keep_indexes
boolean value indicating whether indexes should be dropped for archived tables.
Default is
true
. Setting it tofalse
means pg_archive will not attempt to recreate any indexes on the archived table. Note that if the table is configured to reattach archived partitions (seereattach
below), PostgreSQL may automatically recreate indexes inherited from the parent table. See PostgreSQL documentation forALTER TABLE ATTACH PARTITION
for more information.
reattach
boolean value indicating whether partitions should stay attached to the parent table after conversion. Setting it to ‘false’ means that partitions will continue to exist as standalone tables after archiving, i.e. will be detached from the parent table.
default is
true
access_method
access method to use to convert partitions during archiving
the default and the only currently supported value is
columnar
epoch
indicates that the table is partitioned by time by an integer epoch value instead of a timestamp. See the description of the corresponding argument in the
pg_archive_manage()
function for more info.default is
'none'
.
maintenance_order
integer value that determines the order that maintenance will run the partition sets. Will run sets in increasing numerical order.
default value is
NULL
. All partition sets set to NULL will run after partition sets with a value defined.NULL
partition sets run in an indeterminate order.for sub-partitioned sets, the child tables by default inherit the order of their parents. Child parent tables will run in logical order when their parent table’s maintenance is run if left to the default value.
keep_publication
determines whether to drop archived partitions from any publications that they may be a member of.
default is
false
meaning that by archived partitions are removed from their publications.
maintenance_last_run
timestamp of the last successful run of maintenance for this partition set. Can be useful as a monitoring metric to ensure partition maintenance is running properly.
F.33.5. Caveats #
As part of the archiving procedure, pg_archive must copy partition
data into a new table and then exchange the existing partition
with the new archived version (unless reattach
is configured to false
for that table). Since
ALTER TABLE ... EXCHANGE PARITION
is currently
unsupported by PostgreSQL, it boils down to either
DROP TABLE
or
ALTER TABLE DETACH PARTITION
followed by
ALTER TABLE ATTACH PARTITION
.
Both DROP TABLE
for a partition and
DETACH PARTITION
take an
ACCESS EXCLUSIVE
lock on the parent table which
is problematic for busy tables. Even though PostgreSQL 14
introduced DETACH PARTITION CONCURRENTLY
to
solve that particular problem, that functionality is unavailable
to extensions, because
ALTER TABLE DETACH PARTITION CONCURRENTLY
must
be a top-level command and cannot be used in stored procedures.
See
this
pgsql-general discussion for more information.
As a result, pg_archive currently uses
DROP TABLE
to detach a partition as the only
viable option. We are aware of the issues it may cause for busy,
high-churn tables, and are looking for possible workarounds.