F.46. pg_ivm#
F.46. pg_ivm #
The pg_ivm
module provides Incremental View
Maintenance (IVM) feature for Tantor SE.
F.46.2. Description #
Incremental View Maintenance
(IVM) is a way to make materialized views up-to-date in
which only incremental changes are computed and applied on views
rather than recomputing the contents from scratch as
REFRESH MATERIALIZED VIEW
does. IVM can update
materialized views more efficiently than recomputation when only
small parts of the view are changed.
There are two approaches with regard to timing of view
maintenance: immediate and deferred. In immediate maintenance,
views are updated in the same transaction that its base table is
modified. In deferred maintenance, views are updated after the
transaction is committed, for example, when the view is accessed,
as a response to user command like
REFRESH MATERIALIZED VIEW
, or periodically in
background, and so on. pg_ivm
provides a kind
of immediate maintenance, in which materialized views are updated
immediately in AFTER triggers when a base table is modified.
We call a materialized view supporting IVM an
Incrementally Maintainable Materialized
View (IMMV). To create IMMV, you have to call
create_immv
function with a relation name and a
view definition query. For example:
SELECT pgivm.create_immv('myview', 'SELECT * FROM mytab');
creates an IMMV with name ‘myview’ defined as
SELECT * FROM mytab
. This is corresponding to
the following command to create a normal materialized view;
CREATE MATERIALIZED VIEW myview AS SELECT * FROM mytab;
When an IMMV is created, some triggers are automatically created so that the view’s contents are immediately updated when its base tables are modified.
postgres=# SELECT pgivm.create_immv('m', 'SELECT * FROM t0'); NOTICE: could not create an index on immv "m" automatically DETAIL: This target list does not have all the primary key columns, or this view does not contain DISTINCT clause. HINT: Create an index on the immv for efficient incremental maintenance. create_immv ------------- 3 (1 row) postgres=# SELECT * FROM m; i --- 1 2 3 (3 rows) postgres=# INSERT INTO t0 VALUES (4); INSERT 0 1 postgres=# SELECT * FROM m; -- automatically updated i --- 1 2 3 4 (4 rows)
Note
During automatic maintenance of an IMMV, the search_path
is
temporarily changed to pg_catalog, pg_temp
.
F.46.3. Installation #
Execute CREATE EXTENSION
comand.
CREATE EXTENSION pg_ivm;
F.46.3.1. Configuration #
To ensure pg_ivm
maintains IMMVs correctly,
add it to either shared_preload_libraries or
session_preload_libraries in postgresql.conf:
# Add pg_ivm to preload libraries shared_preload_libraries = 'pg_ivm' # OR session_preload_libraries = 'pg_ivm'
After making this change, restart PostgreSQL for the configuration to take effect.
F.46.4. Objects #
When pg_ivm
is installed, the ‘pgivm’ schema is
created, along with the following objects within this schema.
F.46.4.1. Functions #
F.46.4.1.1. create_immv #
Use create_immv
function to create IMMV.
pgivim.create_immv(immv_name text, view_definition text) RETURNS bigint
create_immv
defines a new IMMV of a query.
A table of the name immv_name
is created
and a query specified by view_definition
is
executed and used to populate the IMMV. The query is stored in
pg_ivm_immv
, so that it can be refreshed
later upon incremental view maintenance.
create_immv
returns the number of rows in
the created IMMV.
When an IMMV is created, some triggers are automatically created so that the view’s contents are immediately updated when its base tables are modified. In addition, a unique index is created on the IMMV automatically if possible. If the view definition query has a GROUP BY clause, a unique index is created on the columns of GROUP BY expressions. Also, if the view has DISTINCT clause, a unique index is created on all columns in the target list. Otherwise, if the IMMV contains all primary key attributes of its base tables in the target list, a unique index is created on these attributes. In other cases, no index is created.
create_immv
acquires an
AccessExclusiveLock
on the view. However,
even if we are able to acquire the lock, a concurrent
transaction may have already incrementally updated and
committed the view before we can acquire it. In the
REPEATABLE READ
or
SERIALIZABLE
isolation levels, this could
lead to an inconsistent state of the view. Unfortunately, this
situation cannot be detected during the creation of the view.
As a result, create_immv
raises a warning
in these isolation levels, suggesting that the command be used
in READ COMMITTED
or that
refresh_immv
be executed afterward to make
the view’s contents remain consistent.
Note
While create_immv
is running, the
search_path
is temporarily changed to
pg_catalog, pg_temp
.
F.46.4.1.2. refresh_immv #
Use refresh_immv
function to refresh IMMV.
pgivm.refresh_immv(immv_name text, with_data bool) RETURNS bigint
refresh_immv
completely replaces the
contents of an IMMV as
REFRESH MATERIALIZED VIEW
command does for
a materialized view. To execute this function you must be the
owner of the IMMV (with PostgreSQL 16 or earlier) or have the
MAINTAIN
privilege on the IMMV (with
PostgreSQL 17 or later). The old contents are discarded.
The with_data flag is corresponding to
WITH [NO] DATA
option of REFRESH
MATERIALIZED VIEW` command. If with_data is true, the backing
query is executed to provide the new data, and if the IMMV is
unpopulated, triggers for maintaining the view are created.
Also, a unique index is created for IMMV if it is possible and
the view doesn’t have that yet. If with_data is false, no new
data is generated and the IMMV become unpopulated, and the
triggers are dropped from the IMMV. Note that unpopulated IMMV
is still scannable although the result is empty. This
behaviour may be changed in future to raise an error when an
unpopulated IMMV is scanned.
refresh_immv
acquires
AccessExclusiveLock
on the view. However,
even if we are able to acquire the lock, a concurrent
transaction may have already incrementally updated and
committed the view before we can acquire the lock. In
REPEATABLE READ
or
SERIALIZABLE
isolation level, this could
lead to an inconsistent state of the view. Therefore, an error
is raised to prevent anomalies when this situation is
detected.
Note
While refresh_immv
is running, the
search_path
is temporarily changed to
pg_catalog, pg_temp
.
F.46.4.1.3. get_immv_def #
get_immv_def
reconstructs the underlying
SELECT command for an IMMV. (This is a decompiled
reconstruction, not the original text of the command.)
pgivm.get_immv_def(immv regclass) RETURNS text
F.46.4.2. IMMV metadata catalog #
The catalog pgivm.pg_ivm_immv
stores IMMV
information.
Name | Type | Description |
---|---|---|
immvrelid | regclass | The OID of the IMMV |
viewdef | text | Query tree (in the form of a nodeToString() representation) for the view definition |
ispopulated | bool | True if IMMV is currently populated |
lastivmupdate | xid8 | The transaction ID of the most recent incremental update to the view |
F.46.5. Example #
F.46.5.1. CREATE MATERIALIZED VIEW
and
REFRESH MATERIALIZED VIEW
#
In general, IMMVs allow faster updates than
REFRESH MATERIALIZED VIEW
at the price of
slower updates to their base tables. Update of base tables is
slower because triggers will be invoked and the IMMV is updated
in triggers per modification statement.
For example, suppose a normal materialized view defined as below:
test=# CREATE MATERIALIZED VIEW mv_normal AS SELECT a.aid, b.bid, a.abalance, b.bbalance FROM pgbench_accounts a JOIN pgbench_branches b USING(bid); SELECT 10000000
Updating a tuple in a base table of this materialized view is
rapid but the REFRESH MATERIALIZED VIEW
command on this view takes a long time:
test=# UPDATE pgbench_accounts SET abalance = 1000 WHERE aid = 1; UPDATE 1 Time: 9.052 ms test=# REFRESH MATERIALIZED VIEW mv_normal ; REFRESH MATERIALIZED VIEW Time: 20575.721 ms (00:20.576)
F.46.5.2. Creating an IMMV #
On the other hand, after creating IMMV with the same view definition as below:
test=# SELECT pgivm.create_immv('immv', 'SELECT a.aid, b.bid, a.abalance, b.bbalance FROM pgbench_accounts a JOIN pgbench_branches b USING(bid)'); NOTICE: created index "immv_index" on immv "immv" create_immv ------------- 10000000 (1 row)
Updating a tuple in a base table takes more than the normal
view, but its content is updated automatically and this is
faster than the REFRESH MATERIALIZED VIEW
command.
test=# UPDATE pgbench_accounts SET abalance = 1234 WHERE aid = 1; UPDATE 1 Time: 15.448 ms test=# SELECT * FROM immv WHERE aid = 1; aid | bid | abalance | bbalance -----+-----+----------+---------- 1 | 1 | 1234 | 0 (1 row)
An appropriate index on IMMV is necessary for efficient IVM because we need to look for tuples to be updated in IMMV. If there are no indexes, it will take a long time.
Therefore, when an IMMV is created by the
create_immv
function, a unique index is
created on it automatically if possible. If the view definition
query has a GROUP BY clause, a unique index is created on the
columns of GROUP BY expressions. Also, if the view has DISTINCT
clause, a unique index is created on all columns in the target
list. Otherwise, if the IMMV contains all primary key attributes
of its base tables in the target list, a unique index is created
on these attributes. In other cases, no index is created.
In the previous example, a unique index “immv_index” is created on aid and bid columns of “immv”, and this enables the rapid update of the view. Dropping this index make updating the view take a loger time.
test=# DROP INDEX immv_index; DROP INDEX test=# UPDATE pgbench_accounts SET abalance = 9876 WHERE aid = 1; UPDATE 1 Time: 3224.741 ms (00:03.225)
F.46.5.3. IMMV with Aggregate Functions #
You can create an IMMV that includes aggregate functions.
test=# SELECT pgivm.create_immv('immv_agg', 'SELECT bid, count(*), sum(abalance), avg(abalance) FROM pgbench_accounts JOIN pgbench_branches USING(bid) GROUP BY bid'); NOTICE: created index "immv_agg_index" on immv "immv_agg" create_immv ------------- 100 (1 row) Time: 5772.625 ms (00:05.773)
Creating this view takes about five seconds, and the normal
refresh operation requires a similar amount of time. The
following example demonstrates refreshing the IMMV using
refresh_immv
. The execution time would be
approximately the same if you used
REFRESH MATERIALIZED VIEW
on a regular
materialized view with the same definition.
test=# SELECT pgivm.refresh_immv('immv_agg',true); refresh_immv -------------- 100 (1 row) Time: 5766.292 ms (00:05.766)
When a base table is updated, the IMMV is also automatically updated incrementally, as expected.
test=# SELECT bid, count, sum, avg FROM immv_agg WHERE bid = 42; bid | count | sum | avg -----+--------+-------+------------------------ 42 | 100000 | 38774 | 0.38774000000000000000 (1 row) Time: 3.123 ms test=# UPDATE pgbench_accounts SET abalance = abalance + 1000 WHERE aid = 4112345 AND bid = 42; UPDATE 1 Time: 16.616 ms test=# SELECT bid, count, sum, avg FROM immv_agg WHERE bid = 42; bid | count | sum | avg -----+--------+-------+------------------------ 42 | 100000 | 39774 | 0.39774000000000000000 (1 row) Time: 1.987 ms
After updating a row in the pgbench_accounts
table, you can see that the aggregated values in
immv_agg
are updated automatically.
F.46.5.4. Listing IMMVs and Viewing Their Definitions #
You can find all IMMVs in the pg_ivm_immv
catalog and view their definition queries by executing the
get_immv_def
function."
test=# SELECT immvrelid AS immv, pgivm.get_immv_def(immvrelid) AS immv_def FROM pgivm.pg_ivm_immv; immv | immv_def ----------+-------------------------------------------- immv_agg | SELECT pgbench_accounts.bid, + | count(*) AS count, + | sum(pgbench_accounts.abalance) AS sum,+ | avg(pgbench_accounts.abalance) AS avg + | FROM (pgbench_accounts + | JOIN pgbench_branches USING (bid)) + | GROUP BY pgbench_accounts.bid immv | SELECT a.aid, + | b.bid, + | a.abalance, + | b.bbalance + | FROM (pgbench_accounts a + | JOIN pgbench_branches b USING (bid)) (2 rows)
F.46.5.5. Dropping an IMMV #
An IMMV can be dropped using the DROP TABLE
command. Once an IMMV is dropped, its entry is automatically
removed from the pg_ivm_immv
catalog.
test=# DROP TABLE immv; DROP TABLE test=# SELECT immvrelid AS immv, pgivm.get_immv_def(immvrelid) AS immv_def FROM pgivm.pg_ivm_immv; immv | immv_def ----------+-------------------------------------------- immv_agg | SELECT pgbench_accounts.bid, + | count(*) AS count, + | sum(pgbench_accounts.abalance) AS sum,+ | avg(pgbench_accounts.abalance) AS avg + | FROM (pgbench_accounts + | JOIN pgbench_branches USING (bid)) + | GROUP BY pgbench_accounts.bid (1 row)
F.46.6. pg_dump
and
pg_upgrade
#
After restoring data from a pg_dump
backup or
upgrading PostgreSQL
using
pg_upgrade
, all IMMVs must be manually dropped
and recreated.
F.46.7. Supported View Definitions and Restriction #
Currently, IMMV’s view definition can contain inner joins,
DISTINCT clause, some built-in aggregate functions, simple
sub-queries in FROM
clause, EXISTS sub-queries,
and simple CTE (WITH
query). Inner joins
including self-join are supported, but outer joins are not
supported. Supported aggregate functions are count, sum, avg, min
and max. Other aggregates, sub-queries which contain an aggregate
or DISTINCT
clause, sub-queries in other than
FROM
clause, window functions,
HAVING
, ORDER BY
,
LIMIT
/OFFSET
,
UNION
/INTERSECT
/EXCEPT
,
DISTINCT ON
, TABLESAMPLE
,
VALUES
, and
FOR UPDATE
/SHARE
can not be
used in view definition.
The base tables must be simple tables. Views, materialized views, inheritance parent tables, partitioned tables, partitions, and foreign tables can not be used.
Any system column cannot be included in the view definition query.
The target list cannot columns whose name starts with
__ivm_
.
Data type used in the target list in the view must have default
operator class for access method btree. For example,
json
, xml
, or
point
type cannot be in the target list.
Logical replication is not supported, that is, even when a base table at a publisher node is modified, IMMVs at subscriber nodes defined on these base tables are not updated.
F.46.8. Notes #
F.46.8.1. Aggregates #
Supported aggregate functions are count
,
sum
, avg
,
min
, and max
. Currently,
only built-in aggregate functions are supported and user defined
aggregates cannot be used.
When an IMMV including aggregate is created, some extra columns
whose name start with __ivm
are automatically
added to the target list. __ivm_count__
contains the number of tuples aggregated in each group. In
addition, more than one extra columns for each column of
aggregated value are added in order to maintain the value. For
example, columns named like __ivm_count_avg__
and __ivm_sum_avg__
are added for maintaining
an average value. When a base table is modified, the new
aggregated values are incrementally calculated using the old
aggregated values and values of related extra columns stored in
the IMMV.
Note that for min
or max
,
the new values could be re-calculated from base tables with
regard to the affected groups when a tuple containing the
current minimal or maximal values are deleted from a base table.
Therefore, it can takes a long time to update an IMMV containing
these functions.
Also, note that using sum
or
avg
on real
(float4
) type or
double precision
(float8
)
type in IMMV is unsafe, because aggregated values in IMMV can
become different from results calculated from base tables due to
the limited precision of these types. To avoid this problem, use
the numeric
type instead.
F.46.8.1.1. Restrictions on Aggregate #
If we have a GROUP BY
clause, expressions
specified in GROUP BY
must appear in the
target list. This is how tuples to be updated in the IMMV are
identified. These attributes are used as scan keys for
searching tuples in the IMMV, so indexes on them are required
for efficient IVM.
Targetlist cannot contain expressions which contain an aggregate in it.
F.46.8.2. Subqueries #
Simple subqueries in FROM
clause and EXISTS
subqueries in ‘WHERE’ clause are supported.
F.46.8.2.1. Restrictions on Subqueries #
Subqueries using EXISTS and simple subqueries in FROM clause are supported. EXISTS subqueries with condition other than ‘AND’ and Subqueries in targetlist are not supported. EXISTS subquery is supported only in WHERE but not in the targetlist.
If EXISTS contains columns that refer to columns in tables in
the outer query, such columns must be included in the
targetlist. Subqueries containing an aggregate function or
DISTINCT
are not supported.
F.46.8.3. CTE #
Simple CTEs (WITH
queries) are supported.
F.46.8.3.1. Restrictions on CTEs #
WITH
queries containing an aggregate
function or DISTINCT
are not supported.
Recursive queries (WITH RECURSIVE
) are not
allowed. Unreferenced CTEs are not allowed either, that is, a
CTE must be referenced at least once in the view definition
query.
F.46.8.4. DISTINCT #
DISTINCT
is allowed in IMMV’s definition
queries. Suppose an IMMV defined with DISTINCT on a base table
containing duplicate tuples. When tuples are deleted from the
base table, a tuple in the view is deleted if and only if the
multiplicity of the tuple becomes zero. Moreover, when tuples
are inserted into the base table, a tuple is inserted into the
view only if the same tuple doesn’t already exist in it.
Physically, an IMMV defined with DISTINCT
contains tuples after eliminating duplicates, and the
multiplicity of each tuple is stored in an extra column named
__ivm_count__
that is added when such IMMV is
created.
F.46.8.5. TRUNCATE #
When a base table is truncated, the IMMV is also truncated and
the contents become empty if the view definition query does not
contain an aggregate without a GROUP BY
clause. Aggregate views without a GROUP BY
clause always have one row. Therefore, in such cases, if a base
table is truncated, the IMMV is simply refreshed instead of
being truncated.
F.46.8.6. Concurrent Transactions #
Incremental updates of a view are basically performed in sequentially even with concurrent transactions running.
Suppose an IMMV is defined on two base tables and each table is modified in different concurrent transactions simultaneously. In the transaction which was committed first, the IMMV can be updated considering only the change made in that transaction. However, to update the IMMV correctly in the transaction that commits later, we need to account for the changes made in both transactions.
For this reason, ExclusiveLock
is held on an
IMMV immediately after a base table is modified in
READ COMMITTED
isolation level. This ensures
that the IMMV is updated in the latter transaction only after
the former transaction has committed. In
REPEATABLE READ
or
SERIALIZABLE
isolation level, an error is
raised immediately if lock acquisition fails, as changes made by
other transactions are not visible in these levels, and the IMMV
cannot be updated correctly in such situations.
However, as an exception if the IMMV has only one base table,
does not use DISTINCT or GROUP BY, and is modified by
INSERT
, then the lock held on the IMMV is
RowExclusiveLock
.
Even if we are able to acquire a lock, a concurrent transaction
may have already incrementally updated and committed the view
before we can acquire the lock. In
REPEATABLE READ
or
SERIALIZABLE
isolation level, this could lead
to an inconsistent state of the view. Therefore, an error is
raised to prevent anomalies when this situation is detected.
F.46.8.7. Row Level Security #
If some base tables have row level security policy, rows that are not visible to the materialized view’s owner are excluded from the result. In addition, such rows are excluded as well when views are incrementally maintained. However, if a new policy is defined or policies are changed after the materialized view was created, the new policy will not be applied to the view contents. To apply the new policy, you need to recreate IMMV.
F.46.8.8. How to Disable or Enable Immediate Maintenance #
IVM is effective when we want to keep an IMMV up-to-date and small fraction of a base table is modified infrequently. Due to the overhead of immediate maintenance, IVM is not effective when a base table is modified frequently. Also, when a large part of a base table is modified or large data is inserted into a base table, IVM is not effective and the cost of maintenance can be larger than refresh from scratch.
In such situation, we can use refresh_immv
function with with_data = false
to disable
immediate maintenance before modifying a base table. After a
base table modification, call
refresh_immv
with
with_data = true
to refresh the view data and
enable immediate maintenance.
F.46.9. Developer #
IVM Development Group
Original Authors
https://github.com/yugo-n
https://github.com/thoshiai
F.46.10. License #
F.46.11. Copyright #
Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
Portions Copyright (c) 2022, IVM Development Group