pgcopydb

pgcopydb — a tool that automates running pg_dump | pg_restore between two running Postgres servers.

The pgcopydb project is an Open Source Software project. The development happens at https://github.com/dimitri/pgcopydb and is public: everyone is welcome to participate by opening issues, pull requests, giving feedback, etc.

Remember that the first steps are to actually play with the pgcopydb command, then read the entire available documentation (after all, I took the time to write it), and then to address the community in a kind and polite way — the same way you would expect to be addressed yourself.

How to copy a Postgres database

pgcopydb is a tool that automates copying a Tantor SE-1C database to another server. The main use case for pgcopydb is migration to a new Postgres system, whether for new hardware, a new architecture, or a new Postgres major version.

The idea would be to run pg_dump -jN | pg_restore -jN between two running Postgres servers. To make a copy of a database to another server as quickly as possible, one would like to use the parallel options of pg_dump and still be able to stream the data to as many concurrent pg_restore jobs as possible. Unfortunately, this approach cannot be implemented by using pg_dump and pg_restore directly; see Bypass intermediate files for the TABLE DATA.

When using pgcopydb it is possible to achieve both concurrency and streaming with this simple command line:

$ export PGCOPYDB_SOURCE_PGURI="postgres://[email protected]/dbname"
$ export PGCOPYDB_TARGET_PGURI="postgres://[email protected]/dbname"

$ pgcopydb clone --table-jobs 4 --index-jobs 4

See the manual page for pgcopydb clone for detailed information about how the command is implemented along with many other supported options.

Main pgcopydb features

Bypass intermediate files

When using pg_dump and pg_restore with the -jobs option, the table data is first copied to files on-disk before being read again and sent to the target server. pgcopydb avoids those steps and instead streams the COPY buffers from the source to the target with zero processing.
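For comparison, here is a sketch of the classic approach that pgcopydb avoids, where the table data transits through files in a dump directory (the directory path is only an example):

$ pg_dump --format=directory --jobs=4 --file=/path/to/dumpdir "$PGCOPYDB_SOURCE_PGURI"
$ pg_restore --jobs=4 -d "$PGCOPYDB_TARGET_PGURI" /path/to/dumpdir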

Use COPY FREEZE

Postgres has an optimization which reduces post-migration vacuum work by marking the imported rows as frozen already during the import: that's the FREEZE option to the COPY command. pgcopydb uses that option, except when using same-table concurrency.

Create Index Concurrency

When creating an index on a table, Postgres has to run a full sequential scan to read all the rows. Postgres 8.3 introduced the synchronize_seqscans optimization, where a single such on-disk read is able to feed several SQL commands running concurrently in different client sessions.

pgcopydb takes advantage of this feature by running many CREATE INDEX commands on the same table at the same time. The number of concurrent commands is capped by the --index-jobs option.

Same Table Concurrency

When migrating a very large table, it might be beneficial to partition the table and run several COPY commands, distributing the source data using a non-overlapping WHERE clause. pgcopydb implements that approach with the --split-tables-larger-than option.

Change Data Capture

The simplest and safest way to migrate a database to a new Postgres server requires a maintenance window duration that's dependent on the size of the data to migrate.

Sometimes the migration context needs to reduce that downtime window. For these advanced and complex cases, pgcopydb embeds a full replication solution using the Postgres Logical Decoding low-level APIs, available since Postgres 9.4.

See the reference manual for the pgcopydb fork --follow command.

Introduction to pgcopydb

pgcopydb is a tool that automates copying a Tantor SE-1C database to another server. The main use case for pgcopydb is migration to a new Postgres system, whether for new hardware, a new architecture, or a new Postgres major version.

The idea would be to run pg_dump -jN | pg_restore -jN between two running Postgres servers. To make a copy of a database to another server as quickly as possible, one would like to use the parallel options of pg_dump and still be able to stream the data to as many concurrent pg_restore jobs as possible. Unfortunately, this approach cannot be implemented by using pg_dump and pg_restore directly; see Bypass intermediate files for the TABLE DATA.

When using pgcopydb it is possible to achieve both concurrency and streaming with this simple command line:

$ export PGCOPYDB_SOURCE_PGURI="postgres://[email protected]/dbname"
$ export PGCOPYDB_TARGET_PGURI="postgres://[email protected]/dbname"

$ pgcopydb clone --table-jobs 4 --index-jobs 4

See the manual page for pgcopydb clone for detailed information about how the command is implemented along with many other supported options.

Feature Matrix

Here is a comparison of the features available when using pg_dump and pg_restore directly versus when using pgcopydb to handle the database copying:

Feature                          pgcopydb   pg_dump ; pg_restore
Single-command operation         ✓          ✗
Snapshot consistency             ✓          ✗
Ability to resume partial run    ✓          ✗
Advanced filtering               ✓          ✗
Tables concurrency               ✓          ✓
Same-table concurrency           ✓          ✗
Index concurrency                ✓          ✓
Constraint index concurrency     ✓          ✗
Schema                           ✓          ✓
Large Objects                    ✓          ✓
Vacuum Analyze                   ✓          ✗
Copy Freeze                      ✓          ✗
Roles                            ✓          ✗ (needs pg_dumpall)
Tablespaces                      ✗          ✗ (needs pg_dumpall)
Follow changes                   ✓          ✗

Refer to the documentation about pgcopydb configuration for its Advanced filtering capabilities.
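As a hedged sketch of what such a filtering setup can look like, the following uses an INI-style filters file; the section names ([include-only-table], [exclude-index]) follow the pgcopydb filtering documentation, and the table and index names are made-up examples. Check the configuration reference for the complete list of supported sections:

$ cat > filters.ini <<'EOF'
[include-only-table]
public.rental
public.payment

[exclude-index]
public.idx_unused
EOF

$ pgcopydb clone --filters ./filters.ini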

pgcopydb uses pg_dump and pg_restore

The implementation of pgcopydb actually calls into the pg_dump and pg_restore binaries to handle a large part of the work, such as the pre-data and post-data sections. Refer to pg_dump docs for more information about the three sections supported.

After using pg_dump to obtain the pre-data and post-data sections, pgcopydb restores the pre-data section to the target Postgres instance using pg_restore.

pgcopydb then uses SQL commands and the COPY streaming protocol to migrate the table contents, the large objects data, and to VACUUM ANALYZE tables as soon as the data becomes available on the target instance.

Then pgcopydb uses SQL commands to build the indexes on the target Postgres instance, as detailed in the design doc For each table, build all indexes concurrently. This makes it possible to include constraint indexes such as Primary Keys in the list of indexes built at the same time.
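As a rough, simplified sketch of the moving parts involved (pgcopydb drives all of this itself, and the file names below are placeholders; in particular, the post-data restore list is filtered so that indexes and constraints can be built by pgcopydb with its own SQL):

$ pg_dump --section=pre-data  --format=custom --file=/tmp/pre.dump  "$PGCOPYDB_SOURCE_PGURI"
$ pg_dump --section=post-data --format=custom --file=/tmp/post.dump "$PGCOPYDB_SOURCE_PGURI"

$ pg_restore -d "$PGCOPYDB_TARGET_PGURI" /tmp/pre.dump
# ... table data, large objects, indexes, and constraints are handled by pgcopydb ...
$ pg_restore -d "$PGCOPYDB_TARGET_PGURI" --use-list=/tmp/filtered.list /tmp/post.dump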

Change Data Capture, or fork and follow

It is also possible with pgcopydb to implement Change Data Capture and replay data modifications happening on the source database to the target database. See the pgcopydb follow command and the pgcopydb clone --follow command line option at pgcopydb clone in the manual.

The simplest possible implementation of online migration with pgcopydb, where changes being made to the source Postgres instance database are replayed on the target system, looks like the following:

$ pgcopydb clone --follow &

# later when the application is ready to make the switch
$ pgcopydb stream sentinel set endpos --current

# later when the migration is finished, clean-up both source and target
$ pgcopydb stream cleanup

Tutorial

This documentation section for pgcopydb contains a list of classic pgcopydb use-cases. For details about the commands and their options see the manual page for each command at pgcopydb.

Copy Postgres Database to a new server

The simplest way to use pgcopydb is to just use the pgcopydb clone command as in the following example.

$ export PGCOPYDB_SOURCE_PGURI="dbname=pagila"
$ export PGCOPYDB_TARGET_PGURI="postgres://user@target:5432/pagila"

$ pgcopydb clone

Note that the options --source and --target can also be used to set the Postgres connection strings to the databases; however, using environment variables is particularly useful when using Docker containers.

You might also notice here that both the source and target Postgres databases must already exist for pgcopydb to operate.
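For instance, if the target database does not exist yet, create it first with your usual tooling; the connection string below re-uses the tutorial's example host and user:

$ psql -d "postgres://user@target:5432/postgres" -c 'CREATE DATABASE pagila'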

Copy Postgres users and extensions

To copy Postgres users, a privileged connection to the target database must be set up, and to include passwords, a privileged connection to the source database must be set up as well. If it is required to limit these privileged connections to a minimum, then the following approach may be used:

$ coproc ( pgcopydb snapshot --source ... )

# first two commands would use a superuser role
$ pgcopydb copy roles --source ... --target ...
$ pgcopydb copy extensions --source ... --target ...

# now it's possible to use a non-superuser role
$ pgcopydb clone --skip-extensions --source ... --target ...

$ kill -TERM ${COPROC_PID}
$ wait ${COPROC_PID}

How to edit the schema when copying a database?

It is possible to split pgcopydb operations and to run them one at a time.

However, please note that in these cases, concurrency and the performance characteristics that depend on it are going to be quite limited compared to the main pgcopydb clone command, where the different sections run concurrently with one another.

Still, in some cases running the operations with more control over the different steps can be necessary. One interesting such use-case consists of injecting schema changes before copying the data over:

#
# pgcopydb uses the environment variables
#
$ export PGCOPYDB_SOURCE_PGURI=...
$ export PGCOPYDB_TARGET_PGURI=...

#
# we need to export a snapshot, and keep it while the individual steps are
# running, one at a time
#
$ coproc ( pgcopydb snapshot )

$ pgcopydb dump schema --resume
$ pgcopydb restore pre-data --resume

#
# Here you can implement your own SQL commands on the target database.
#
$ psql -d ${PGCOPYDB_TARGET_PGURI} -f schema-changes.sql

# Now get back to copying the table-data, indexes, constraints, sequences
$ pgcopydb copy data --resume   
$ pgcopydb restore post-data --resume

$ kill -TERM ${COPROC_PID}
$ wait ${COPROC_PID}

$ pgcopydb list progress --summary

Note that to ensure consistency of operations, the pgcopydb snapshot command has been used. See Resuming Operations (snapshots) for details.

Follow mode, or Change Data Capture

When implementing Change Data Capture, more sync points are needed between pgcopydb and the application in order to implement a clean cutover.

Start with the initial copy and the replication setup:

$ export PGCOPYDB_SOURCE_PGURI="dbname=pagila"
$ export PGCOPYDB_TARGET_PGURI="postgres://user@target:5432/pagila"

$ pgcopydb clone --follow

While the command is running, check the replication progress made by pgcopydb with the Postgres pg_stat_replication view.
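For instance, a query such as the following, run on the source Postgres instance, shows the state and lag of the pgcopydb replication client:

$ psql -d "$PGCOPYDB_SOURCE_PGURI" \
       -c 'select application_name, state, sent_lsn, replay_lsn,
                  pg_wal_lsn_diff(sent_lsn, replay_lsn) as lag_bytes
             from pg_stat_replication'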

When the lag is close enough for your maintenance window specifications, then it's time to disconnect applications from the source database, finish the migration off, and re-connect your applications to the target database:

$ pgcopydb stream sentinel set endpos --current

This command must be run within the same --dir as the main pgcopydb clone --follow command, in order to share the same internal catalogs with the running processes.
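A minimal sketch, assuming a hypothetical /var/lib/pgcopydb work directory shared by both commands:

$ pgcopydb clone --follow --dir /var/lib/pgcopydb &

# later, from another terminal, using the very same --dir
$ pgcopydb stream sentinel set endpos --current --dir /var/lib/pgcopydb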

When the migration is completed, cleanup the resources created for the Change Data Capture with the following command:

$ pgcopydb stream cleanup

See also Change Data Capture using Postgres Logical Decoding for more details and other modes of operations.

How to validate schema and data migration?

The command pgcopydb compare schema is currently limited to comparing the metadata that pgcopydb grabs about the Postgres schema. This applies to comparing the list of tables, their attributes, their indexes and constraints, and the sequence values.

The command pgcopydb compare data runs an SQL query that computes a checksum of the data on each Postgres instance (i.e. source and target) for each table, and then only compares the checksums. This is not a full comparison of the data set, and it can report a false match in the (unlikely) case where the checksums are equal but the data is different.

$ pgcopydb compare schema
$ pgcopydb compare data

Installing pgcopydb

Several distributions are available for pgcopydb.

debian packages

Binary packages for debian and derivatives (ubuntu) are available from the apt.postgresql.org repository; install by following the linked documentation and then:

$ sudo apt-get install pgcopydb

RPM packages

The Postgres community repository for RPM packages is yum.postgresql.org and includes binary packages for pgcopydb. The way packages are built for RPM based systems means that the user needs to choose which version of Postgres pgcopydb was built with. In practice, this doesn't have much importance, because libpq is meant to be compatible with many different Postgres server versions.

After following the instructions for installing the repository, in this example in a Docker image for Rocky Linux (docker run --rm -it rockylinux:9), then we get the following:

# dnf search --all --quiet pgcopydb
======================== Name & Description & URL Matched: pgcopydb ========================
pgcopydb.x86_64 : Automate pg_dump | pg_restore between two running Postgres servers
pgcopydb_11.x86_64 : Automate pg_dump | pg_restore between two running Postgres servers
pgcopydb_12.x86_64 : Automate pg_dump | pg_restore between two running Postgres servers
pgcopydb_13.x86_64 : Automate pg_dump | pg_restore between two running Postgres servers
pgcopydb_14.x86_64 : Automate pg_dump | pg_restore between two running Postgres servers
pgcopydb_15.x86_64 : Automate pg_dump | pg_restore between two running Postgres servers
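Then pick the package built against the Postgres major version of your choice; for instance:

# dnf install -y pgcopydb_15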

Docker Images

Docker images are maintained for each tagged release at dockerhub, and also built from the CI/CD integration on GitHub at each commit to the main branch.

The DockerHub dimitri/pgcopydb repository is where the tagged releases are made available. The image uses the Postgres version currently in debian stable.

To use this docker image:

$ docker run --rm -it dimitri/pgcopydb:v0.17 pgcopydb --version

Or you can use the CI/CD integration that publishes packages from the main branch to the GitHub docker repository:

$ docker pull ghcr.io/dimitri/pgcopydb:latest
$ docker run --rm -it ghcr.io/dimitri/pgcopydb:latest pgcopydb --version
$ docker run --rm -it ghcr.io/dimitri/pgcopydb:latest pgcopydb --help

Build from sources

Building from source requires a list of build-dependencies that's comparable to that of Postgres itself. The pgcopydb source code is written in C and the build process uses a GNU Makefile.

See our main Dockerfile for a complete recipe to build pgcopydb as a debian package when using a debian environment.

In particular, the following build dependencies are required to build pgcopydb. The list is long, because pgcopydb requires a lot of the same packages as Postgres itself:

$ apt-get install -y --no-install-recommends \
    build-essential \
    autotools-dev \
    libedit-dev \
    libgc-dev \
    libpam0g-dev \
    libreadline-dev \
    libselinux1-dev \
    libxslt1-dev \
    libssl-dev \
    libkrb5-dev \
    zlib1g-dev \
    liblz4-dev \
    libpq5 \
    libpq-dev \
    libzstd-dev \
    postgresql-server-dev-all \
    postgresql-common \
    postgresql \
    python3-sphinx

Then the build process is pretty simple: in its simplest form you can just use make clean install. If you want to be fancier, you can also consider:

$ make -s clean
$ make -s -j12 install

Once you have made it this far, it is a good idea to check our Contribution Guide.

Features Highlights

The pgcopydb project was started to allow certain improvements and considerations that were otherwise not possible to achieve directly with the pg_dump and pg_restore commands. Below are the details of what pgcopydb can achieve.

Bypass intermediate files for the TABLE DATA

The first aspect is that for pg_dump and pg_restore to implement concurrency, they need to write to intermediate files first.

The docs for pg_dump say the following about the --jobs parameter:

From the Tantor SE-1C documentation

You can only use this option with the directory output format because this is the only output format where multiple processes can write their data at the same time.

The docs for pg_restore say the following about the --jobs parameter:

From the Tantor SE-1C documentation

Only the custom and directory archive formats are supported with this option. The input must be a regular file or directory (not, for example, a pipe or standard input).

So the first idea with pgcopydb is to provide the --jobs concurrency and bypass intermediate files (and directories) altogether, at least as far as the actual TABLE DATA set is concerned.

The trick to achieve that is that pgcopydb must be able to connect to the source database during the whole operation, whereas pg_restore may be used from an export on-disk, without having to still be able to connect to the source database. In the context of pgcopydb, requiring access to the source database is fine. In the context of pg_restore, it would not be acceptable.

Large-Objects Support

The Postgres Large-Objects API is nobody's favorite, though the trade-offs implemented in that API are found to be very useful by many application developers. In the context of dump and restore, Postgres separates the large objects metadata from the large object contents.

Specifically, the metadata consists of a large-object OID and ACLs, and is considered to be part of the pre-data section of a Postgres dump.

This means that pgcopydb relies on pg_dump to import the large object metadata from the source to the target Postgres server, but then implements its own logic to migrate the large objects contents, using several worker processes depending on the setting of the command-line option --large-objects-jobs.
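For instance, the large objects workers are sized with the same kind of option as the table and index workers; the numbers here are arbitrary examples:

$ pgcopydb clone --table-jobs 4 --index-jobs 4 --large-objects-jobs 8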

Concurrency

A major feature of pgcopydb is how concurrency is implemented, including options to obtain same-table COPY concurrency. See the Notes about concurrency chapter of the documentation for more information.

Change Data Capture

pgcopydb implements a full Postgres replication solution based on the lower-level API for Postgres Logical Decoding. This allows pgcopydb to be compatible with old versions of Postgres, starting with version 9.4.

Always do a test migration first without the --follow option to have an idea of the downtime window needed for your very own case. This will inform your decision about using the Change Data Capture mode, which makes a migration a lot more complex to drive to success.

Tantor SE-1C Logical Decoding Client

The replication client of pgcopydb has been designed to be able to fetch changes from the source Postgres instance concurrently with the initial COPY of the data. Three worker processes are created to handle the logical decoding client:

  • The streaming process fetches data from the Postgres replication slot using the Postgres replication protocol.

  • The transform process transforms the data fetched from an intermediate JSON format into a derivative of the SQL language. In prefetch mode this is implemented as a batch operation; in replay mode this is done in a streaming fashion, one line at a time, reading from a unix pipe.

  • The apply process then applies the SQL script to the target Postgres database system and uses Postgres APIs for Replication Progress Tracking.

During the initial COPY phase of operations, pgcopydb follow runs in prefetch mode and does not apply changes yet. After the initial COPY is done, the pgcopydb replication system enters a loop that switches between the following two modes of operation:

  1. In prefetch mode, changes are stored to JSON files on-disk, the transform process operates on files when a SWITCH occurs, and the apply process catches up with changes on-disk by applying one file at a time.

    When the next file to apply does not exist (yet), the transform worker processes stop and the main follow supervisor process then switches to replay mode.

  2. In replay mode changes are streamed from the streaming worker process to the transform worker process using a Unix PIPE mechanism, and the obtained SQL statements are sent to the replay worker process using another Unix PIPE.

    Changes are then replayed in a streaming fashion, end-to-end, with a transaction granularity.

The internal SQL-like script format

The Postgres Logical Decoding API does not provide a CDC format; instead, it allows Postgres extension developers to implement logical decoding output plugins. The Postgres core distribution implements such an output plugin named test_decoding. Another commonly used output plugin is named wal2json.

pgcopydb is compatible with both test_decoding and wal2json plugins. As a user it's possible to choose an output plugin with the --plugin command-line option.

The output plugin compatibility means that pgcopydb has to implement code to parse the output plugin syntax and make sense of it. Internally, the messages from the output plugin are stored by pgcopydb in a JSON Lines formatted file, where each line is a JSON record with decoded metadata about the changes and the output plugin message, as-is.

This JSON Lines format is transformed into SQL scripts. At first, pgcopydb would just use SQL for the intermediate format, but then support for prepared statements was added as an optimization. This means that our SQL script uses commands such as the following examples:

PREPARE d33a643f AS INSERT INTO public.rental ("rental_id", "rental_date", "inventory_id", "customer_id", "return_date", "staff_id", "last_update") overriding system value VALUES ($1, $2, $3, $4, $5, $6, $7), ($8, $9, $10, $11, $12, $13, $14);
EXECUTE d33a643f["16050","2022-06-01 00:00:00+00","371","291",null,"1","2022-06-01 00:00:00+00","16051","2022-06-01 00:00:00+00","373","293",null,"2","2022-06-01 00:00:00+00"];

As you can see in the example, pgcopydb is now able to use a single INSERT statement with multiple VALUES, which is a huge performance boost. In order to simplify pgcopydb parsing of the SQL syntax, the choice was made to format the EXECUTE argument list as a JSON array, which does not comply with the actual SQL syntax, but is simple and fast to process.

Finally, it's not possible for the transform process to anticipate the actual session management of the apply process, so SQL statements are always included with both the PREPARE and the EXECUTE steps. The pgcopydb apply code knows how to skip PREPARing again, of course.

Unfortunately that means that our SQL files are not actually using SQL syntax and can't be processed as-is with any SQL client software. At the moment either using pgcopydb stream apply or writing your own processing code is required.

Internal Catalogs (SQLite)

To be able to implement pgcopydb operations, a list of SQL objects such as tables, indexes, constraints and sequences is needed internally. While pgcopydb used to handle such a list as an in-memory array, along with a hash table for direct lookups (by oid and by restore list name), in some cases the source database contains so many objects that these arrays do not fit in memory.

As pgcopydb is written in C, the current best approach to handle an array of objects that needs to spill to disk and supports direct lookup is actually the SQLite library, file format, and embedded database engine.

That's why the current version of pgcopydb uses SQLite to handle its catalogs.

Internally pgcopydb stores metadata information in three different catalogs, all found in the ${TMPDIR}/pgcopydb/schema/ directory by default, unless using the recommended --dir option.

  • The source catalog registers metadata about the source database, and also some metadata about the pgcopydb context, consistency, and progress.

  • The filters catalog is only used when the --filters option is provided, and it registers metadata about the objects in the source database that are going to be skipped.

    This is necessary because the filtering is implemented using the pg_restore --list and pg_restore --use-list options. The Postgres archive Table Of Contents format contains an object OID and its restore list name, and pgcopydb needs to be able to look up that OID or name in its filtering catalogs.

  • The target catalog registers metadata about the target database, such as the list of roles, the list of schemas, or the list of already existing constraints found on the target database.

Concurrency

pgcopydb has been developed mostly to allow two aspects that are not possible to achieve directly with pg_dump and pg_restore, and that require just enough fiddling around that not many scripts have been made available to automate them.

Notes about concurrency

The pgcopydb tool implements many operations concurrently with one another, by way of the fork() system call. This means that pgcopydb creates sub-processes that each handle a part of the work.

The process tree then looks like the following:

$ pgcopydb clone --follow --table-jobs 4 --index-jobs 4 --large-objects-jobs 4
 + pgcopydb clone worker
    + pgcopydb copy supervisor [ --table-jobs 4 ]
      - pgcopydb copy queue worker
      - pgcopydb copy worker
      - pgcopydb copy worker
      - pgcopydb copy worker
      - pgcopydb copy worker

    + pgcopydb blob metadata worker [ --large-objects-jobs 4 ]
      - pgcopydb blob data worker
      - pgcopydb blob data worker
      - pgcopydb blob data worker
      - pgcopydb blob data worker

    + pgcopydb index supervisor [ --index-jobs 4 ]
      - pgcopydb index/constraints worker
      - pgcopydb index/constraints worker
      - pgcopydb index/constraints worker
      - pgcopydb index/constraints worker

    + pgcopydb vacuum supervisor [ --table-jobs 4 ]
      - pgcopydb vacuum analyze worker
      - pgcopydb vacuum analyze worker
      - pgcopydb vacuum analyze worker
      - pgcopydb vacuum analyze worker

    + pgcopydb sequences reset worker

 + pgcopydb follow worker [ --follow ]
   - pgcopydb stream receive
   - pgcopydb stream transform
   - pgcopydb stream catchup

Observe that when using pgcopydb clone --follow --table-jobs 4 --index-jobs 4 --large-objects-jobs 4, pgcopydb creates 27 sub-processes.

The 27 total is counted from:

  • 1 clone worker + 1 copy supervisor + 1 copy queue worker + 4 copy workers + 1 blob metadata worker + 4 blob data workers + 1 index supervisor + 4 index workers + 1 vacuum supervisor + 4 vacuum workers + 1 sequence reset worker

    that's 1 + 1 + 1 + 4 + 1 + 4 + 1 + 4 + 1 + 4 + 1 = 23

  • 1 follow worker + 1 stream receive + 1 stream transform + 1 stream catchup

    that's 1 + 1 + 1 + 1 = 4

  • At the end, it is 23 + 4 = 27 total

Here is a description of the process tree:

  • When starting with the TABLE DATA copying step, pgcopydb creates as many sub-processes as specified by the --table-jobs command line option (or the environment variable PGCOPYDB_TABLE_JOBS), and an extra process is created to send the tables to the queue and to handle TRUNCATE commands for COPY-partitioned tables.

  • A single sub-process is created by pgcopydb to copy the Postgres Large Objects (BLOBs) metadata found on the source database to the target database, and as many as --large-objects-jobs processes are started to copy the large object data.

  • To drive the index and constraint build on the target database, pgcopydb creates as many sub-processes as specified by the --index-jobs command line option (or the environment variable PGCOPYDB_INDEX_JOBS).

    It is possible with Postgres to create several indexes for the same table in parallel; for that, the client just needs to open a separate database connection for each index and run each CREATE INDEX command in its own connection, at the same time. In pgcopydb, this is implemented by running one sub-process per index to create.

    The --index-jobs option is global for the whole operation, so that it's easier to set it to the number of available CPU cores on the target Postgres instance. Usually, a given CREATE INDEX command uses 100% of a single core.

  • To drive the VACUUM ANALYZE workload on the target database, pgcopydb creates as many sub-processes as specified by the --table-jobs command line option.

  • To reset sequences in parallel to COPYing the table data, pgcopydb creates a single dedicated sub-process.

  • When using the --follow option then another sub-process leader is created to handle the three Change Data Capture processes.

    • One process implements pgcopydb stream receive to fetch changes in the JSON format and to pre-fetch them in JSON files.

    • As soon as a JSON file is completed, the pgcopydb stream transform worker transforms the JSON file into SQL, as if by calling the command pgcopydb stream transform.

    • Another process implements pgcopydb stream catchup to apply SQL changes to the target Postgres instance. This process loops over querying the pgcopydb sentinel table until the apply mode has been enabled, and then loops over the SQL files and runs the queries from them.

For each table, build all indexes concurrently

pgcopydb takes the extra step and makes sure to create all your indexes in parallel to one another, going the extra mile when it comes to indexes that are associated with a constraint.

Postgres introduced the configuration parameter synchronize_seqscans in version 8.3, eons ago. It is on by default and allows the following behavior:

From the Tantor SE-1C documentation

This allows sequential scans of large tables to synchronize with each other, so that concurrent scans read the same block at about the same time and hence share the I/O workload.

The other aspect that pg_dump and pg_restore are not very smart about is how they deal with the indexes that are used to support constraints, in particular unique constraints and primary keys.

Those indexes are exported using the ALTER TABLE command directly. This is fine because the command creates both the constraint and the underlying index, so the schema in the end is constructed as expected.

That said, those ALTER TABLE ... ADD CONSTRAINT commands require a level of locking that prevents any concurrency. As we can read on the docs for ALTER TABLE:

From the Tantor SE-1C documentation

Although most forms of ADD table_constraint require an ACCESS EXCLUSIVE lock, ADD FOREIGN KEY requires only a SHARE ROW EXCLUSIVE lock. Note that ADD FOREIGN KEY also acquires a SHARE ROW EXCLUSIVE lock on the referenced table, in addition to the lock on the table on which the constraint is declared.

The trick is then to first issue a CREATE UNIQUE INDEX statement and when the index has been built then issue a second command in the form of ALTER TABLE ... ADD CONSTRAINT ... PRIMARY KEY USING INDEX ..., as in the following example which is taken from the logs of an actual pgcopydb run:

21:52:06 68898 INFO  COPY "demo"."tracking";
21:52:06 68899 INFO  COPY "demo"."client";
21:52:06 68899 INFO  Creating 2 indexes for table "demo"."client"
21:52:06 68906 INFO  CREATE UNIQUE INDEX client_pkey ON demo.client USING btree (client);
21:52:06 68907 INFO  CREATE UNIQUE INDEX client_pid_key ON demo.client USING btree (pid);
21:52:06 68898 INFO  Creating 1 indexes for table "demo"."tracking"
21:52:06 68908 INFO  CREATE UNIQUE INDEX tracking_pkey ON demo.tracking USING btree (client, ts);
21:52:06 68907 INFO  ALTER TABLE "demo"."client" ADD CONSTRAINT "client_pid_key" UNIQUE USING INDEX "client_pid_key";
21:52:06 68906 INFO  ALTER TABLE "demo"."client" ADD CONSTRAINT "client_pkey" PRIMARY KEY USING INDEX "client_pkey";
21:52:06 68908 INFO  ALTER TABLE "demo"."tracking" ADD CONSTRAINT "tracking_pkey" PRIMARY KEY USING INDEX "tracking_pkey";

This trick is worth a lot of performance gains on its own, as has been discovered and experienced and appreciated by pgloader users already.

Same-table Concurrency

For some databases, it just so happens that most of the database size on-disk is contained within a single giant table, or within a short list of giant tables. When this happens, the concurrency model that is implemented with --table-jobs still allocates a single process to COPY all the data from the source table.

Same-table concurrency allows pgcopydb to use more than one process at the same time to process a single source table. The data is then logically partitioned (on the fly) and split between processes:

  • To fetch the data from the source database, the COPY processes then use SELECT queries like in the following example:

    COPY (SELECT * FROM source.table WHERE id BETWEEN      1 AND 123456)
    COPY (SELECT * FROM source.table WHERE id BETWEEN 123457 AND 234567)
    COPY (SELECT * FROM source.table WHERE id BETWEEN 234568 AND 345678)
    ...
    

    This is only possible when the source.table has at least one column of an integer type (int2, int4, and int8 are supported) and with a UNIQUE or PRIMARY KEY constraint. We must make sure that any given row is selected only once overall to avoid introducing duplicates on the target database.

    When a table is missing such a primary key column of an integer data type, pgcopydb then automatically resorts to using CTID based comparisons. See Postgres documentation section about System Columns for more information about Postgres CTIDs.

    The COPY processes then use the SELECT queries like in the following example:

    COPY (SELECT * FROM source.table WHERE ctid >= '(0,0)'::tid and ctid < '(5925,0)'::tid)
    COPY (SELECT * FROM source.table WHERE ctid >= '(5925,0)'::tid and ctid < '(11850,0)'::tid)
    COPY (SELECT * FROM source.table WHERE ctid >= '(11850,0)'::tid and ctid < '(17775,0)'::tid)
    COPY (SELECT * FROM source.table WHERE ctid >= '(17775,0)'::tid and ctid < '(23698,0)'::tid)
    COPY (SELECT * FROM source.table WHERE ctid >= '(23698,0)'::tid)
    
  • To decide if a table COPY processing should be split, the command line option --split-tables-larger-than is used, or the environment variable PGCOPYDB_SPLIT_TABLES_LARGER_THAN.

    The expected value is either a plain number of bytes, or a pretty-printed number of bytes such as 250 GB.

    When using this option, tables that have at least this amount of data and also a candidate key for the COPY partitioning are distributed among several COPY processes.

    The number of COPY processes is computed by dividing the table size by the threshold set with the split option. For example, if the threshold is 250 GB then a 400 GB table is going to be distributed among 2 COPY processes.

    The command pgcopydb list table-parts may be used to list the COPY partitioning that pgcopydb computes given a source table and a threshold, as in the sketch below.
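A minimal sketch of enabling same-table concurrency, using an arbitrary 10 GB threshold for illustration:

$ pgcopydb clone --table-jobs 8 --split-tables-larger-than '10 GB'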

Significant differences when using same-table COPY concurrency

When same-table concurrency happens for a source table, some operations are not implemented as they would have been without same-table concurrency. Specifically:

  • TRUNCATE and COPY FREEZE Postgres optimization

    When using a single COPY process, it's then possible to TRUNCATE the target table in the same transaction as the COPY command, as in the following synthetic example:

    BEGIN;
    TRUNCATE ONLY table;
    COPY table FROM stdin WITH (FREEZE);
    COMMIT;
    

    This technique allows Postgres to implement several optimizations, doing work during the COPY that would otherwise need to happen later when executing the first queries on the table.

    When using same-table concurrency, we have several transactions happening concurrently on the target system, each copying part of the data from the source table. This means that we have to TRUNCATE separately and that the FREEZE option cannot be used.

  • CREATE INDEX and VACUUM

    Even when same-table COPY concurrency is enabled, creating the indexes on the target system only happens after the whole data set has been copied over. This means that it is only when the last process is done with the COPYing that this process takes care of the indexes and the vacuum analyze operation.

Same-table COPY concurrency performance limitations

Finally, it might be that same-table concurrency is not effective at all in some use cases. Here is a list of limitations to have in mind when deciding whether to use this feature:

  • Network Bandwidth

    The most common performance bottleneck relevant to database migrations is the network bandwidth. When the bandwidth is saturated (used in full) then same-table concurrency will provide no performance benefits.

  • Disks IOPS

    The second most common performance bottleneck relevant to database migrations is disks IOPS and, in the Cloud, burst capacity. When the disk bandwidth is used in full, then same-table concurrency will provide no performance benefits.

    The source database system uses read IOPS; the target database system uses both read and write IOPS (copying the data writes to disk, and creating the indexes both reads table data from disk and then writes index data to disk).

  • On-disk data organisation

    When using a single COPY process, the target system may fill in the Postgres table in a clustered way, using each disk page in full before opening the next one, in a sequential fashion.

    When using same-table COPY concurrency, then the target Postgres system needs to handle concurrent writes to the same table, resulting in a possibly less effective disk usage.

    How that may impact your application performance is to be tested.

  • synchronize_seqscans

    Postgres implemented this option back in version 8.3. The option is now documented in the Version and Platform Compatibility section.

    The documentation reads: "This allows sequential scans of large tables to synchronize with each other, so that concurrent scans read the same block at about the same time and hence share the I/O workload."

    The impact on performance when having concurrent COPY processes reading the same source table at the same time is to be assessed. At the moment there is no option in pgcopydb to SET synchronize_seqscans TO off when using same-table COPY concurrency.

    Use your usual Postgres configuration editing for testing.
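One way to run such a test, using the pagila database from the tutorial as an example (remember to reset the setting once the experiment is over):

$ psql -d "$PGCOPYDB_SOURCE_PGURI" \
       -c 'alter database pagila set synchronize_seqscans to off'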

Resuming Operations (snapshots)

An important aspect of pgcopydb design is detailed in the documentation section Notes about concurrency and has to do with using many concurrent worker processes to implement parallelism.

Even when using multiple worker processes, it is important that pgcopydb operations are consistent. It is essential to guarantee that the same source schema and data set are used by every single worker process throughout the operations.

Consistency with multiple Postgres sessions is achieved thanks to Postgres' ability to export and import snapshots. As per Postgres docs about Snapshot Synchronization Functions:

From the Tantor SE-1C documentation

Tantor SE-1C allows database sessions to synchronize their snapshots. A snapshot determines which data is visible to the transaction that is using the snapshot. Synchronized snapshots are necessary when two or more sessions need to see identical content in the database. If two sessions just start their transactions independently, there is always a possibility that some third transaction commits between the executions of the two START TRANSACTION commands, so that one session sees the effects of that transaction and the other does not.

To solve this problem, Tantor SE-1C allows a transaction to export the snapshot it is using. As long as the exporting transaction remains open, other transactions can import its snapshot, and thereby be guaranteed that they see exactly the same view of the database that the first transaction sees. But note that any database changes made by any one of these transactions remain invisible to the other transactions, as is usual for changes made by uncommitted transactions. So the transactions are synchronized with respect to pre-existing data, but act normally for changes they make themselves.

Snapshots are exported with the pg_export_snapshot function, shown in Table 9.94, and imported with the SET TRANSACTION command.

Using these Postgres APIs allows pgcopydb to implement consistent operations even when using multiple worker processes.

Bypassing consistency issues

If you can ensure that no writes happen on the source database for the whole duration of the pgcopydb operations, which means no schema changes (DDL) and no data changes (DML), then consistency issues cannot happen: the database is static in that context, typically within a maintenance window where the applications are disconnected from the source database service.

Note that pgcopydb offers the --not-consistent option that allows bypassing all the complexity of sharing a snapshot throughout the operations. In particular, resuming operations after a crash or even implementing multi-steps operations is made easier when bypassing consistency aspects altogether.

When you are able to work within a maintenance window where the database is isolated from any application traffic, consider using --not-consistent.
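In that situation the command line stays as simple as:

$ pgcopydb clone --not-consistent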

Consistency and concurrency: Postgres snapshots

As seen above, Postgres offers different APIs to export and import a snapshot:

  1. Function pg_export_snapshot() exports the current snapshot.

  2. SQL command SET TRANSACTION SNAPSHOT imports the given snapshot.

  3. Replication protocol command CREATE_REPLICATION_SLOT allows exporting its snapshot.

Exporting a Postgres snapshot can be done either at replication slot creation time, or from a non-replication connection using the SQL function pg_export_snapshot(). This is an either/or situation; Postgres does not allow mixing these two approaches.

Also remember that a single snapshot must be used throughout pgcopydb operations, both for the initial COPY of the schema and data and for the Change Data Capture aspects, in order to achieve consistency (no data loss, no duplicates in the data change stream).

To be able to implement multiple worker processes in pgcopydb and have a consistent view of the same database (schema, data) in every single process, pgcopydb needs to first export a single common snapshot and then have every worker process import that snapshot when connecting to the source database.

When implementing Change Data Capture thanks to the --follow option, it is also required that there is no gap between the initial snapshot and the first change received, and that no changes are sent that were already part of the initial copy. Postgres knows how to offer that guarantee via its snapshot exporting facility in the CREATE_REPLICATION_SLOT replication command.

As a result, because of how the Postgres API works, the pgcopydb snapshot --follow command must also create the pgcopydb replication slot, and it exports the replication slot snapshot.

The pgcopydb snapshot command, when the --follow option is not used, simply connects to the Postgres source database using the normal query protocol and runs the command select pg_export_snapshot() to grab a snapshot that can be shared by all the worker processes.

Moreover the Postgres API for exporting a snapshot has the following limitation:

From the Tantor SE-1C documentation

The snapshot is available for import only until the end of the transaction that exported it.

This means that the pgcopydb snapshot command must be kept running for the whole duration of the pgcopydb initial copy operations. The replication client only uses the replication slot to ensure consistency, so once only the follow worker processes are running, holding on to the snapshot is no longer required.

Resumability of pgcopydb operations

The ability to resume operations when using pgcopydb faces three quite different contexts. Depending on the context and when the previous operation has been interrupted, then running the same pgcopydb command again with the --resume option might just work, or might error out because the situation does not allow a consistent resuming of the operation that was interrupted.

Bypassing consistency issues

When the --resume --not-consistent options are used, there are no restrictions around snapshot re-use when trying to resume interrupted operations.

Consistent copy of the data

When using pgcopydb clone --resume the snapshot used in the previous attempts is going to be re-used. For Postgres to be able to import that snapshot again, the transaction that exported the snapshot must still be running on the source database system.

Single pgcopydb command

When using pgcopydb clone the snapshot holding process is part of that single process tree, and any interruption of this command (signal, C-c, crash) also terminates the snapshot holding sub-process and the snapshot is then lost.

Separate pgcopydb snapshot command

That's why the pgcopydb snapshot command can be used separately: the main pgcopydb clone command then re-uses the exported snapshot automatically, and the snapshot-holding process keeps running even in case of an interruption of the pgcopydb clone command.

External snapshot

It is also possible to use another command or software to export and maintain the snapshot that pgcopydb uses and then use pgcopydb clone --snapshot ... to share the snapshot with pgcopydb.
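A minimal sketch of sharing an externally maintained snapshot, where the snapshot identifier is a made-up example value obtained from whichever tool exported it (for instance a long-lived psql session running BEGIN; SELECT pg_export_snapshot();):

$ pgcopydb clone --snapshot 00000003-0000007B-1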

Consistent copy of the data with CDC

When using Change Data Capture with the --follow option, resuming operations consistently requires the following:

  1. The initial COPY of the data needs to still have access to the exported snapshot.

    Even when the snapshot has been exported with a replication protocol command, Postgres still requires the session to be kept open here.

  2. The logical replication client is not concerned with the snapshot operations; those are done server-side when creating the replication slot, and from there on all the client has to do is consume from the replication slot.

Snapshot and catalogs (cache invalidation)

The source catalog table setup registers information about the current pgcopydb command. The information is checked at start-up in order to avoid re-using data in a different context.

The registered information, which includes the snapshot identifier, is shown in the example below. In case of a mismatch, consider using --resume --not-consistent when that's relevant to your operations.

Here's how to inspect the current setup information that pgcopydb maintains in its local catalog cache:

$ sqlite3 /tmp/pgcopydb/schema/source.db
sqlite> .mode line
sqlite> select * from setup;
                      id = 1
           source_pg_uri = postgres:///pagila
           target_pg_uri = postgres:///plop
                snapshot = 00000003-00000048-1
split_tables_larger_than = 0
                 filters = {"type":"SOURCE_FILTER_TYPE_NONE"}
                  plugin =
               slot_name =

The source and target connection strings only contain the Postgres servers hostname, port, database name and connecting role name. In particular, authentication credentials are not stored in the catalogs.

pgcopydb

pgcopydb - copy an entire Postgres database from source to target

Synopsis

pgcopydb provides the following commands

pgcopydb: pgcopydb tool
usage: pgcopydb [ --verbose --quiet ]


Available commands:
  pgcopydb
    clone     Clone an entire database from source to target
    fork      Clone an entire database from source to target
    follow    Replay changes from the source database to the target database
    snapshot  Create and export a snapshot on the source database
  + compare   Compare source and target databases
  + copy      Implement the data section of the database copy
  + dump      Dump database objects from a Postgres instance
  + restore   Restore database objects into a Postgres instance
  + list      List database objects from a Postgres instance
  + stream    Stream changes from the source database
    ping      Attempt to connect to the source and target instances
    help      Print help message
    version   Print pgcopydb version

Description

The pgcopydb command implements a full migration of an entire Postgres database from a source instance to a target instance. Both the Postgres instances must be available for the entire duration of the command.

The pgcopydb command also implements a full Logical Decoding client for Postgres, allowing Change Data Capture to replay data changes (DML) happening on the source database after the base copy snapshot. The pgcopydb logical decoding client code is compatible with both test_decoding and wal2json output plugins, and defaults to using test_decoding.
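For instance, selecting the wal2json output plugin for the Change Data Capture parts looks like the following:

$ pgcopydb clone --follow --plugin wal2json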

pgcopydb help

The pgcopydb help command lists all the supported sub-commands:

pgcopydb
  clone     Clone an entire database from source to target
  fork      Clone an entire database from source to target
  follow    Replay changes from the source database to the target database
  snapshot  Create and export a snapshot on the source database
+ compare   Compare source and target databases
+ copy      Implement the data section of the database copy
+ dump      Dump database objects from a Postgres instance
+ restore   Restore database objects into a Postgres instance
+ list      List database objects from a Postgres instance
+ stream    Stream changes from the source database
  ping      Attempt to connect to the source and target instances
  help      Print help message
  version   Print pgcopydb version

pgcopydb compare
  schema  Compare source and target schema
  data    Compare source and target data

pgcopydb copy
  db           Copy an entire database from source to target
  roles        Copy the roles from the source instance to the target instance
  extensions   Copy the extensions from the source instance to the target instance
  schema       Copy the database schema from source to target
  data         Copy the data section from source to target
  table-data   Copy the data from all tables in database from source to target
  blobs        Copy the blob data from the source database to the target
  sequences    Copy the current value from all sequences in database from source to target
  indexes      Create all the indexes found in the source database in the target
  constraints  Create all the constraints found in the source database in the target

pgcopydb dump
  schema  Dump source database schema as custom files in work directory
  roles   Dump source database roles as custom file in work directory

pgcopydb restore
  schema      Restore a database schema from custom files to target database
  pre-data    Restore a database pre-data schema from custom file to target database
  post-data   Restore a database post-data schema from custom file to target database
  roles       Restore database roles from SQL file to target database
  parse-list  Parse pg_restore --list output from custom file

pgcopydb list
  databases    List databases
  extensions   List all the source extensions to copy
  collations   List all the source collations to copy
  tables       List all the source tables to copy data from
  table-parts  List a source table copy partitions
  sequences    List all the source sequences to copy data from
  indexes      List all the indexes to create again after copying the data
  depends      List all the dependencies to filter-out
  schema       List the schema to migrate, formatted in JSON
  progress     List the progress

pgcopydb stream
  setup      Setup source and target systems for logical decoding
  cleanup    Cleanup source and target systems for logical decoding
  prefetch   Stream JSON changes from the source database and transform them to SQL
  catchup    Apply prefetched changes from SQL files to the target database
  replay     Replay changes from the source to the target database, live
+ sentinel   Maintain a sentinel table
  receive    Stream changes from the source database
  transform  Transform changes from the source database into SQL commands
  apply      Apply changes from the source database into the target database

pgcopydb stream sentinel
  setup  Setup the sentinel table
  get    Get the sentinel table values
+ set    Set the sentinel table values

pgcopydb stream sentinel set
  startpos  Set the sentinel start position LSN
  endpos    Set the sentinel end position LSN
  apply     Set the sentinel apply mode
  prefetch  Set the sentinel prefetch mode

pgcopydb version

The pgcopydb version command outputs the version string of the version of pgcopydb used, and can do that in the JSON format when using the --json option.

$ pgcopydb version
pgcopydb version 0.13.1.g868ad77
compiled with PostgreSQL 13.11 (Debian 13.11-0+deb11u1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
compatible with Postgres 10, 11, 12, 13, 14, and 15

In JSON:

$ pgcopydb version --json
{
    "pgcopydb": "0.13.1.g868ad77",
    "pg_major": "13",
    "pg_version": "13.11 (Debian 13.11-0+deb11u1)",
    "pg_version_str": "PostgreSQL 13.11 (Debian 13.11-0+deb11u1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit",
    "pg_version_num": 130011
}

The details about the Postgres version apply to the version that was used to build pgcopydb from sources, which is really the version of the libpq client library.

pgcopydb ping

The pgcopydb ping command attempts to connect to both the source and the target Postgres databases, concurrently.

pgcopydb ping: Attempt to connect to the source and target instances
usage: pgcopydb ping  --source ... --target ... 

  --source              Postgres URI to the source database
  --target              Postgres URI to the target database

An example output looks like the following:

$ pgcopydb ping
18:04:48 84679 INFO   Running pgcopydb version 0.10.31.g7e5fbb8.dirty from "/Users/dim/dev/PostgreSQL/pgcopydb/src/bin/pgcopydb/pgcopydb"
18:04:48 84683 INFO   Successfully could connect to target database at "postgres://@:/plop?"
18:04:48 84682 INFO   Successfully could connect to source database at "postgres://@:/pagila?"

This command implements a retry policy (named Decorrelated Jitter) and can be used in automation to make sure that the databases are ready to accept connections.
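A small sketch of using it that way, waiting for both instances before starting the actual migration:

$ pgcopydb ping --source "$PGCOPYDB_SOURCE_PGURI" --target "$PGCOPYDB_TARGET_PGURI" \
    && pgcopydb clone --table-jobs 4 --index-jobs 4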

pgcopydb clone

The main pgcopydb operation is the clone operation, and for historical and user-friendliness reasons two aliases are available that implement the same operation:

pgcopydb
  clone     Clone an entire database from source to target
  fork      Clone an entire database from source to target

pgcopydb clone

The command pgcopydb clone copies a database from the given source Postgres instance to the target Postgres instance.

pgcopydb clone: Clone an entire database from source to target
usage: pgcopydb clone  --source ... --target ... [ --table-jobs ... --index-jobs ... ] 

  --source                      Postgres URI to the source database
  --target                      Postgres URI to the target database
  --dir                         Work directory to use
  --table-jobs                  Number of concurrent COPY jobs to run
  --index-jobs                  Number of concurrent CREATE INDEX jobs to run
  --restore-jobs                Number of concurrent jobs for pg_restore
  --large-objects-jobs          Number of concurrent Large Objects jobs to run
  --split-tables-larger-than    Same-table concurrency size threshold
  --split-max-parts             Maximum number of jobs for Same-table concurrency 
  --estimate-table-sizes        Allow using estimates for relation sizes
  --drop-if-exists              On the target database, clean-up from a previous run first
  --roles                       Also copy roles found on source to target
  --no-role-passwords           Do not dump passwords for roles
  --no-owner                    Do not set ownership of objects to match the original database
  --no-acl                      Prevent restoration of access privileges (grant/revoke commands).
  --no-comments                 Do not output commands to restore comments
  --no-tablespaces              Do not output commands to select tablespaces
  --skip-large-objects          Skip copying large objects (blobs)
  --skip-extensions             Skip restoring extensions
  --skip-ext-comments           Skip restoring COMMENT ON EXTENSION
  --skip-collations             Skip restoring collations
  --skip-vacuum                 Skip running VACUUM ANALYZE
  --skip-analyze                Skip running vacuumdb --analyze-only
  --skip-db-properties          Skip copying ALTER DATABASE SET properties
  --skip-split-by-ctid          Skip splitting tables by ctid
  --requirements <filename>     List extensions requirements
  --filters <filename>          Use the filters defined in <filename>
  --fail-fast                   Abort early in case of error
  --restart                     Allow restarting when temp files exist already
  --resume                      Allow resuming operations after a failure
  --not-consistent              Allow taking a new snapshot on the source database
  --snapshot                    Use snapshot obtained with pg_export_snapshot
  --follow                      Implement logical decoding to replay changes
  --plugin                      Output plugin to use (test_decoding, wal2json)
  --wal2json-numeric-as-string  Print numeric data type as string when using wal2json output plugin
  --slot-name                   Use this Postgres replication slot name
  --create-slot                 Create the replication slot
  --origin                      Use this Postgres replication origin node name
  --endpos                      Stop replaying changes when reaching this LSN
  --use-copy-binary             Use the COPY BINARY format for COPY operations

pgcopydb fork

The command pgcopydb fork copies a database from the given source Postgres instance to the target Postgres instance. This command is an alias to the command pgcopydb clone seen above.

Description

The pgcopydb clone command implements both a base copy of a source database into a target database and also a full Logical Decoding client for the wal2json logical decoding plugin.

Base copy, or the clone operation

The pgcopydb clone command implements the following steps:

  1. pgcopydb gets the list of ordinary and partitioned tables from a catalog query on the source database, and also the list of indexes, and the list of sequences with their current values.

    When filtering is used, the list of objects OIDs that are meant to be filtered out is built during this step.

  2. pgcopydb calls into pg_dump to produce the pre-data section and the post-data sections of the dump using Postgres custom format.

  3. The pre-data section of the dump is restored on the target database using the pg_restore command, creating all the Postgres objects from the source database into the target database.

    When filtering is used, the pg_restore --use-list feature is used to filter the list of objects to restore in this step.

    This step uses as many as --restore-jobs jobs for pg_restore to share the workload and restore the objects in parallel.

  4. Then as many as --table-jobs COPY sub-processes are started to share the workload and COPY the data from the source to the target database one table at a time, in a loop.

    A Postgres connection and a SQL query to the Postgres catalog table pg_class are used to get the list of tables with data to copy around, and the reltuples statistic is used to start with the tables with the greatest number of rows first, as an attempt to minimize the copy time.

  5. An auxiliary process loops through all the Large Objects found on the source database and copies their data parts over to the target database, much like pg_dump itself would.

    This step is much like pg_dump | pg_restore for the large objects data parts, except that the standard tooling does not provide a good way to do just that.

  6. As many as --index-jobs CREATE INDEX sub-processes are started to share the workload and build indexes. In order to make sure to start the CREATE INDEX commands only after the COPY operation has completed, a queue mechanism is used. As soon as a table data COPY has completed, all the indexes for the table are queued for processing by the CREATE INDEX sub-processes.

    The primary indexes are created as UNIQUE indexes at this stage.

  7. Then the PRIMARY KEY constraints are created USING the just-built indexes (see the SQL sketch after this list). This two-step approach allows the primary key index itself to be created in parallel with other indexes on the same table, avoiding an EXCLUSIVE LOCK while creating the index.

  8. As many as --table-jobs VACUUM ANALYZE sub-processes are started to share the workload. As soon as a table data COPY has completed, the table is queued for processing by the VACUUM ANALYZE sub-processes.

  9. An auxiliary process loops over the sequences on the source database and for each of them runs a separate query on the source to fetch the last_value and the is_called metadata, the same way that pg_dump does.

    For each sequence, pgcopydb then calls pg_catalog.setval() on the target database with the information obtained on the source database.

  10. The final stage consists of running the pg_restore command for the post-data section script for the whole database, and that's where the foreign key constraints and other elements are created.

    The post-data script is filtered out using the pg_restore --use-list option so that indexes and primary key constraints already created in steps 6 and 7 are properly skipped now.

    This step uses as many as --restore-jobs jobs for pg_restore to share the workload and restore the objects in parallel.
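
To illustrate steps 6 and 7 above, here is a minimal SQL sketch using a hypothetical table public.foo with a primary key on column id; the names are illustrative, not the exact statements pgcopydb generates:

-- step 6: the primary key index is first built as a plain UNIQUE index
CREATE UNIQUE INDEX foo_pkey ON public.foo (id);

-- step 7: the constraint is then attached to the already built index,
-- so the index build itself does not run under the ALTER TABLE lock
ALTER TABLE public.foo
  ADD CONSTRAINT foo_pkey PRIMARY KEY USING INDEX foo_pkey;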

Postgres privileges, superuser, and dump and restore

Postgres has a notion of a superuser status that can be assigned to any role in the system, and the default role postgres has this status. From the Role Attributes documentation page we see that:

superuser status:

A database superuser bypasses all permission checks, except the right to log in. This is a dangerous privilege and should not be used carelessly; it is best to do most of your work as a role that is not a superuser. To create a new database superuser, use CREATE ROLE name SUPERUSER. You must do this as a role that is already a superuser.

Some Postgres objects can only be created by superusers, and some read and write operations are only allowed to superuser roles, such as the following non-exclusive list:

  • Reading the pg_authid role password (even when encrypted) is restricted to roles with the superuser status. Reading this catalog table is done when calling pg_dumpall --roles-only so that the dump file can then be used to restore roles including their passwords.

    It is possible to implement a pgcopydb migration that skips the passwords entirely when using the option --no-role-passwords. In that case though authentication might fail until passwords have been setup again correctly.

  • Most of the available Postgres extensions, at least those written in C, can only be created by roles with superuser status.

    When such an extension contains Extension Configuration Tables and has been created with a role having superuser status, then the same superuser status is needed again to pg_dump and pg_restore that extension and its current configuration.

When using pgcopydb it is possible to split your migration in privileged and non-privileged parts, like in the following examples:

$ coproc ( pgcopydb snapshot )

# first two commands would use a superuser role to connect
$ pgcopydb copy roles --source ... --target ...
$ pgcopydb copy extensions --source ... --target ...

# now it's possible to use a non-superuser role to connect
$ pgcopydb clone --skip-extensions --source ... --target ...

$ kill -TERM ${COPROC_PID}
$ wait ${COPROC_PID}

In such a script, the calls to pgcopydb copy roles and pgcopydb copy extensions would be done with connection strings that connect with a role having superuser status; then the call to pgcopydb clone would be done with a non-privileged role, typically the role that owns the source and target databases.

Warning

That said, there is currently a limitation in pg_dump that impacts pgcopydb. When an extension with configuration table has been installed as superuser, even the main pgcopydb clone operation has to be done with superuser status.

That's because pg_dump filtering (here, the --exclude-table option) does not apply to extension members, and pg_dump does not provide a mechanism to exclude extensions.

Change Data Capture using Postgres Logical Decoding

When using the --follow option the steps from the pgcopydb follow command are also run concurrently to the main copy. The Change Data Capture is then automatically driven from a prefetch-only phase to the prefetch-and-catchup phase, which is enabled as soon as the base copy is done.

See the command pgcopydb stream sentinel set endpos to remotely control the follow parts of the command even while the command is already running.

The command pgcopydb stream cleanup must be used to free resources created to support the change data capture process.

Caution

Make sure to read the documentation for pgcopydb follow and the specifics about Logical Replication Restrictions as documented by Postgres.

Change Data Capture Example 1

A simple approach to applying changes after the initial base copy has been done follows:

$ pgcopydb clone --follow &

# later when the application is ready to make the switch
$ pgcopydb stream sentinel set endpos --current

# later when the migration is finished, clean-up both source and target
$ pgcopydb stream cleanup

Change Data Capture Example 2

In some cases, it might be necessary to have more control over some of the steps taken here. Given pgcopydb's flexibility, it's possible to implement the following steps:

  1. Grab a snapshot from the source database and hold an open Postgres connection for the duration of the base copy.

    In case of crash or other problems with the main operations, it's then possible to resume processing of the base copy and the applying of the changes with the same snapshot again.

    This step is also implemented when using pgcopydb clone --follow. That said, if the command was interrupted (or crashed), then the snapshot would be lost.

  2. Setup the logical decoding within the snapshot obtained in the previous step, and the replication tracking on the target database.

    The following SQL objects are then created:

    • a replication slot on the source database,

    • a replication origin on the target database.

    This step is also implemented when using pgcopydb clone --follow. There is no way to implement Change Data Capture with pgcopydb and skip creating those SQL objects.

  3. Start the base copy of the source database, and prefetch logical decoding changes to ensure that we consume from the replication slot and allow the source database server to recycle its WAL files.

  4. Remote control the apply process to stop consuming changes and applying them on the target database.

  5. Re-sync the sequences to their now-current values.

    Sequences are not handled by Postgres logical decoding, so this step needs to be taken care of manually.

    Caution

    The next version of pgcopydb will include that step in the pgcopydb clone --follow command automatically, after it stops consuming changes and before the process terminates.

  6. Clean-up the specific resources created for supporting resumability of the whole process (replication slot on the source database, replication origin on the target database).

  7. Stop holding a snapshot on the source database by stopping the pgcopydb snapshot process left running in the background.

If the command pgcopydb clone --follow fails, it's then possible to start it again. It will automatically discover what was done successfully and what needs to be done again because it failed or was interrupted (table copy, index creation, resuming consumption of the replication slot, resuming applying changes at the right LSN position, etc).

Here is an example implementing the previous steps:

$ pgcopydb snapshot &

$ pgcopydb stream setup

$ pgcopydb clone --follow &

# later when the application is ready to make the switch
$ pgcopydb stream sentinel set endpos --current

# when the follow process has terminated, re-sync the sequences
$ pgcopydb copy sequences

# later when the migration is finished, clean-up both source and target
$ pgcopydb stream cleanup

# now stop holding the snapshot transaction (adjust PID to your environment)
$ kill %1

Options

The following options are available to pgcopydb clone:

--source

Connection string to the source Postgres instance. See the Postgres documentation for connection strings for the details. In short both the quoted form "host=... dbname=..." and the URI form postgres://user@host:5432/dbname are supported.

--target

Connection string to the target Postgres instance.

--dir

During its normal operations pgcopydb creates a lot of temporary files to track sub-processes progress. Temporary files are created in the directory specified by this option, defaulting to ${TMPDIR}/pgcopydb when that environment variable is set, or to /tmp/pgcopydb otherwise.

--table-jobs

How many tables can be processed in parallel.

This limit only applies to the COPY operations: more sub-processes than this limit will be running at the same time while the CREATE INDEX operations are in progress, though those processes are only waiting for the target Postgres instance to do all the work.

--index-jobs

How many indexes can be built in parallel, globally. A good option is to set this option to the count of CPU cores that are available on the Postgres target system, minus some cores that are going to be used for handling the COPY operations.

--restore-jobs

How many threads or processes can be used during pg_restore. A good option is to set this option to the count of CPU cores that are available on the Postgres target system.

If this value is not set, we reuse the --index-jobs value. If that value is not set either, we use the default value for --index-jobs.

--large-objects-jobs

How many worker processes to start to copy Large Objects concurrently.

--split-tables-larger-than

Allow Same-table Concurrency when processing the source database. The value of this option is expected to be a byte size, and byte units B, kB, MB, GB, TB, PB, and EB are known.
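
For instance, a hypothetical invocation enabling same-table concurrency for tables larger than 10 GB could look like the following sketch (connection strings elided):

$ pgcopydb clone --source ... --target ... \
     --table-jobs 8 --index-jobs 8 \
     --split-tables-larger-than '10 GB'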

--estimate-table-sizes

Use estimates on table sizes to decide how to split tables when using Same-table Concurrency.

When this option is used, pgcopydb runs the vacuumdb --analyze-only --jobs=<table-jobs> command on the source database, which updates the statistics for the number of pages for each relation. Later, the number of pages and the size of each page are used to estimate the actual size of the tables.

If you wish to run the ANALYZE command manually before running pgcopydb, you can use the --skip-analyze option. This way, you can decrease the time spent on the migration.

--drop-if-exists

When restoring the schema on the target Postgres instance, pgcopydb actually uses pg_restore. When this option is specified, then the following pg_restore options are also used: --clean --if-exists.

This option is useful when the same command is run several times in a row, either to fix a previous mistake or for instance when used in a continuous integration system.

This option causes DROP TABLE and DROP INDEX and other DROP commands to be used. Make sure you understand what you’re doing here!

--roles

The option --roles adds a preliminary step that copies the roles found on the source instance to the target instance. As Postgres roles are global objects, they do not exist only within the context of a specific database, so all the roles are copied over when using this option.

The pg_dumpall --roles-only command is used to fetch the list of roles from the source database, and this command includes support for passwords. As a result, this operation requires superuser privileges.

See also pgcopydb copy roles.

--no-role-passwords

Do not dump passwords for roles. When restored, roles will have a null password, and password authentication will always fail until the password is set. Since password values aren’t needed when this option is specified, the role information is read from the catalog view pg_roles instead of pg_authid. Therefore, this option also helps if access to pg_authid is restricted by some security policy.

--no-owner

Do not output commands to set ownership of objects to match the original database. By default, pg_restore issues ALTER OWNER or SET SESSION AUTHORIZATION statements to set ownership of created schema elements. These statements will fail unless the initial connection to the database is made by a superuser (or the same user that owns all of the objects in the script). With --no-owner, any user name can be used for the initial connection, and this user will own all the created objects.

--skip-large-objects

Skip copying large objects, also known as blobs, when copying the data from the source database to the target database.

--skip-extensions

Skip copying extensions from the source database to the target database.

When used, schemas that extensions depend on are also skipped: it is expected that creating the needed extensions on the target system is then the responsibility of another command (such as pgcopydb copy extensions), and schemas that extensions depend on are part of that responsibility.

Because creating extensions requires superuser privileges, this allows a multi-step approach where extensions are dealt with using superuser privileges, and then the rest of the pgcopydb operations are done without superuser privileges.

--skip-ext-comments

Skip copying COMMENT ON EXTENSION commands. This is implicit when using --skip-extensions.

--requirements <filename>

This option allows specifying which version of an extension to install on the target database. The given filename is expected to be a JSON file, and the JSON contents must be an array of objects with the keys "name" and "version".
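
For instance, a minimal sketch of such a requirements file, using hypothetical extension names and versions, looks like the following:

[
  {"name": "hstore", "version": "1.8"},
  {"name": "pg_trgm", "version": "1.6"}
]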

The command pgcopydb list extensions --requirements --json produces such a JSON file and can be used on the target database instance to get started.

See also the command pgcopydb list extensions --available-versions.

See also pgcopydb list extensions.

--skip-collations

Skip copying collations from the source database to the target database.

In some scenarios the list of collations provided by the Operating System on the source and target system might be different, and a mapping then needs to be manually installed before calling pgcopydb.

Then this option allows pgcopydb to skip over collations and assume all the needed collations have been deployed on the target database already.

See also pgcopydb list collations.

--skip-vacuum

Skip running VACUUM ANALYZE on the target database once a table has been copied, its indexes have been created, and constraints installed.

--skip-analyze

Skip running vacuumdb --analyze-only on the source database to update statistics that are required when estimating table sizes.

This option is useful only when using --estimate-table-sizes and the user runs the relevant ANALYZE command manually before running pgcopydb.

--skip-db-properties

Skip fetching database properties and copying them using the SQL command ALTER DATABASE ... SET name = value. This is useful when the source and target database have a different set of properties, or when the target database is hosted in a way that disables setting some of the properties that have been set on the source database, or also when copying these settings is not wanted.
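
For reference, the properties copied in this step are per-database settings applied with statements of the following form, shown here with hypothetical setting names and values:

ALTER DATABASE dbname SET work_mem TO '64MB';
ALTER DATABASE dbname SET timezone TO 'UTC';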

--skip-split-by-ctid

Skip splitting tables based on CTID during the copy operation. By default, pgcopydb splits large tables into smaller chunks based on the CTID column if there isn’t a unique integer column in the table. However, in some cases you may want to skip this splitting process if the CTID range scan is slow in the underlying system.

--filters <filename>

This option allows excluding tables and indexes from the copy operations. See Filtering for details about the expected file format and the filtering options available.

--fail-fast

Abort early in case of error by sending the TERM signal to all the processes in the pgcopydb process group.

--restart

When running the pgcopydb command again, if the work directory already contains information from a previous run, then the command refuses to proceed, so as not to delete information that might be used for diagnostics and forensics.

In that case, the --restart option can be used to allow pgcopydb to delete traces from a previous run.

--resume

When the pgcopydb command was terminated before completion, either by an interrupt signal (such as C-c or SIGTERM) or because it crashed, it is possible to resume the database migration.

When resuming activity from a previous run, table data that was fully copied over to the target server is not sent again. Table data that was interrupted during the COPY has to be started from scratch even when using --resume: the COPY command in Postgres is transactional and was rolled back.

The same reasoning applies to the CREATE INDEX and ALTER TABLE commands that pgcopydb issues; those commands are skipped on a --resume run only when they are known to have run through to completion on the previous one.

Finally, using --resume requires the use of --not-consistent.

--not-consistent

In order to be consistent, pgcopydb exports a Postgres snapshot by calling the pg_export_snapshot() function on the source database server. The snapshot is then re-used in all the connections to the source database server by using the SET TRANSACTION SNAPSHOT command.

Per the Postgres documentation about pg_export_snapshot: Saves the transaction’s current snapshot and returns a text string identifying the snapshot. This string must be passed (outside the database) to clients that want to import the snapshot. The snapshot is available for import only until the end of the transaction that exported it.

Now, when the pgcopydb process was interrupted (or crashed) on a previous run, it is possible to resume operations, but the snapshot that was exported does not exist anymore. The pgcopydb command can only resume operations with a new snapshot, and thus cannot ensure consistency of the whole data set, because each run now uses its own snapshot.

--snapshot

Instead of exporting its own snapshot by calling the Tantor SE-1C function pg_export_snapshot() it is possible for pgcopydb to re-use an already exported snapshot.

--follow

When the --follow option is used then pgcopydb implements Change Data Capture as detailed in the manual page for pgcopydb follow in parallel to the main copy database steps.

The replication slot is created using the same snapshot as the main database copy operation, and the changes to the source database are prefetched only during the initial copy, then prefetched and applied in a catchup process.

It is possible to give pgcopydb clone --follow a termination point (the LSN endpos) while the command is running with the command pgcopydb stream sentinel set endpos.

--plugin

Logical decoding output plugin to use. The default is test_decoding which ships with Postgres core itself, so is probably already available on your source server.

It is possible to use wal2json instead. The support for wal2json is mostly historical in pgcopydb; it should not make a user-visible difference whether you use the default test_decoding or wal2json.

--wal2json-numeric-as-string

When using the wal2json output plugin, it is possible to use the --wal2json-numeric-as-string option to instruct wal2json to output numeric values as strings and thus prevent some precision loss.

To use this option, the wal2json plugin version installed on the source database must support the --numeric-data-types-as-string option.

See also the documentation for wal2json regarding this option for details.
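
For instance, a sketch of a clone invocation combining these options (connection strings elided):

$ pgcopydb clone --follow --plugin wal2json --wal2json-numeric-as-string \
     --source ... --target ...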

--slot-name

Logical decoding slot name to use. Defaults to pgcopydb, which is unfortunate when your use case involves migrating more than one database from the source server.

--create-slot

Instruct pgcopydb to create the logical replication slot to use.

--endpos

Logical replication target LSN to use. Automatically stop replication and exit with normal exit status 0 when receiving reaches the specified LSN. If there’s a record with LSN exactly equal to lsn, the record will be output.

The --endpos option is not aware of transaction boundaries and may truncate output partway through a transaction. Any partially output transaction will not be consumed and will be replayed again when the slot is next read from. Individual messages are never truncated.

See also documentation for pg_recvlogical.

--use-copy-binary

Use COPY WITH (FORMAT BINARY) instead of the default text-format COPY command.

See also documentation for COPY.

--origin

The logical replication target system needs to track the transactions that have been applied already, so that in case we get disconnected or need to resume operations we can skip already replayed transactions.

Postgres uses a notion of an origin node name as documented in Replication Progress Tracking. This option allows picking your own node name and defaults to “pgcopydb”. Picking a different name is useful in some advanced scenarios, like migrating several sources into the same target, where each source should have its own unique origin node name.

--verbose, --notice

Increase current verbosity. The default level of verbosity is INFO. In ascending order pgcopydb knows about the following verbosity levels: FATAL, ERROR, WARN, INFO, NOTICE, SQL, DEBUG, TRACE.

--debug

Set current verbosity to DEBUG level.

--trace

Set current verbosity to TRACE level.

--quiet

Set current verbosity to ERROR level.

Environment

PGCOPYDB_SOURCE_PGURI

Connection string to the source Postgres instance. When --source is omitted from the command line, then this environment variable is used.

PGCOPYDB_TARGET_PGURI

Connection string to the target Postgres instance. When --target is omitted from the command line, then this environment variable is used.

PGCOPYDB_TABLE_JOBS

Number of concurrent jobs allowed to run COPY operations in parallel. When --table-jobs is omitted from the command line, then this environment variable is used.

PGCOPYDB_INDEX_JOBS

Number of concurrent jobs allowed to run CREATE INDEX operations in parallel. When --index-jobs is omitted from the command line, then this environment variable is used.

PGCOPYDB_RESTORE_JOBS

Number of concurrent jobs allowed to run pg_restore operations in parallel. When --restore-jobs is omitted from the command line, then this environment variable is used.

PGCOPYDB_LARGE_OBJECTS_JOBS

Number of concurrent jobs allowed to copy Large Objects data in parallel. When --large-objects-jobs is omitted from the command line, then this environment variable is used.

PGCOPYDB_SPLIT_TABLES_LARGER_THAN

Allow Same-table Concurrency when processing the source database. This environment variable value is expected to be a byte size, and byte units B, kB, MB, GB, TB, PB, and EB are known.

When --split-tables-larger-than is omitted from the command line, then this environment variable is used.

PGCOPYDB_SPLIT_MAX_PARTS

Limit the maximum number of parts when Same-table Concurrency is used. When --split-max-parts is omitted from the command line, then this environment variable is used.

PGCOPYDB_ESTIMATE_TABLE_SIZES

When true (or yes, or on, or 1, same input as a Postgres boolean) then pgcopydb estimates the size of tables to determine whether or not to split tables. This option is only useful when querying the relation sizes on source database is costly.

When --estimate-table-sizes is omitted from the command line, then this environment variable is used.

When this option is used, pgcopydb runs the vacuumdb --analyze-only --jobs=<table-jobs> command on the source database, which updates the statistics for the number of pages for each relation. Later, the number of pages and the size of each page are used to estimate the actual size of the tables.

If you wish to run the ANALYZE command manually before running pgcopydb, you can use the --skip-analyze option or PGCOPYDB_SKIP_ANALYZE environment variable. This way, you can decrease the time spent on the migration.

PGCOPYDB_OUTPUT_PLUGIN

Logical decoding output plugin to use. When --plugin is omitted from the command line, then this environment variable is used.

PGCOPYDB_WAL2JSON_NUMERIC_AS_STRING

When true (or yes, or on, or 1, same input as a Postgres boolean) then pgcopydb uses the wal2json option --numeric-data-types-as-string when using the wal2json output plugin.

When --wal2json-numeric-as-string is omitted from the command line then this environment variable is used.

PGCOPYDB_DROP_IF_EXISTS

When true (or yes, or on, or 1, same input as a Postgres boolean) then pgcopydb uses the pg_restore options --clean --if-exists when creating the schema on the target Postgres instance.

When --drop-if-exists is omitted from the command line then this environment variable is used.

PGCOPYDB_FAIL_FAST

When true (or yes, or on, or 1, same input as a Postgres boolean) then pgcopydb sends the TERM signal to all the processes in its process group as soon as one process terminates with a non-zero return code.

When --fail-fast is omitted from the command line then this environment variable is used.

PGCOPYDB_SKIP_VACUUM

When true (or yes, or on, or 1, same input as a Postgres boolean) then pgcopydb skips the VACUUM ANALYZE jobs entirely, same as when using the --skip-vacuum option.

PGCOPYDB_SKIP_ANALYZE

When true (or yes, or on, or 1, same input as a Postgres boolean) then pgcopydb skips the vacuumdb --analyze-only commands entirely, same as when using the --skip-analyze option.

PGCOPYDB_SKIP_DB_PROPERTIES

When true (or yes, or on, or 1, same input as a Postgres boolean) then pgcopydb skips the ALTER DATABASE SET properties commands that copy the settings from the source to the target database, same as when using the --skip-db-properties option.

PGCOPYDB_SKIP_CTID_SPLIT

When true (or yes, or on, or 1, same input as a Postgres boolean) then pgcopydb skips the CTID split operation during the clone process, same as when using the --skip-split-by-ctid option.

PGCOPYDB_USE_COPY_BINARY

When true (or yes, or on, or 1, same input as a Postgres boolean) then pgcopydb uses COPY WITH (FORMAT BINARY) instead of the default text-format COPY command, same as when using the --use-copy-binary option.

PGCOPYDB_SNAPSHOT

Postgres snapshot identifier to re-use, see also --snapshot.
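
For instance, re-using a snapshot identifier obtained from a pgcopydb snapshot process running in the background could look like the following sketch (the identifier shown is illustrative):

$ export PGCOPYDB_SNAPSHOT="00000003-000000EB-1"
$ pgcopydb clone --source ... --target ...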

TMPDIR

The pgcopydb command creates all its work files and directories in ${TMPDIR}/pgcopydb, defaulting to /tmp/pgcopydb when TMPDIR is not set.

PGCOPYDB_LOG_TIME_FORMAT

The logs time format defaults to %H:%M:%S when pgcopydb is used on an interactive terminal, and to %Y-%m-%d %H:%M:%S otherwise. This environment variable can be set to any format string other than the defaults.

See documentation for strftime(3) for details about the format string. See documentation for isatty(3) for details about detecting if pgcopydb is run in an interactive terminal.
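
For instance, the following sketch switches the logs to a hypothetical ISO-8601-like timestamp format:

$ export PGCOPYDB_LOG_TIME_FORMAT="%Y-%m-%dT%H:%M:%S"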

PGCOPYDB_LOG_JSON

When true (or yes, or on, or 1, same input as a Postgres boolean) then pgcopydb formats its logs using JSON.

{
  "timestamp": "2023-04-13 16:53:14",
  "pid": 87956,
  "error_level": 4,
  "error_severity": "INFO",
  "file_name": "main.c",
  "file_line_num": 165,
  "message": "Running pgcopydb version 0.11.19.g2290494.dirty from \"/Users/dim/dev/PostgreSQL/pgcopydb/src/bin/pgcopydb/pgcopydb\""
}
PGCOPYDB_LOG_FILENAME

When set to a filename (in a directory that must exist already) then pgcopydb writes its logs output to that filename in addition to the logs on the standard error output stream.

If the file already exists, its content is overwritten. In other words the previous content would be lost when running the same command twice.

PGCOPYDB_LOG_JSON_FILE

When true (or yes, or on, or 1, same input as a Postgres boolean) then pgcopydb formats its logs using JSON when writing to PGCOPYDB_LOG_FILENAME.

XDG_DATA_HOME

The standard XDG Base Directory Specification defines several environment variables that allow controlling where programs should store their files.

XDG_DATA_HOME defines the base directory relative to which user-specific data files should be stored. If $XDG_DATA_HOME is either not set or empty, a default equal to $HOME/.local/share should be used.

When using Change Data Capture (through the --follow option and Postgres logical decoding with wal2json) then pgcopydb pre-fetches changes into JSON files and transforms them into SQL files to apply to the target database.

These files are stored at the following location, tried in this order:

  1. when --dir is used, then pgcopydb uses the cdc subdirectory of the --dir location,

  2. when XDG_DATA_HOME is set in the environment, then pgcopydb uses that location,

  3. when neither of the previous settings have been used then pgcopydb defaults to using ${HOME}/.local/share.

Examples

$ export PGCOPYDB_SOURCE_PGURI=postgres://pagila:0wn3d@source/pagila
$ export PGCOPYDB_TARGET_PGURI=postgres://pagila:0wn3d@target/pagila
$ export PGCOPYDB_DROP_IF_EXISTS=on

$ pgcopydb clone --table-jobs 8 --index-jobs 12
08:13:13.961 42893 INFO   [SOURCE] Copying database from "postgres://pagila:0wn3d@source/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
08:13:13.961 42893 INFO   [TARGET] Copying database into "postgres://pagila:0wn3d@target/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
08:13:14.009 42893 INFO   Using work dir "/tmp/pgcopydb"
08:13:14.017 42893 INFO   Exported snapshot "00000003-000000EB-1" from the source database
08:13:14.019 42904 INFO   STEP 1: fetch source database tables, indexes, and sequences
08:13:14.339 42904 INFO   Fetched information for 5 tables (including 0 tables split in 0 partitions total), with an estimated total of 1000 thousands tuples and 128 MB on-disk
08:13:14.342 42904 INFO   Fetched information for 4 indexes (supporting 4 constraints)
08:13:14.343 42904 INFO   Fetching information for 1 sequences
08:13:14.353 42904 INFO   Fetched information for 1 extensions
08:13:14.436 42904 INFO   Found 1 indexes (supporting 1 constraints) in the target database
08:13:14.443 42904 INFO   STEP 2: dump the source database schema (pre/post data)
08:13:14.448 42904 INFO    /usr/bin/pg_dump -Fc --snapshot 00000003-000000EB-1 --section=pre-data --section=post-data --file /tmp/pgcopydb/schema/schema.dump 'postgres://pagila:0wn3d@source/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60'
08:13:14.513 42904 INFO   STEP 3: restore the pre-data section to the target database
08:13:14.524 42904 INFO    /usr/bin/pg_restore --dbname 'postgres://pagila:0wn3d@target/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60' --section pre-data --jobs 2 --use-list /tmp/pgcopydb/schema/pre-filtered.list /tmp/pgcopydb/schema/schema.dump
08:13:14.608 42919 INFO   STEP 4: starting 8 table-data COPY processes
08:13:14.678 42921 INFO   STEP 8: starting 8 VACUUM processes
08:13:14.678 42904 INFO   Skipping large objects: none found.
08:13:14.693 42920 INFO   STEP 6: starting 2 CREATE INDEX processes
08:13:14.693 42920 INFO   STEP 7: constraints are built by the CREATE INDEX processes
08:13:14.699 42904 INFO   STEP 9: reset sequences values
08:13:14.700 42959 INFO   Set sequences values on the target database
08:13:16.716 42904 INFO   STEP 10: restore the post-data section to the target database
08:13:16.726 42904 INFO    /usr/bin/pg_restore --dbname 'postgres://pagila:0wn3d@target/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60' --section post-data --jobs 2 --use-list /tmp/pgcopydb/schema/post-filtered.list /tmp/pgcopydb/schema/schema.dump
08:13:16.751 42904 INFO   All step are now done,  2s728 elapsed
08:13:16.752 42904 INFO   Printing summary for 5 tables and 4 indexes

  OID | Schema |             Name | Parts | copy duration | transmitted bytes | indexes | create index duration
------+--------+------------------+-------+---------------+-------------------+---------+----------------------
16398 | public | pgbench_accounts |     1 |         1s496 |             91 MB |       1 |                 302ms
16395 | public |  pgbench_tellers |     1 |          37ms |            1002 B |       1 |                  15ms
16401 | public | pgbench_branches |     1 |          45ms |              71 B |       1 |                  18ms
16386 | public |           table1 |     1 |          36ms |             984 B |       1 |                  21ms
16392 | public |  pgbench_history |     1 |          41ms |               0 B |       0 |                   0ms


                                               Step   Connection    Duration    Transfer   Concurrency
 --------------------------------------------------   ----------  ----------  ----------  ------------
   Catalog Queries (table ordering, filtering, etc)       source       119ms                         1
                                        Dump Schema       source        66ms                         1
                                     Prepare Schema       target        59ms                         1
      COPY, INDEX, CONSTRAINTS, VACUUM (wall clock)         both       2s125                        18
                                  COPY (cumulative)         both       1s655      128 MB             8
                          CREATE INDEX (cumulative)       target       343ms                         2
                           CONSTRAINTS (cumulative)       target        13ms                         2
                                VACUUM (cumulative)       target       144ms                         8
                                    Reset Sequences         both        15ms                         1
                         Large Objects (cumulative)       (null)         0ms                         0
                                    Finalize Schema         both        27ms                         2
 --------------------------------------------------   ----------  ----------  ----------  ------------
                          Total Wall Clock Duration         both       2s728                        24

pgcopydb follow

The command pgcopydb follow replays the database changes registered at the source database with the logical decoding plugin of your choice, either the default test_decoding or wal2json, into the target database.

Important

While the pgcopydb follow is a full client for logical decoding, the general use case involves using pgcopydb clone --follow as documented in Change Data Capture using Postgres Logical Decoding.

When using Logical Decoding with pgcopydb or another tool, consider making sure you’re familiar with the Logical Replication Restrictions that apply. In particular:

  • DDL are not replicated.

    When using DDL for partition scheme maintenance, such as when using the pg_partman extension, then consider creating a week or a month of partitions in advance, so that creating new partitions does not happen during the migration window.

  • Sequence data is not replicated.

    When using pgcopydb clone --follow (starting with pgcopydb version 0.9) then the sequence data is synced at the end of the operation, after the cutover point implemented via the pgcopydb stream sentinel set endpos.

    Updating the sequences manually is also possible by running the command pgcopydb copy sequences.

  • Large Objects are not replicated.

See the Postgres documentation page for Logical Replication Restrictions to read the exhaustive list of restrictions.

pgcopydb follow

pgcopydb follow: Replay changes from the source database to the target database
usage: pgcopydb follow  --source ... --target ...

  --source                      Postgres URI to the source database
  --target                      Postgres URI to the target database
  --dir                         Work directory to use
  --filters <filename>          Use the filters defined in <filename>
  --restart                     Allow restarting when temp files exist already
  --resume                      Allow resuming operations after a failure
  --not-consistent              Allow taking a new snapshot on the source database
  --snapshot                    Use snapshot obtained with pg_export_snapshot
  --plugin                      Output plugin to use (test_decoding, wal2json)
  --wal2json-numeric-as-string  Print numeric data type as string when using wal2json output plugin
  --slot-name                   Use this Postgres replication slot name
  --create-slot                 Create the replication slot
  --origin                      Use this Postgres replication origin node name
  --endpos                      Stop replaying changes when reaching this LSN

Description

This command runs three concurrent subprocesses in two possible modes of operation:

  • The first mode of operation is named prefetch and catchup where the changes from the source database are stored in intermediate JSON and SQL files to be later replayed one file at a time in the catchup process.

  • The second mode of operation is named live replay where the changes from the source database are streamed from the receiver process to the transform process using a Unix pipe, and then with the same mechanism from the transform process to the replay process.

Only one mode of operation may be active at any given time, and pgcopydb automatically switches from one mode to the other one, in a loop.

The follow command always starts using the prefetch and catchup mode, and as soon as the catchup process can’t find the next SQL file to replay, it exits, triggering the switch to the live replay mode. Before entering the new mode, to make sure to replay all the changes that have been received, pgcopydb implements an extra catchup phase without concurrent activity.

Prefetch and Catchup

In the prefetch and catchup mode of operations, the three processes are implementing the following approach:

  1. The first process pre-fetches the changes from the source database using the Postgres Logical Decoding protocol and saves the JSON messages in local JSON files.

  2. The second process transforms the JSON files into SQL. A Unix System V message queue is used to communicate LSN positions from the prefetch process to the transform process.

  3. The third process catches-up with changes happening on the source database by applying the SQL files to the target database system.

    The Postgres API for Replication Progress Tracking is used in that process so that we can skip already applied transactions at restart or resume.

Live Replay

In the live replay mode of operations, the three processes are implementing the following approach:

  1. The first process receives the changes from the source database using the Postgres Logical Decoding protocol and saves the JSON messages in local JSON files.

    Additionally, the JSON changes are written to a Unix pipe shared with the transform process.

  2. The second process transforms the JSON lines into SQL. A Unix pipe is used to stream the JSON lines from the receive process to the transform process.

    The transform process in that mode still writes the changes to SQL files, so that it’s still possible to catchup with received changes if the apply process is interrupted.

  3. The third process replays the changes happening on the source database by applying the SQL commands to the target database system. The SQL commands are read from the Unix pipe shared with the transform process.

    The Postgres API for Replication Progress Tracking is used in that process so that we can skip already applied transactions at restart or resume.

Remote control of the follow command

It is possible to start the pgcopydb follow command and then later, while it’s still running, set the LSN for the end position with the same effect as using the command line option --endpos, or switch from prefetch mode only to prefetch and catchup mode. For that, see the commands pgcopydb stream sentinel set endpos, pgcopydb stream sentinel set apply, and pgcopydb stream sentinel set prefetch.

Note that in many cases the --endpos LSN position is not known at the start of this command. Also, before entering the prefetch and apply mode it is important to make sure that the initial base copy is finished.

Finally, it is also possible to set up the streaming replication options before using the pgcopydb follow command: see the pgcopydb stream setup and pgcopydb stream cleanup commands.

Replica Identity and lack of Primary Keys

Postgres Logical Decoding works by replaying changes using SQL statements, and for that exposes the concept of Replica Identity as described in the documentation for the ALTER TABLE … REPLICA IDENTITY command.

To quote Postgres docs: This form changes the information which is written to the write-ahead log to identify rows which are updated or deleted. In most cases, the old value of each column is only logged if it differs from the new value; however, if the old value is stored externally, it is always logged regardless of whether it changed. This option has no effect except when logical replication is in use.

To support Change Data Capture with Postgres Logical Decoding for tables that do not have a Primary Key, it is necessary to use the ALTER TABLE ... REPLICA IDENTITY command for those tables.

In practice, the following two options are to be considered (see the SQL sketch after this list):

  • REPLICA IDENTITY USING INDEX index_name

    This form is preferred when a UNIQUE index exists for the table without a primary key. The index must be unique, not partial, not deferrable, and include only columns marked NOT NULL.

  • REPLICA IDENTITY FULL

    When this is used on a table, then the WAL records contain the old values of all columns in the row.
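
Here is a minimal SQL sketch for both forms, using a hypothetical table public.events and a hypothetical unique index events_natural_key_idx:

-- preferred form, when a suitable UNIQUE index exists
ALTER TABLE public.events REPLICA IDENTITY USING INDEX events_natural_key_idx;

-- fallback form: the WAL then records the old values of all columns
ALTER TABLE public.events REPLICA IDENTITY FULL;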

Logical Decoding Pre-Fetching

When using pgcopydb clone --follow a logical replication slot is created on the source database before the initial COPY, using the same Postgres snapshot. This ensures data consistency.

Within the pgcopydb clone --follow approach, it is only possible to start applying the changes from the source database after the initial COPY has finished on the target database.

Also, from the Postgres documentation we read that Postgres replication slots provide an automated way to ensure that the primary does not remove WAL segments until they have been received by all standbys.

Accumulating WAL segments on the primary during the whole duration of the initial COPY involves capacity hazards, which translate into potential File System is Full errors on the WAL disk of the source database. It is crucial to avoid such a situation.

This is why pgcopydb implements CDC pre-fetching. In parallel to the initial COPY the command pgcopydb clone --follow pre-fetches the changes in local JSON and SQL files. Those files are placed in the XDG_DATA_HOME location, which could be a mount point for an infinite Blob Storage area.

The pgcopydb follow command is a convenience command that’s available as a logical decoding client, and it shares the same implementation as the pgcopydb clone --follow command. As a result, the pre-fetching strategy is also relevant to the pgcopydb follow command.

The sentinel table, or the Remote Control

To track progress and allow resuming of operations, pgcopydb uses a sentinel table. The sentinel table consists of a single row with the following fields:

$ pgcopydb stream sentinel get
startpos   1/8D173AF8
endpos     0/0
apply      disabled
write_lsn  0/0
flush_lsn  0/0
replay_lsn 0/0

Note that you can use the command pgcopydb stream sentinel get --json to fetch a JSON formatted output, such as the following:

{
  "startpos": "1/8D173AF8",
  "endpos": "1/8D173AF8",
  "apply": false,
  "write_lsn": "0/0",
  "flush_lsn": "0/0",
  "replay_lsn": "0/0"
}

The first three fields (startpos, endpos, apply) are specific to pgcopydb, then the following three fields (write_lsn, flush_lsn, replay_lsn) follow the Postgres replication protocol as visible in the documentation for the pg_stat_replication view.

  • startpos

    The startpos field is the current LSN on the source database at the time when the Change Data Capture is setup in pgcopydb, such as when using the pgcopydb stream setup command.

    Note that both the pgcopydb follow and the pgcopydb clone --follow commands implement the setup parts if the pgcopydb stream setup command has not been used already.

  • endpos

    The endpos field is the last LSN position from the source database that pgcopydb replays. The command pgcopydb follow (or pgcopydb clone --follow) stops when reaching beyond this LSN position.

    The endpos can be set at the start of the process, which is useful for unit testing, or while the command is running, which is useful in production to define a cutover point.

    To define the endpos while the command is running, use pgcopydb stream sentinel set endpos.

  • apply

    The apply field is a boolean (enabled/disabled) that controls the catchup process. The pgcopydb catchup process replays the changes only when the apply boolean is set to true.

    The pgcopydb clone --follow command automatically enables the apply field of the sentinel table as soon as the initial COPY is done.

    To manually control the apply field, use the pgcopydb stream sentinel set apply command.

  • write_lsn

    The Postgres documentation for pg_stat_replication.write_lsn is: Last write-ahead log location written to disk by this standby server.

    In the pgcopydb case, the sentinel field write_lsn is the position that has been written to disk (as JSON) by the streaming process.

  • flush_lsn

    The Postgres documentation for pg_stat_replication.flush_lsn is: Last write-ahead log location flushed to disk by this standby server

    In the pgcopydb case, the sentinel field flush_lsn is the position that has been written and then fsync’ed to disk (as JSON) by the streaming process.

  • replay_lsn

    The Postgres documentation for pg_stat_replication.replay_lsn is: Last write-ahead log location replayed into the database on this standby server

    In the pgcopydb case, the sentinel field replay_lsn is the position that has been applied to the target database, as kept track from the WAL.json and then the WAL.sql files, and using the Postgres API for Replication Progress Tracking.

    The replay_lsn is also shared by the pgcopydb streaming process that uses the Postgres logical replication protocol, so the pg_stat_replication entry associated with the replication slot used by pgcopydb can be used to monitor replication lag.

As the pgcopydb streaming processes maintain the sentinel table on the source database, it is also possible to use it to keep track of the logical replication progress.

Options

The following options are available to pgcopydb follow:

--source

Connection string to the source Postgres instance. See the Postgres documentation for connection strings for the details. In short both the quoted form "host=... dbname=..." and the URI form postgres://user@host:5432/dbname are supported.

--target

Connection string to the target Postgres instance.

--dir

During its normal operations pgcopydb creates a lot of temporary files to track sub-processes progress. Temporary files are created in the directory specified by this option, defaulting to ${TMPDIR}/pgcopydb when that environment variable is set, or to /tmp/pgcopydb otherwise.

--restart

When running the pgcopydb command again, if the work directory already contains information from a previous run, then the command refuses to proceed, so as not to delete information that might be used for diagnostics and forensics.

In that case, the --restart option can be used to allow pgcopydb to delete traces from a previous run.

--resume

When the pgcopydb command was terminated before completion, either by an interrupt signal (such as C-c or SIGTERM) or because it crashed, it is possible to resume the database migration.

When resuming activity from a previous run, table data that was fully copied over to the target server is not sent again. Table data that was interrupted during the COPY has to be started from scratch even when using --resume: the COPY command in Postgres is transactional and was rolled back.

The same reasoning applies to the CREATE INDEX and ALTER TABLE commands that pgcopydb issues; those commands are skipped on a --resume run only when they are known to have run through to completion on the previous one.

Finally, using --resume requires the use of --not-consistent.

--not-consistent

In order to be consistent, pgcopydb exports a Postgres snapshot by calling the pg_export_snapshot() function on the source database server. The snapshot is then re-used in all the connections to the source database server by using the SET TRANSACTION SNAPSHOT command.

Per the Postgres documentation about pg_export_snapshot: Saves the transaction’s current snapshot and returns a text string identifying the snapshot. This string must be passed (outside the database) to clients that want to import the snapshot. The snapshot is available for import only until the end of the transaction that exported it.

Now, when the pgcopydb process was interrupted (or crashed) on a previous run, it is possible to resume operations, but the snapshot that was exported does not exist anymore. The pgcopydb command can only resume operations with a new snapshot, and thus cannot ensure consistency of the whole data set, because each run now uses its own snapshot.

--snapshot

Instead of exporting its own snapshot by calling the Tantor SE-1C function pg_export_snapshot() it is possible for pgcopydb to re-use an already exported snapshot.

--plugin

Logical decoding output plugin to use. The default is test_decoding which ships with Postgres core itself, so is probably already available on your source server.

It is possible to use wal2json instead. The support for wal2json is mostly historical in pgcopydb; it should not make a user-visible difference whether you use the default test_decoding or wal2json.

--wal2json-numeric-as-string

When using the wal2json output plugin, it is possible to use the --wal2json-numeric-as-string option to instruct wal2json to output numeric values as strings and thus prevent some precision loss.

To use this option, the wal2json plugin version installed on the source database must support the --numeric-data-types-as-string option.

See also the documentation for wal2json regarding this option for details.

--slot-name

Logical decoding slot name to use. Defaults to pgcopydb, which is unfortunate when your use case involves migrating more than one database from the source server.

--create-slot

Instruct pgcopydb to create the logical replication slot to use.

--endpos

Logical decoding target LSN to use. Automatically stop replication and exit with normal exit status 0 when receiving reaches the specified LSN. If there’s a record with LSN exactly equal to lsn, the record will be output.

The --endpos option is not aware of transaction boundaries and may truncate output partway through a transaction. Any partially output transaction will not be consumed and will be replayed again when the slot is next read from. Individual messages are never truncated.

See also documentation for pg_recvlogical.

--origin

The logical replication target system needs to track the transactions that have been applied already, so that in case we get disconnected or need to resume operations we can skip already replayed transactions.

Postgres uses a notion of an origin node name as documented in Replication Progress Tracking. This option allows picking your own node name and defaults to “pgcopydb”. Picking a different name is useful in some advanced scenarios, like migrating several sources into the same target, where each source should have its own unique origin node name.

--verbose

Increase current verbosity. The default level of verbosity is INFO. In ascending order pgcopydb knows about the following verbosity levels: FATAL, ERROR, WARN, INFO, NOTICE, DEBUG, TRACE.

--debug

Set current verbosity to DEBUG level.

--trace

Set current verbosity to TRACE level.

--quiet

Set current verbosity to ERROR level.

Environment

PGCOPYDB_SOURCE_PGURI

Connection string to the source Postgres instance. When --source is omitted from the command line, then this environment variable is used.

PGCOPYDB_TARGET_PGURI

Connection string to the target Postgres instance. When --target is omitted from the command line, then this environment variable is used.

PGCOPYDB_OUTPUT_PLUGIN

Logical decoding output plugin to use. When --plugin is omitted from the command line, then this environment variable is used.

PGCOPYDB_WAL2JSON_NUMERIC_AS_STRING

When true (or yes, or on, or 1, same input as a Postgres boolean) then pgcopydb uses the wal2json option --numeric-data-types-as-string when using the wal2json output plugin.

When --wal2json-numeric-as-string is omitted from the command line then this environment variable is used.

PGCOPYDB_SNAPSHOT

Postgres snapshot identifier to re-use, see also --snapshot.

TMPDIR

The pgcopydb command creates all its work files and directories in ${TMPDIR}/pgcopydb, defaulting to /tmp/pgcopydb when TMPDIR is not set.

XDG_DATA_HOME

The standard XDG Base Directory Specification defines several environment variables that allow controlling where programs should store their files.

XDG_DATA_HOME defines the base directory relative to which user-specific data files should be stored. If $XDG_DATA_HOME is either not set or empty, a default equal to $HOME/.local/share should be used.

When using Change Data Capture (through the --follow option and Postgres logical decoding) then pgcopydb pre-fetches changes into JSON files and transforms them into SQL files to apply to the target database.

These files are stored at the following location, tried in this order:

  1. when --dir is used, then pgcopydb uses the cdc subdirectory of the --dir location,

  2. when XDG_DATA_HOME is set in the environment, then pgcopydb uses that location,

  3. when neither of the previous settings has been used, pgcopydb defaults to using ${HOME}/.local/share.
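
As an illustration of the first case in the list above, a minimal sketch using an explicit work directory (the path is illustrative):

$ pgcopydb clone --follow --dir /var/lib/pgcopydb

The pre-fetched JSON files and the transformed SQL files are then expected under /var/lib/pgcopydb/cdc/.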

pgcopydb snapshot

pgcopydb snapshot - Create and export a snapshot on the source database

The command pgcopydb snapshot connects to the source database and executes a SQL query to export a snapshot. The obtained snapshot is both printed on stdout and written to a file where other pgcopydb commands might expect to find it.

pgcopydb snapshot: Create and export a snapshot on the source database
usage: pgcopydb snapshot  --source ...

  --source                      Postgres URI to the source database
  --dir                         Work directory to use
  --follow                      Implement logical decoding to replay changes
  --plugin                      Output plugin to use (test_decoding, wal2json)
  --wal2json-numeric-as-string  Print numeric data type as string when using wal2json output plugin
  --slot-name                   Use this Postgres replication slot name

Options

The following options are available to pgcopydb snapshot:

--source

Connection string to the source Postgres instance. See the Postgres documentation for connection strings for the details. In short both the quoted form "host=... dbname=..." and the URI form postgres://user@host:5432/dbname are supported.

--dir

During its normal operations pgcopydb creates a lot of temporary files to track the progress of its sub-processes. Temporary files are created in the directory specified by this option, or by default in ${TMPDIR}/pgcopydb when that environment variable is set, or otherwise in /tmp/pgcopydb.

--follow

When the --follow option is used then pgcopydb implements Change Data Capture as detailed in the manual page for pgcopydb follow in parallel to the main copy database steps.

The replication slot is created using the Postgres replication protocol command CREATE_REPLICATION_SLOT, which then exports the snapshot being used in that command.

--plugin

Logical decoding output plugin to use. The default is test_decoding, which ships with Postgres core itself, so it is probably already available on your source server.

It is possible to use wal2json instead. The support for wal2json is mostly historical in pgcopydb; it should not make a user-visible difference whether you use the default test_decoding or wal2json.

--wal2json-numeric-as-string

When using the wal2json output plugin, it is possible to use the --wal2json-numeric-as-string option to instruct wal2json to output numeric values as strings and thus prevent some precision loss.

You need a wal2json plugin version on the source database that supports the --numeric-data-types-as-string option in order to use this option.

See also the documentation for wal2json regarding this option for details.
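
Putting the wal2json related options together, a minimal sketch based on the synopsis above (connection strings are taken from the environment):

$ pgcopydb snapshot --follow --plugin wal2json --wal2json-numeric-as-string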

--slot-name

Logical decoding slot name to use.

--verbose

Increase current verbosity. The default level of verbosity is INFO. In ascending order pgcopydb knows about the following verbosity levels: FATAL, ERROR, WARN, INFO, NOTICE, DEBUG, TRACE.

--debug

Set current verbosity to DEBUG level.

--trace

Set current verbosity to TRACE level.

--quiet

Set current verbosity to ERROR level.

Environment

PGCOPYDB_SOURCE_PGURI

Connection string to the source Postgres instance. When --source is omitted from the command line, then this environment variable is used.

PGCOPYDB_OUTPUT_PLUGIN

Logical decoding output plugin to use. When --plugin is omitted from the command line, then this environment variable is used.

PGCOPYDB_WAL2JSON_NUMERIC_AS_STRING

When true (or yes, or on, or 1, same input as a Postgres boolean) then pgcopydb uses the wal2json option --numeric-data-types-as-string when using the wal2json output plugin.

When --wal2json-numeric-as-string is omitted from the command line then this environment variable is used.

Examples

Create a snapshot on the source database in the background:

$ pgcopydb snapshot &
[1] 72938
17:31:52 72938 INFO  Running pgcopydb version 0.7.13.gcbf2d16.dirty from "/Users/dim/dev/PostgreSQL/pgcopydb/./src/bin/pgcopydb/pgcopydb"
17:31:52 72938 INFO  Using work dir "/var/folders/d7/zzxmgs9s16gdxxcm0hs0sssw0000gn/T//pgcopydb"
17:31:52 72938 INFO  Removing the stale pid file "/var/folders/d7/zzxmgs9s16gdxxcm0hs0sssw0000gn/T//pgcopydb/pgcopydb.aux.pid"
17:31:52 72938 INFO  Work directory "/var/folders/d7/zzxmgs9s16gdxxcm0hs0sssw0000gn/T//pgcopydb" already exists
17:31:52 72938 INFO  Exported snapshot "00000003-000CB5FE-1" from the source database
00000003-000CB5FE-1

And when the process is done, stop maintaining the snapshot in the background:

$ kill %1
17:31:56 72938 INFO  Asked to terminate, aborting
[1]+  Done                    pgcopydb snapshot
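
While the background process keeps the snapshot alive, the exported identifier can be re-used by other pgcopydb commands, either with the --snapshot option or through the PGCOPYDB_SNAPSHOT environment variable. A minimal sketch, using the identifier printed by the snapshot command (here the value from the example above) and assuming the connection strings are set in the environment:

$ pgcopydb snapshot &
$ export PGCOPYDB_SNAPSHOT="00000003-000CB5FE-1"
$ pgcopydb clone
$ kill %1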

pgcopydb compare

pgcopydb compare - Compare source and target databases

The command pgcopydb compare connects to the source and target databases and executes SQL queries to get Postgres catalog information about the table, indexes and sequences that are migrated.

The tool then compares either the schema definitions or the data contents of the selected tables, and reports success by means of a Unix return code of zero.

At the moment, the pgcopydb compare tool is pretty limited in terms of schema support: it only covers what pgcopydb needs to know about the database schema, which isn’t much.

pgcopydb compare: Compare source and target databases

Available commands:
  pgcopydb compare
    schema  Compare source and target schema
    data    Compare source and target data

pgcopydb compare schema

pgcopydb compare schema - Compare source and target schema

The command pgcopydb compare schema connects to the source and target databases and executes SQL queries using the Postgres catalogs to get a list of tables, indexes, constraints and sequences there.

pgcopydb compare schema: Compare source and target schema
usage: pgcopydb compare schema  --source ...

  --source         Postgres URI to the source database
  --target         Postgres URI to the target database
  --dir            Work directory to use

pgcopydb compare data

pgcopydb compare data - Compare source and target data

The command pgcopydb compare data connects to the source and target databases and executes SQL queries using the Postgres catalogs to get a list of tables, indexes, constraints and sequences there.

Then it uses a SQL query with the following template to compute the row count and a checksum for each table:

/*
 * Compute the hashtext of every single row in the table, and aggregate the
 * results as a sum of bigint numbers. Because the sum of bigint could
 * overflow to numeric, the aggregated sum is then hashed into an MD5
 * value: bigint is 64 bits, MD5 is 128 bits.
 *
 * Also, to lower the chances of a collision, include the row count in the
 * computation of the MD5 by appending it to the input string of the MD5
 * function.
 */
select count(1) as cnt,
       md5(
         format(
           '%%s-%%s',
           sum(hashtext(__COLS__::text)::bigint),
           count(1)
         )
       )::uuid as chksum
from only __TABLE__

Running such a query on a large table can take a lot of time.
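
As an illustration only, here is what the template might look like once instantiated, assuming that __COLS__ expands to a row expression over the table columns, that __TABLE__ expands to the qualified table name, and that the doubled %% is printf-style escaping which becomes a single % in the SQL actually sent to the server:

select count(1) as cnt,
       md5(
         format(
           '%s-%s',
           sum(hashtext((actor_id, first_name, last_name, last_update)::text)::bigint),
           count(1)
         )
       )::uuid as chksum
  from only public.actor;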

pgcopydb compare data: Compare source and target data
usage: pgcopydb compare data  --source ...

  --source         Postgres URI to the source database
  --target         Postgres URI to the target database
  --dir            Work directory to use
  --json           Format the output using JSON

Options

The following options are available to pgcopydb compare schema and pgcopydb compare data subcommands:

--source

Connection string to the source Postgres instance. See the Postgres documentation for connection strings for the details. In short both the quoted form "host=... dbname=..." and the URI form postgres://user@host:5432/dbname are supported.

--target

Connection string to the target Postgres instance.

--dir

During its normal operations pgcopydb creates a lot of temporary files to track the progress of its sub-processes. Temporary files are created in the directory specified by this option, or by default in ${TMPDIR}/pgcopydb when that environment variable is set, or otherwise in /tmp/pgcopydb.

--json

The output of the command is formatted in JSON, when supported. Ignored otherwise.

--verbose

Increase current verbosity. The default level of verbosity is INFO. In ascending order pgcopydb knows about the following verbosity levels: FATAL, ERROR, WARN, INFO, NOTICE, DEBUG, TRACE.

--debug

Set current verbosity to DEBUG level.

--trace

Set current verbosity to TRACE level.

--quiet

Set current verbosity to ERROR level.

Environment

PGCOPYDB_SOURCE_PGURI

Connection string to the source Postgres instance. When --source is omitted from the command line, then this environment variable is used.

PGCOPYDB_TARGET_PGURI

Connection string to the target Postgres instance. When --target is omitted from the command line, then this environment variable is used.

Examples

Comparing pgcopydb limited understanding of the schema:

$ pgcopydb compare schema --notice
INFO   Running pgcopydb version 0.12.28.g34343c8.dirty from "/Users/dim/dev/PostgreSQL/pgcopydb/src/bin/pgcopydb/pgcopydb"
NOTICE Using work dir "/var/folders/d7/zzxmgs9s16gdxxcm0hs0sssw0000gn/T//pgcopydb"
NOTICE Work directory "/var/folders/d7/zzxmgs9s16gdxxcm0hs0sssw0000gn/T//pgcopydb" already exists
INFO   A previous run has run through completion
INFO   SOURCE: Connecting to "postgres:///pagila"
INFO   Fetched information for 1 extensions
INFO   Fetched information for 25 tables, with an estimated total of 5179  tuples and 190 MB
INFO   Fetched information for 49 indexes
INFO   Fetching information for 16 sequences
NOTICE Skipping target catalog preparation
NOTICE Storing migration schema in JSON file "/var/folders/d7/zzxmgs9s16gdxxcm0hs0sssw0000gn/T//pgcopydb/compare/source-schema.json"
INFO   TARGET: Connecting to "postgres:///plop"
INFO   Fetched information for 6 extensions
INFO   Fetched information for 25 tables, with an estimated total of 5219  tuples and 190 MB
INFO   Fetched information for 49 indexes
INFO   Fetching information for 16 sequences
NOTICE Skipping target catalog preparation
NOTICE Storing migration schema in JSON file "/var/folders/d7/zzxmgs9s16gdxxcm0hs0sssw0000gn/T//pgcopydb/compare/target-schema.json"
INFO   [SOURCE] table: 25 index: 49 sequence: 16
INFO   [TARGET] table: 25 index: 49 sequence: 16
NOTICE Matched table "public"."test": 1 columns ok, 0 indexes ok
NOTICE Matched table "public"."rental": 7 columns ok, 3 indexes ok
NOTICE Matched table "public"."film": 14 columns ok, 5 indexes ok
NOTICE Matched table "public"."film_actor": 3 columns ok, 2 indexes ok
NOTICE Matched table "public"."inventory": 4 columns ok, 2 indexes ok
NOTICE Matched table "public"."payment_p2022_03": 6 columns ok, 3 indexes ok
NOTICE Matched table "public"."payment_p2022_05": 6 columns ok, 3 indexes ok
NOTICE Matched table "public"."payment_p2022_06": 6 columns ok, 3 indexes ok
NOTICE Matched table "public"."payment_p2022_04": 6 columns ok, 3 indexes ok
NOTICE Matched table "public"."payment_p2022_02": 6 columns ok, 3 indexes ok
NOTICE Matched table "public"."payment_p2022_07": 6 columns ok, 0 indexes ok
NOTICE Matched table "public"."customer": 10 columns ok, 4 indexes ok
NOTICE Matched table "public"."address": 8 columns ok, 2 indexes ok
NOTICE Matched table "public"."city": 4 columns ok, 2 indexes ok
NOTICE Matched table "public"."film_category": 3 columns ok, 1 indexes ok
NOTICE Matched table "public"."payment_p2022_01": 6 columns ok, 3 indexes ok
NOTICE Matched table "public"."actor": 4 columns ok, 2 indexes ok
NOTICE Matched table "public"."bar": 2 columns ok, 1 indexes ok
NOTICE Matched table "public"."bin": 2 columns ok, 0 indexes ok
NOTICE Matched table "public"."category": 3 columns ok, 1 indexes ok
NOTICE Matched table "public"."country": 3 columns ok, 1 indexes ok
NOTICE Matched table "public"."foo": 2 columns ok, 1 indexes ok
NOTICE Matched table "public"."staff": 11 columns ok, 1 indexes ok
NOTICE Matched table "public"."language": 3 columns ok, 1 indexes ok
NOTICE Matched table "public"."store": 4 columns ok, 2 indexes ok
NOTICE Matched sequence "public"."actor_actor_id_seq" (last value 200)
NOTICE Matched sequence "public"."address_address_id_seq" (last value 605)
NOTICE Matched sequence "public"."bar_id_seq" (last value 1)
NOTICE Matched sequence "public"."bin_id_seq" (last value 17)
NOTICE Matched sequence "public"."category_category_id_seq" (last value 16)
NOTICE Matched sequence "public"."city_city_id_seq" (last value 600)
NOTICE Matched sequence "public"."country_country_id_seq" (last value 109)
NOTICE Matched sequence "public"."customer_customer_id_seq" (last value 599)
NOTICE Matched sequence "public"."film_film_id_seq" (last value 1000)
NOTICE Matched sequence "public"."foo_id_seq" (last value 1)
NOTICE Matched sequence "public"."inventory_inventory_id_seq" (last value 4581)
NOTICE Matched sequence "public"."language_language_id_seq" (last value 6)
NOTICE Matched sequence "public"."payment_payment_id_seq" (last value 32102)
NOTICE Matched sequence "public"."rental_rental_id_seq" (last value 16053)
NOTICE Matched sequence "public"."staff_staff_id_seq" (last value 2)
NOTICE Matched sequence "public"."store_store_id_seq" (last value 2)
INFO   pgcopydb schema inspection is successful

Comparing data:

$ pgcopydb compare data
INFO   A previous run has run through completion
INFO   SOURCE: Connecting to "postgres:///pagila"
INFO   Fetched information for 1 extensions
INFO   Fetched information for 25 tables, with an estimated total of 5179  tuples and 190 MB
INFO   Fetched information for 49 indexes
INFO   Fetching information for 16 sequences
INFO   TARGET: Connecting to "postgres:///plop"
INFO   Fetched information for 6 extensions
INFO   Fetched information for 25 tables, with an estimated total of 5219  tuples and 190 MB
INFO   Fetched information for 49 indexes
INFO   Fetching information for 16 sequences
INFO   Comparing data for 25 tables
ERROR  Table "public"."test" has 5173526 rows on source, 5173525 rows on target
ERROR  Table "public"."test" has checksum be66f291-2774-9365-400c-1ccd5160bdf on source, 8be89afa-bceb-f501-dc7b-0538dc17fa3 on target
ERROR  Table "public"."foo" has 3 rows on source, 2 rows on target
ERROR  Table "public"."foo" has checksum a244eba3-376b-75e6-6720-e853b485ef6 on source, 594ae64d-2216-f687-2f11-45cbd9c7153 on target
                    Table Name | ! |                      Source Checksum |                      Target Checksum
-------------------------------+---+--------------------------------------+-------------------------------------
               "public"."test" | ! |  be66f291-2774-9365-400c-1ccd5160bdf |  8be89afa-bceb-f501-dc7b-0538dc17fa3
             "public"."rental" |   |  e7dfabf3-baa8-473a-8fd3-76d59e56467 |  e7dfabf3-baa8-473a-8fd3-76d59e56467
               "public"."film" |   |  c5058d1e-aaf4-f058-6f1e-76d5db63da9 |  c5058d1e-aaf4-f058-6f1e-76d5db63da9
         "public"."film_actor" |   |  7534654a-0bcd-cb27-1a2e-ccd524899a9 |  7534654a-0bcd-cb27-1a2e-ccd524899a9
          "public"."inventory" |   |  72f9afd8-0064-3642-acd7-9ee1f444efe |  72f9afd8-0064-3642-acd7-9ee1f444efe
   "public"."payment_p2022_03" |   |  dc73311a-2ea2-e933-da80-123b44d06b9 |  dc73311a-2ea2-e933-da80-123b44d06b9
   "public"."payment_p2022_05" |   |  e788bf50-9809-9896-8110-91816edcc04 |  e788bf50-9809-9896-8110-91816edcc04
   "public"."payment_p2022_06" |   |  5f650b4c-d491-37ac-6d91-dc2ae484600 |  5f650b4c-d491-37ac-6d91-dc2ae484600
   "public"."payment_p2022_04" |   |  02beb400-1b82-c9ba-8fe9-690eca2e635 |  02beb400-1b82-c9ba-8fe9-690eca2e635
   "public"."payment_p2022_02" |   |  97154691-488e-9a36-9a4b-4da7b62dbc0 |  97154691-488e-9a36-9a4b-4da7b62dbc0
   "public"."payment_p2022_07" |   |  c6fdf7ef-4382-b301-41c3-1d190149dc5 |  c6fdf7ef-4382-b301-41c3-1d190149dc5
           "public"."customer" |   |  11973c6a-6df3-c502-5495-64f42e0386c |  11973c6a-6df3-c502-5495-64f42e0386c
            "public"."address" |   |  8c701dbf-c1ba-f386-a9ae-c3f6e478ba7 |  8c701dbf-c1ba-f386-a9ae-c3f6e478ba7
               "public"."city" |   |  f23ad758-f94a-a8fd-8c3f-25fedcadb06 |  f23ad758-f94a-a8fd-8c3f-25fedcadb06
      "public"."film_category" |   |  4b04cfee-e1bc-718d-d890-afdcd6729ce |  4b04cfee-e1bc-718d-d890-afdcd6729ce
   "public"."payment_p2022_01" |   |  fde341ed-0f3f-23bd-dedd-4e92c5a8e55 |  fde341ed-0f3f-23bd-dedd-4e92c5a8e55
              "public"."actor" |   |  b5ea389d-140f-10b4-07b9-a80d634d86b |  b5ea389d-140f-10b4-07b9-a80d634d86b
                "public"."bar" |   |  a7cae1c8-ed66-63ba-1b93-7ba7570ef63 |  a7cae1c8-ed66-63ba-1b93-7ba7570ef63
                "public"."bin" |   |  6832546a-333b-3bdb-fdf2-325cc7a028a |  6832546a-333b-3bdb-fdf2-325cc7a028a
           "public"."category" |   |  082f9cf9-92ab-6d6c-c74a-feb577611cc |  082f9cf9-92ab-6d6c-c74a-feb577611cc
            "public"."country" |   |  a3a0dd4f-68e0-4ca5-33d2-05c9fd60c34 |  a3a0dd4f-68e0-4ca5-33d2-05c9fd60c34
                "public"."foo" | ! |  a244eba3-376b-75e6-6720-e853b485ef6 |  594ae64d-2216-f687-2f11-45cbd9c7153
              "public"."staff" |   |  3eb5f007-7160-81ba-5aa5-973de3f5c3d |  3eb5f007-7160-81ba-5aa5-973de3f5c3d
           "public"."language" |   |  58aa8132-11ae-f3bc-fa82-c773bba2032 |  58aa8132-11ae-f3bc-fa82-c773bba2032
              "public"."store" |   |  d8477e63-0661-90a4-03fa-fcc26a95865 |  d8477e63-0661-90a4-03fa-fcc26a95865

pgcopydb copy

pgcopydb copy - Implement the data section of the database copy

This command prefixes the following sub-commands:

pgcopydb copy: Implement the data section of the database copy

Available commands:
  pgcopydb copy
    db           Copy an entire database from source to target
    roles        Copy the roles from the source instance to the target instance
    extensions   Copy the extensions from the source instance to the target instance
    schema       Copy the database schema from source to target
    data         Copy the data section from source to target
    table-data   Copy the data from all tables in database from source to target
    blobs        Copy the blob data from the source database to the target
    sequences    Copy the current value from all sequences in database from source to target
    indexes      Create all the indexes found in the source database in the target
    constraints  Create all the constraints found in the source database in the target

Those commands implement a part of the whole database copy operation as detailed in section pgcopydb clone. Only use those commands to debug a specific part, or because you know that you just want to implement that step.

Warning

Using the pgcopydb clone command is strongly advised.

This mode of operations is useful for debugging and advanced use cases only.

pgcopydb copy db

pgcopydb copy db - Copy an entire database from source to target

The command pgcopydb copy db is an alias for pgcopydb clone. See also pgcopydb clone.

pgcopydb copy db: Copy an entire database from source to target
usage: pgcopydb copy db  --source ... --target ... [ --table-jobs ... --index-jobs ... ]

  --source              Postgres URI to the source database
  --target              Postgres URI to the target database
  --dir                 Work directory to use
  --table-jobs          Number of concurrent COPY jobs to run
  --index-jobs          Number of concurrent CREATE INDEX jobs to run
  --restore-jobs        Number of concurrent jobs for pg_restore
  --drop-if-exists      On the target database, clean-up from a previous run first
  --roles               Also copy roles found on source to target
  --no-owner            Do not set ownership of objects to match the original database
  --no-acl              Prevent restoration of access privileges (grant/revoke commands).
  --no-comments         Do not output commands to restore comments
  --no-tablespaces      Do not output commands to select tablespaces
  --skip-large-objects  Skip copying large objects (blobs)
  --filters <filename>  Use the filters defined in <filename>
  --fail-fast           Abort early in case of error
  --restart             Allow restarting when temp files exist already
  --resume              Allow resuming operations after a failure
  --not-consistent      Allow taking a new snapshot on the source database
  --snapshot            Use snapshot obtained with pg_export_snapshot
  --use-copy-binary     Use the COPY BINARY format for COPY operations

pgcopydb copy roles

pgcopydb copy roles - Copy the roles from the source instance to the target instance

The command pgcopydb copy roles implements both pgcopydb dump roles and then pgcopydb restore roles.

pgcopydb copy roles: Copy the roles from the source instance to the target instance
usage: pgcopydb copy roles  --source ... --target ...

  --source              Postgres URI to the source database
  --target              Postgres URI to the target database
  --dir                 Work directory to use
  --no-role-passwords   Do not dump passwords for roles

Note

In Postgres, roles are global objects. This means roles do not belong to any specific database, and as a result, even though the pgcopydb tool otherwise works only in the context of a specific database, this command is not limited to roles that are used within a single database.

When a role already exists on the target database, restoring it is entirely skipped, which includes skipping both the CREATE ROLE and the ALTER ROLE commands produced by pg_dumpall --roles-only.

The pg_dumpall --roles-only command is used to fetch the list of roles from the source database, and this command includes support for passwords. As a result, this operation requires superuser privileges.

pgcopydb copy extensions

pgcopydb copy extensions - Copy the extensions from the source instance to the target instance

The command pgcopydb copy extensions gets a list of the extensions installed on the source database, and for each of them runs the SQL command CREATE EXTENSION IF NOT EXISTS.

pgcopydb copy extensions: Copy the extensions from the source instance to the target instance
usage: pgcopydb copy extensions  --source ... --target ...

  --source              Postgres URI to the source database
  --target              Postgres URI to the target database
  --dir                 Work directory to use
  --requirements        List extensions requirements

When copying extensions, this command also takes care of copying any Extension Configuration Tables user-data to the target database.

pgcopydb copy schema

pgcopydb copy schema - Copy the database schema from source to target

The command pgcopydb copy schema implements the schema only section of the clone steps.

pgcopydb copy schema: Copy the database schema from source to target
usage: pgcopydb copy schema  --source ... --target ... [ --table-jobs ... --index-jobs ... ]

  --source              Postgres URI to the source database
  --target              Postgres URI to the target database
  --dir                 Work directory to use
  --filters <filename>  Use the filters defined in <filename>
  --restart             Allow restarting when temp files exist already
  --resume              Allow resuming operations after a failure
  --not-consistent      Allow taking a new snapshot on the source database
  --snapshot            Use snapshot obtained with pg_export_snapshot

pgcopydb copy data

pgcopydb copy data - Copy the data section from source to target

The command pgcopydb copy data implements the data section of the clone steps.

pgcopydb copy data: Copy the data section from source to target
usage: pgcopydb copy data  --source ... --target ... [ --table-jobs ... --index-jobs ... ]

  --source              Postgres URI to the source database
  --target              Postgres URI to the target database
  --dir                 Work directory to use
  --table-jobs          Number of concurrent COPY jobs to run
  --index-jobs          Number of concurrent CREATE INDEX jobs to run
  --restore-jobs        Number of concurrent jobs for pg_restore
  --skip-large-objects  Skip copying large objects (blobs)
  --filters <filename>  Use the filters defined in <filename>
  --restart             Allow restarting when temp files exist already
  --resume              Allow resuming operations after a failure
  --not-consistent      Allow taking a new snapshot on the source database
  --snapshot            Use snapshot obtained with pg_export_snapshot

Note

The current command line has both the commands pgcopydb copy table-data and pgcopydb copy data, which look quite similar but implement different steps. Be careful for now. This will change later.

The pgcopydb copy data command implements the following steps:

$ pgcopydb copy table-data
$ pgcopydb copy blobs
$ pgcopydb copy indexes
$ pgcopydb copy constraints
$ pgcopydb copy sequences
$ vacuumdb -z

Those steps are actually run concurrently when possible, in the same way as the main command pgcopydb clone would. The only difference is that the pgcopydb clone command also prepares and finishes the schema parts of the operations (pre-data, then post-data), which the pgcopydb copy data command ignores.

pgcopydb copy table-data

pgcopydb copy table-data - Copy the data from all tables in database from source to target

The command pgcopydb copy table-data fetches the list of tables from the source database, runs a COPY TO command on the source database, and sends the result directly to the target database using a COPY FROM command, avoiding intermediate files on-disk entirely.

pgcopydb copy table-data: Copy the data from all tables in database from source to target
usage: pgcopydb copy table-data  --source ... --target ... [ --table-jobs ... --index-jobs ... ]

  --source             Postgres URI to the source database
  --target             Postgres URI to the target database
  --dir                Work directory to use
  --table-jobs         Number of concurrent COPY jobs to run
  --filters <filename> Use the filters defined in <filename>
  --restart            Allow restarting when temp files exist already
  --resume             Allow resuming operations after a failure
  --not-consistent     Allow taking a new snapshot on the source database
  --snapshot           Use snapshot obtained with pg_export_snapshot
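
Conceptually, and for a single table only, the streaming described above is similar to the following psql plumbing; this is a hedged illustration, not what pgcopydb actually runs (pgcopydb drives the COPY protocol directly over two connections):

$ psql "$PGCOPYDB_SOURCE_PGURI" -c "copy public.actor to stdout" \
    | psql "$PGCOPYDB_TARGET_PGURI" -c "copy public.actor from stdin"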

pgcopydb copy blobs

pgcopydb copy blobs - Copy the blob data from the source database to the target

The command pgcopydb copy blobs fetches the list of large objects (aka blobs) from the source database and copies their data parts to the target database. By default the command assumes that the large objects metadata have already been taken care of, because of the behaviour of pg_dump --section=pre-data.

pgcopydb copy blobs: Copy the blob data from the source database to the target
usage: pgcopydb copy blobs  --source ... --target ... [ --table-jobs ... --index-jobs ... ]

  --source             Postgres URI to the source database
  --target             Postgres URI to the target database
  --dir                Work directory to use
  --large-objects-jobs Number of concurrent Large Objects jobs to run
  --drop-if-exists     On the target database, drop and create large objects
  --restart            Allow restarting when temp files exist already
  --resume             Allow resuming operations after a failure
  --not-consistent     Allow taking a new snapshot on the source database
  --snapshot           Use snapshot obtained with pg_export_snapshot

pgcopydb copy sequences

pgcopydb copy sequences - Copy the current value from all sequences in database from source to target

The command pgcopydb copy sequences fetches the list of sequences from the source database, then for each sequence fetches the last_value and is_called properties the same way pg_dump would on the source database, and then calls pg_catalog.setval() for each sequence on the target database.

pgcopydb copy sequences: Copy the current value from all sequences in database from source to target
usage: pgcopydb copy sequences  --source ... --target ... [ --table-jobs ... --index-jobs ... ]

  --source             Postgres URI to the source database
  --target             Postgres URI to the target database
  --dir                Work directory to use
  --filters <filename> Use the filters defined in <filename>
  --restart            Allow restarting when temp files exist already
  --resume             Allow resuming operations after a failure
  --not-consistent     Allow taking a new snapshot on the source database
  --snapshot           Use snapshot obtained with pg_export_snapshot
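
Conceptually, the per-sequence work described above boils down to the following SQL; the sequence name and values are illustrative only:

-- on the source database
select last_value, is_called from public.actor_actor_id_seq;

-- on the target database
select pg_catalog.setval('public.actor_actor_id_seq', 200, true);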

pgcopydb copy indexes

pgcopydb copy indexes - Create all the indexes found in the source database in the target

The command pgcopydb copy indexes fetches the list of indexes from the source database and runs each index CREATE INDEX statement on the target database. The statements for the index definitions are modified to include IF NOT EXISTS and allow for skipping indexes that already exist on the target database.

pgcopydb copy indexes: Create all the indexes found in the source database in the target
usage: pgcopydb copy indexes  --source ... --target ... [ --table-jobs ... --index-jobs ... ]

  --source             Postgres URI to the source database
  --target             Postgres URI to the target database
  --dir                Work directory to use
  --index-jobs         Number of concurrent CREATE INDEX jobs to run
  --restore-jobs       Number of concurrent jobs for pg_restore
  --filters <filename> Use the filters defined in <filename>
  --restart            Allow restarting when temp files exist already
  --resume             Allow resuming operations after a failure
  --not-consistent     Allow taking a new snapshot on the source database
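
As an illustration of the IF NOT EXISTS rewrite mentioned above, with index and table names that are illustrative only:

-- index definition as found on the source database
CREATE INDEX idx_actor_last_name ON public.actor USING btree (last_name);

-- statement as issued on the target database
CREATE INDEX IF NOT EXISTS idx_actor_last_name ON public.actor USING btree (last_name);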

pgcopydb copy constraints

pgcopydb copy constraints - Create all the constraints found in the source database in the target

The command pgcopydb copy constraints fetches the list of indexes from the source database and, for each of them, runs an ALTER TABLE … ADD CONSTRAINT … USING INDEX statement on the target database.

The indexes must already exist, and the command will fail if any constraint already exists on the target database.

pgcopydb copy constraints: Create all the constraints found in the source database in the target
usage: pgcopydb copy constraints  --source ... --target ... [ --table-jobs ... --index-jobs ... ]

  --source             Postgres URI to the source database
  --target             Postgres URI to the target database
  --dir                Work directory to use
  --filters <filename> Use the filters defined in <filename>
  --restart            Allow restarting when temp files exist already
  --resume             Allow resuming operations after a failure
  --not-consistent     Allow taking a new snapshot on the source database
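
As an illustration of the ALTER TABLE … ADD CONSTRAINT … USING INDEX form mentioned above, with names that are illustrative only and assuming the unique index already exists on the target database:

ALTER TABLE public.actor
  ADD CONSTRAINT actor_pkey PRIMARY KEY USING INDEX actor_pkey;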

Description

These commands allow implementing a specific step of the pgcopydb operations at a time. They are useful mainly for debugging purposes, though some advanced and creative usage can be made of them.

The target schema is not created, so it needs to have been taken care of first. It is possible to use the commands pgcopydb dump schema and then pgcopydb restore pre-data to prepare your target database.

To implement the same operations as a pgcopydb clone command would, use the following recipe:

$ export PGCOPYDB_SOURCE_PGURI="postgres://user@source/dbname"
$ export PGCOPYDB_TARGET_PGURI="postgres://user@target/dbname"

$ pgcopydb dump schema
$ pgcopydb restore pre-data --resume --not-consistent
$ pgcopydb copy table-data --resume --not-consistent
$ pgcopydb copy sequences --resume --not-consistent
$ pgcopydb copy indexes --resume --not-consistent
$ pgcopydb copy constraints --resume --not-consistent
$ vacuumdb -z
$ pgcopydb restore post-data --resume --not-consistent

The main pgcopydb clone command is still better at concurrency than doing those steps manually, as it will create the indexes for any given table as soon as the table-data section is finished, without having to wait until the last table-data has been copied over. The same applies to constraints, and then vacuum analyze.

Options

The following options are available to pgcopydb copy sub-commands:

--source

Connection string to the source Postgres instance. See the Postgres documentation for connection strings for the details. In short both the quoted form "host=... dbname=..." and the URI form postgres://user@host:5432/dbname are supported.

--target

Connection string to the target Postgres instance.

--dir

During its normal operations pgcopydb creates a lot of temporary files to track the progress of its sub-processes. Temporary files are created in the directory specified by this option, or by default in ${TMPDIR}/pgcopydb when that environment variable is set, or otherwise in /tmp/pgcopydb.

--no-role-passwords

Do not dump passwords for roles. When restored, roles will have a null password, and password authentication will always fail until the password is set. Since password values aren’t needed when this option is specified, the role information is read from the catalog view pg_roles instead of pg_authid. Therefore, this option also helps if access to pg_authid is restricted by some security policy.

--table-jobs

How many tables can be processed in parallel.

This limit only applies to the COPY operations: more sub-processes than this limit will be running at the same time while the CREATE INDEX operations are in progress, though those processes are only waiting for the target Postgres instance to do all the work.

--index-jobs

How many indexes can be built in parallel, globally. A good option is to set this option to the count of CPU cores that are available on the Postgres target system, minus some cores that are going to be used for handling the COPY operations.

--large-objects-jobs

How many worker processes to start to copy Large Objects concurrently.

--split-tables-larger-than

Allow Same-table Concurrency when processing the source database. The value is expected to be a byte size, and byte units B, kB, MB, GB, TB, PB, and EB are known.

--skip-large-objects

Skip copying large objects, also known as blobs, when copying the data from the source database to the target database.

--restart

When running the pgcopydb command again, if the work directory already contains information from a previous run, then the command refuses to proceed rather than delete information that might be used for diagnostics and forensics.

In that case, the --restart option can be used to allow pgcopydb to delete traces from a previous run.

--resume

When the pgcopydb command was terminated before completion, either by an interrupt signal (such as C-c or SIGTERM) or because it crashed, it is possible to resume the database migration.

When resuming activity from a previous run, table data that was fully copied over to the target server is not sent again. Table data that was interrupted during the COPY has to be started from scratch even when using --resume: the COPY command in Postgres is transactional and was rolled back.

The same reasoning applies to the CREATE INDEX and ALTER TABLE commands that pgcopydb issues; those commands are skipped on a --resume run only if they are known to have run through to completion on the previous one.

Finally, using --resume requires the use of --not-consistent.

--not-consistent

In order to be consistent, pgcopydb exports a Postgres snapshot by calling the pg_export_snapshot() function on the source database server. The snapshot is then re-used in all the connections to the source database server by using the SET TRANSACTION SNAPSHOT command.

Per the Postgres documentation about pg_export_snapshot: Saves the transaction’s current snapshot and returns a text string identifying the snapshot. This string must be passed (outside the database) to clients that want to import the snapshot. The snapshot is available for import only until the end of the transaction that exported it.

Now, when the pgcopydb process was interrupted (or crashed) on a previous run, it is possible to resume operations, but the snapshot that was exported does not exist anymore. The pgcopydb command can only resume operations with a new snapshot, and thus cannot ensure consistency of the whole data set, because each run now uses its own snapshot.

--snapshot

Instead of exporting its own snapshot by calling the Tantor SE-1C function pg_export_snapshot() it is possible for pgcopydb to re-use an already exported snapshot.

--use-copy-binary

Use COPY WITH (FORMAT BINARY) instead of the default COPY command.

See also documentation for COPY.
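
For reference, the difference is only in the FORMAT option of the COPY command itself; a sketch of both forms, for an illustrative table:

-- default text format
COPY public.actor TO STDOUT;

-- with --use-copy-binary
COPY public.actor TO STDOUT WITH (FORMAT BINARY);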

--verbose

Increase current verbosity. The default level of verbosity is INFO. In ascending order pgcopydb knows about the following verbosity levels: FATAL, ERROR, WARN, INFO, NOTICE, DEBUG, TRACE.

--debug

Set current verbosity to DEBUG level.

--trace

Set current verbosity to TRACE level.

--quiet

Set current verbosity to ERROR level.

Environment

PGCOPYDB_SOURCE_PGURI

Connection string to the source Postgres instance. When --source is omitted from the command line, then this environment variable is used.

PGCOPYDB_TARGET_PGURI

Connection string to the target Postgres instance. When --target is omitted from the command line, then this environment variable is used.

PGCOPYDB_TABLE_JOBS

Number of concurrent jobs allowed to run COPY operations in parallel. When --table-jobs is omitted from the command line, then this environment variable is used.

PGCOPYDB_INDEX_JOBS

Number of concurrent jobs allowed to run CREATE INDEX operations in parallel. When --index-jobs is omitted from the command line, then this environment variable is used.

PGCOPYDB_RESTORE_JOBS

Number of concurrent jobs allowed to run pg_restore operations in parallel. When --restore-jobs is omitted from the command line, then this environment variable is used.

PGCOPYDB_LARGE_OBJECTS_JOBS

Number of concurrent jobs allowed to copy Large Objects data in parallel. When --large-objects-jobs is omitted from the command line, then this environment variable is used.

PGCOPYDB_SPLIT_TABLES_LARGER_THAN

Allow Same-table Concurrency when processing the source database. This environment variable value is expected to be a byte size, and byte units B, kB, MB, GB, TB, PB, and EB are known.

When --split-tables-larger-than is omitted from the command line, then this environment variable is used.

PGCOPYDB_SPLIT_MAX_PARTS

Limit the maximum number of parts when Same-table Concurrency is used. When --split-max-parts is omitted from the command line, then this environment variable is used.

PGCOPYDB_ESTIMATE_TABLE_SIZES

When true (or yes, or on, or 1, same input as a Postgres boolean) then pgcopydb estimates the size of tables to determine whether or not to split tables. This option is only useful when querying the relation sizes on the source database is costly.

When --estimate-table-sizes is omitted from the command line, then this environment variable is used.

PGCOPYDB_DROP_IF_EXISTS

When true (or yes, or on, or 1, same input as a Postgres boolean) then pgcopydb uses the pg_restore options --clean --if-exists when creating the schema on the target Postgres instance.

PGCOPYDB_SNAPSHOT

Postgres snapshot identifier to re-use, see also --snapshot.

PGCOPYDB_USE_COPY_BINARY

When true (or yes, or on, or 1, same input as a Postgres boolean) then pgcopydb uses COPY WITH (FORMAT BINARY) instead of the default COPY command, the same as when using the --use-copy-binary option.

TMPDIR

The pgcopydb command creates all its work files and directories in ${TMPDIR}/pgcopydb, defaulting to /tmp/pgcopydb when TMPDIR is not set.

Examples

Let’s export the Postgres databases connection strings to make it easy to re-use them all along:

$ export PGCOPYDB_SOURCE_PGURI=postgres://pagila:0wn3d@source/pagila
$ export PGCOPYDB_TARGET_PGURI=postgres://pagila:0wn3d@target/pagila

Now, first dump the schema:

$ pgcopydb dump schema
08:27:48.633 44371 INFO   Using work dir "/tmp/pgcopydb"
08:27:48.634 44371 INFO   Dumping database from "postgres://pagila:0wn3d@source/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
08:27:48.634 44371 INFO   Dumping database into directory "/tmp/pgcopydb"
08:27:48.634 44371 INFO   Using pg_dump for Postgres "16.2" at "/usr/bin/pg_dump"
08:27:48.971 44371 INFO   Fetched information for 5 tables (including 0 tables split in 0 partitions total), with an estimated total of 1000 thousands tuples and 128 MB on-disk
08:27:48.978 44371 INFO   Fetched information for 4 indexes (supporting 4 constraints)
08:27:48.983 44371 INFO   Fetching information for 1 sequences
08:27:48.996 44371 INFO   Fetched information for 1 extensions
08:27:49.072 44371 INFO   Found 5 indexes (supporting 5 constraints) in the target database
08:27:49.078 44371 INFO    /usr/bin/pg_dump -Fc --section=pre-data --section=post-data --file /tmp/pgcopydb/schema/schema.dump 'postgres://pagila:0wn3d@source/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60'

Now restore the pre-data schema on the target database, cleaning up the already existing objects if any, which allows running this test scenario again and again. It might not be what you want to do in your production target instance though!

$ PGCOPYDB_DROP_IF_EXISTS=on pgcopydb restore pre-data --no-owner --resume --not-consistent
08:30:13.621 44597 INFO   Using work dir "/tmp/pgcopydb"
08:30:13.621 44597 INFO   Restoring database from existing files at "/tmp/pgcopydb"
08:30:13.706 44597 INFO   Found 5 indexes (supporting 5 constraints) in the target database
08:30:13.710 44597 INFO   Using pg_restore for Postgres "16.2" at "/usr/bin/pg_restore"
08:30:13.711 44597 INFO   [TARGET] Restoring database into "postgres://pagila:0wn3d@target/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
08:30:13.730 44597 INFO   Drop tables on the target database, per --drop-if-exists
08:30:13.774 44597 INFO    /usr/bin/pg_restore --dbname 'postgres://pagila:0wn3d@target/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60' --section pre-data --jobs 4 --clean --if-exists --no-owner --use-list /tmp/pgcopydb/schema/pre-filtered.list /tmp/pgcopydb/schema/schema.dump

Then copy the data over:

$ pgcopydb copy table-data --resume --not-consistent
08:34:02.813 44834 INFO   [SOURCE] Copying database from "postgres://pagila:0wn3d@source/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
08:34:02.813 44834 INFO   [TARGET] Copying database into "postgres://pagila:0wn3d@target/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
08:34:02.861 44834 INFO   Using work dir "/tmp/pgcopydb"
08:34:02.862 44834 INFO   Copy data from source to target in sub-processes
08:34:02.863 44834 INFO   Re-using catalog caches
08:34:02.863 44834 INFO   STEP 4: starting 4 table-data COPY processes

And now create the indexes on the target database, using the index definitions from the source database:

$ pgcopydb copy indexes --resume --not-consistent
14:28:53 47 INFO   Running pgcopydb version 0.13.38.g22e6544.dirty from "/usr/local/bin/pgcopydb"
14:28:53 47 INFO   [SOURCE] Copying database from "postgres://pagila@source/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
14:28:53 47 INFO   [TARGET] Copying database into "postgres://pagila@target/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
14:28:53 47 INFO   Schema dump for pre-data and post-data section have been done
14:28:53 47 INFO   Pre-data schema has been restored on the target instance
14:28:53 47 INFO   All the table data has been copied to the target instance
14:28:53 47 INFO   All the indexes have been copied to the target instance
14:28:53 47 INFO   Fetched information for 54 indexes
14:28:53 47 INFO   Creating 54 indexes in the target database using 4 processes

                                               Step   Connection    Duration    Transfer   Concurrency
 --------------------------------------------------   ----------  ----------  ----------  ------------
                                        Dump Schema       source         0ms                         1
   Catalog Queries (table ordering, filtering, etc)       source         0ms                         1
                                     Prepare Schema       target         0ms                         1
      COPY, INDEX, CONSTRAINTS, VACUUM (wall clock)         both         0ms                     4 + 8
                                  COPY (cumulative)         both         0ms         0 B             4
                         Large Objects (cumulative)         both                                     4
             CREATE INDEX, CONSTRAINTS (cumulative)       target         0ms                         4
                                    Finalize Schema       target         0ms                         1
 --------------------------------------------------   ----------  ----------  ----------  ------------
                          Total Wall Clock Duration         both       696ms                     4 + 8
 --------------------------------------------------   ----------  ----------  ----------  ------------

Now re-create the constraints (primary key, unique constraints) from the source database schema into the target database:

$ pgcopydb copy constraints --resume --not-consistent
14:28:54 53 INFO   Running pgcopydb version 0.13.38.g22e6544.dirty from "/usr/local/bin/pgcopydb"
14:28:54 53 INFO   [SOURCE] Copying database from "postgres://pagila@source/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
14:28:54 53 INFO   [TARGET] Copying database into "postgres://pagila@target/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
14:28:54 53 INFO   Schema dump for pre-data and post-data section have been done
14:28:54 53 INFO   Pre-data schema has been restored on the target instance
14:28:54 53 INFO   All the table data has been copied to the target instance
14:28:54 53 INFO   All the indexes have been copied to the target instance
14:28:54 53 INFO   Create constraints
14:28:54 53 INFO   Fetched information for 54 indexes
14:28:54 53 INFO   Creating 54 indexes in the target database using 4 processes

                                               Step   Connection    Duration    Transfer   Concurrency
 --------------------------------------------------   ----------  ----------  ----------  ------------
                                        Dump Schema       source         0ms                         1
   Catalog Queries (table ordering, filtering, etc)       source         0ms                         1
                                     Prepare Schema       target         0ms                         1
      COPY, INDEX, CONSTRAINTS, VACUUM (wall clock)         both         0ms                     4 + 8
                                  COPY (cumulative)         both         0ms         0 B             4
                         Large Objects (cumulative)         both                                     4
             CREATE INDEX, CONSTRAINTS (cumulative)       target         0ms                         4
                                    Finalize Schema       target         0ms                         1
 --------------------------------------------------   ----------  ----------  ----------  ------------
                          Total Wall Clock Duration         both       283ms                     4 + 8
 --------------------------------------------------   ----------  ----------  ----------  ------------

The next step is a VACUUM ANALYZE on each table that has just been filled in with data, and for that we can just use the vacuumdb command from Postgres:

$ vacuumdb --analyze --dbname "$PGCOPYDB_TARGET_PGURI" --jobs 4
vacuumdb: vacuuming database "pagila"

Finally we can restore the post-data section of the schema:

$ pgcopydb restore post-data --resume --not-consistent
14:28:54 60 INFO   Running pgcopydb version 0.13.38.g22e6544.dirty from "/usr/local/bin/pgcopydb"
14:28:54 60 INFO   Schema dump for pre-data and post-data section have been done
14:28:54 60 INFO   Pre-data schema has been restored on the target instance
14:28:54 60 INFO   All the table data has been copied to the target instance
14:28:54 60 INFO   All the indexes have been copied to the target instance
14:28:54 60 INFO   Restoring database from existing files at "/tmp/pgcopydb"
14:28:54 60 INFO   Using pg_restore for Postgres "16.1" at "/usr/bin/pg_restore"
14:28:54 60 INFO   [TARGET] Restoring database into "postgres://pagila@target/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
14:28:55 60 INFO    /usr/bin/pg_restore --dbname 'postgres://pagila@target/pagila?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60' --single-transaction --use-list /tmp/pgcopydb/schema/post-filtered.list /tmp/pgcopydb/schema/schema.dump

pgcopydb dump

pgcopydb dump - Dump database objects from a Postgres instance

This command prefixes the following sub-commands:

pgcopydb dump: Dump database objects from a Postgres instance

Available commands:
  pgcopydb dump
    schema  Dump source database schema as custom files in work directory
    roles   Dump source database roles as custom file in work directory

pgcopydb dump schema

pgcopydb dump schema - Dump source database schema as custom files in target directory

The command pgcopydb dump schema uses pg_dump to export SQL schema definitions from the given source Postgres instance.

pgcopydb dump schema: Dump source database schema as custom files in work directory
usage: pgcopydb dump schema  --source <URI>

  --source             Postgres URI to the source database
  --target             Directory where to save the dump files
  --dir                Work directory to use
  --skip-extensions    Skip restoring extensions
  --filters <filename> Use the filters defined in <filename>
  --snapshot           Use snapshot obtained with pg_export_snapshot

pgcopydb dump roles

pgcopydb dump roles - Dump source database roles as custom file in work directory

The command pgcopydb dump roles uses pg_dumpall --roles-only to export SQL definitions of the roles found on the source Postgres instance.

pgcopydb dump roles: Dump source database roles as custom file in work directory
usage: pgcopydb dump roles  --source <URI>

  --source            Postgres URI to the source database
  --target            Directory where to save the dump files
  --dir               Work directory to use
  --no-role-passwords Do not dump passwords for roles

The pg_dumpall --roles-only command is used to fetch the list of roles from the source database, and this command includes support for passwords. As a result, this operation requires superuser privileges.

It is possible to use the option --no-role-passwords to operate without superuser privileges. In that case though, the passwords are not part of the dump and authentication might fail until passwords have been set up properly.
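
A minimal sketch of that non-superuser path, assuming the PGCOPYDB_SOURCE_PGURI and PGCOPYDB_TARGET_PGURI environment variables are set and that passwords are set manually on the target afterwards:

$ pgcopydb dump roles --no-role-passwords
$ pgcopydb restore roles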

Description

The pgcopydb dump schema command implements the first step of the full database migration and fetches the schema definitions from the source database.

When the command runs, it calls pg_dump to get the pre-data schema and the post-data schema output in a Postgres custom file called schema.dump.

The output files are written to the schema sub-directory of the --target directory.

Options

The following options are available to the pgcopydb dump schema and pgcopydb dump roles subcommands:

--source

Connection string to the source Postgres instance. See the Postgres documentation for connection strings for the details. In short both the quoted form "host=... dbname=..." and the URI form postgres://user@host:5432/dbname are supported.

--target

Directory where to save the dump files.

--dir

During its normal operations pgcopydb creates a lot of temporary files to track the progress of its sub-processes. Temporary files are created in the directory specified by this option, or by default in ${TMPDIR}/pgcopydb when that environment variable is set, or otherwise in /tmp/pgcopydb.

--no-role-passwords

Do not dump passwords for roles. When restored, roles will have a null password, and password authentication will always fail until the password is set. Since password values aren’t needed when this option is specified, the role information is read from the catalog view pg_roles instead of pg_authid. Therefore, this option also helps if access to pg_authid is restricted by some security policy.

--snapshot

Instead of exporting its own snapshot by calling the Tantor SE-1C function pg_export_snapshot() it is possible for pgcopydb to re-use an already exported snapshot.

--verbose

Increase current verbosity. The default level of verbosity is INFO. In ascending order pgcopydb knows about the following verbosity levels: FATAL, ERROR, WARN, INFO, NOTICE, DEBUG, TRACE.

--debug

Set current verbosity to DEBUG level.

--trace

Set current verbosity to TRACE level.

--quiet

Set current verbosity to ERROR level.

Environment

PGCOPYDB_SOURCE_PGURI

Connection string to the source Postgres instance. When --source is omitted from the command line, then this environment variable is used.

Examples

First, using pgcopydb dump schema

$ pgcopydb dump schema --source "port=5501 dbname=demo" --target /tmp/target
09:35:21 3926 INFO  Dumping database from "port=5501 dbname=demo"
09:35:21 3926 INFO  Dumping database into directory "/tmp/target"
09:35:21 3926 INFO  Found a stale pidfile at "/tmp/target/pgcopydb.pid"
09:35:21 3926 WARN  Removing the stale pid file "/tmp/target/pgcopydb.pid"
09:35:21 3926 INFO  Using pg_dump for Postgres "12.9" at "/Applications/Postgres.app/Contents/Versions/12/bin/pg_dump"
09:35:21 3926 INFO   /Applications/Postgres.app/Contents/Versions/12/bin/pg_dump -Fc --section pre-data --section post-data --file /tmp/target/schema/schema.dump 'port=5501 dbname=demo'

Once the previous command is finished, the pg_dump output file can be found in /tmp/target/schema and is named schema.dump. Additionally, other files and directories have been created.

$ find /tmp/target
/tmp/target
/tmp/target/pgcopydb.pid
/tmp/target/schema
/tmp/target/schema/schema.dump
/tmp/target/run
/tmp/target/run/tables
/tmp/target/run/indexes

pgcopydb restore

pgcopydb restore - Restore database objects into a Postgres instance

This command prefixes the following sub-commands:

pgcopydb restore: Restore database objects into a Postgres instance

Available commands:
  pgcopydb restore
    schema      Restore a database schema from custom files to target database
    pre-data    Restore a database pre-data schema from custom file to target database
    post-data   Restore a database post-data schema from custom file to target database
    roles       Restore database roles from SQL file to target database
    parse-list  Parse pg_restore --list output from custom file

pgcopydb restore schema

pgcopydb restore schema - Restore a database schema from custom files to target database

The command pgcopydb restore schema uses pg_restore to create the SQL schema definitions from the given pgcopydb dump schema export directory. This command is not compatible with using Postgres files directly; it must be fed with the directory output from the pgcopydb dump ... commands.

pgcopydb restore schema: Restore a database schema from custom files to target database
usage: pgcopydb restore schema  --dir <dir> [ --source <URI> ] --target <URI>

  --source             Postgres URI to the source database
  --target             Postgres URI to the target database
  --dir                Work directory to use
  --restore-jobs       Number of concurrent jobs for pg_restore
  --drop-if-exists     On the target database, clean-up from a previous run first
  --no-owner           Do not set ownership of objects to match the original database
  --no-acl             Prevent restoration of access privileges (grant/revoke commands).
  --no-comments        Do not output commands to restore comments
  --no-tablespaces     Do not output commands to select tablespaces
  --filters <filename> Use the filters defined in <filename>
  --restart            Allow restarting when temp files exist already
  --resume             Allow resuming operations after a failure
  --not-consistent     Allow taking a new snapshot on the source database

pgcopydb restore pre-data

pgcopydb restore pre-data - Restore a database pre-data schema from custom file to target database

The command pgcopydb restore pre-data uses pg_restore to create the SQL schema definitions from the given pgcopydb dump schema export directory. This command is not compatible with using Postgres files directly; it must be fed with the directory output from the pgcopydb dump ... commands.

pgcopydb restore pre-data: Restore a database pre-data schema from custom file to target database
usage: pgcopydb restore pre-data  --dir <dir> [ --source <URI> ] --target <URI>

  --source             Postgres URI to the source database
  --target             Postgres URI to the target database
  --dir                Work directory to use
  --restore-jobs       Number of concurrent jobs for pg_restore
  --drop-if-exists     On the target database, clean-up from a previous run first
  --no-owner           Do not set ownership of objects to match the original database
  --no-acl             Prevent restoration of access privileges (grant/revoke commands).
  --no-comments        Do not output commands to restore comments
  --no-tablespaces     Do not output commands to select tablespaces
  --skip-extensions    Skip restoring extensions
  --skip-ext-comments  Skip restoring COMMENT ON EXTENSION
  --filters <filename> Use the filters defined in <filename>
  --restart            Allow restarting when temp files exist already
  --resume             Allow resuming operations after a failure
  --not-consistent     Allow taking a new snapshot on the source database

pgcopydb restore post-data

pgcopydb restore post-data - Restore a database post-data schema from custom file to target database

The command pgcopydb restore post-data uses pg_restore to create the SQL schema definitions from the given pgcopydb dump schema export directory. This command is not compatible with using Postgres files directly; it must be fed the directory output of the pgcopydb dump ... commands.

pgcopydb restore post-data: Restore a database post-data schema from custom file to target database
usage: pgcopydb restore post-data  --dir <dir> [ --source <URI> ] --target <URI>

  --source             Postgres URI to the source database
  --target             Postgres URI to the target database
  --dir                Work directory to use
  --restore-jobs       Number of concurrent jobs for pg_restore
  --no-owner           Do not set ownership of objects to match the original database
  --no-acl             Prevent restoration of access privileges (grant/revoke commands).
  --no-comments        Do not output commands to restore comments
  --no-tablespaces     Do not output commands to select tablespaces
  --skip-extensions    Skip restoring extensions
  --skip-ext-comments  Skip restoring COMMENT ON EXTENSION
  --filters <filename> Use the filters defined in <filename>
  --restart            Allow restarting when temp files exist already
  --resume             Allow resuming operations after a failure
  --not-consistent     Allow taking a new snapshot on the source database
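
As a sketch, re-using the /tmp/target directory from the pgcopydb dump schema example above with a hypothetical target instance on port 5502, the two sections can be restored one after the other:

$ pgcopydb restore pre-data --dir /tmp/target --target "port=5502 dbname=demo"
$ pgcopydb restore post-data --dir /tmp/target --target "port=5502 dbname=demo"

In a full migration the table data would typically be copied between those two steps, so that indexes and constraints are only built once the data is in place.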

pgcopydb restore roles

pgcopydb restore roles - Restore database roles from SQL file to target database

The command pgcopydb restore roles runs the commands from the SQL script obtained from the command pgcopydb dump roles. Roles that already exist on the target database are skipped.

The pg_dumpall command issues two lines per role: the first one is a CREATE ROLE SQL command, the second one an ALTER ROLE SQL command. Both those lines are skipped when the role already exists on the target database.

pgcopydb restore roles: Restore database roles from SQL file to target database
usage: pgcopydb restore roles  --dir <dir> [ --source <URI> ] --target <URI>

  --source             Postgres URI to the source database
  --target             Postgres URI to the target database
  --dir                Work directory to use
  --restore-jobs       Number of concurrent jobs for pg_restore
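
As a minimal sketch, assuming the roles SQL script has already been produced by pgcopydb dump roles in the same work directory and that the usual PGCOPYDB_TARGET_PGURI environment variable is set, the roles could then be restored with:

$ pgcopydb restore roles --dir /tmp/target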

pgcopydb restore parse-list

pgcopydb restore parse-list - Parse pg_restore --list output from custom file

The command pgcopydb restore parse-list uses pg_restore to list the archive catalog of the custom-format file that has been exported for the post-data section.

When using the --filters option, the source database connection is used to grab all the dependent objects that should also be filtered, and the output of the command shows those pg_restore catalog entries commented out.

A pg_restore archive catalog entry is commented out when its line starts with a semi-colon character (;).

pgcopydb restore parse-list: Parse pg_restore --list output from custom file
usage: pgcopydb restore parse-list  [ <pre.list> ]

  --source             Postgres URI to the source database
  --target             Postgres URI to the target database
  --dir                Work directory to use
  --filters <filename> Use the filters defined in <filename>
  --skip-extensions    Skip restoring extensions
  --skip-ext-comments  Skip restoring COMMENT ON EXTENSION
  --restart            Allow restarting when temp files exist already
  --resume             Allow resuming operations after a failure
  --not-consistent     Allow taking a new snapshot on the source database

Description

The pgcopydb restore schema command implements the creation of SQL objects in the target database, which are the second and last steps of a full database migration.

When the command runs, it calls pg_restore on the files found at the expected location within the work directory, which has typically been populated by the pgcopydb dump schema command.

The pgcopydb restore pre-data and pgcopydb restore post-data commands limit their actions to the pre-data and post-data sections, respectively, of the schema dump found in the work directory.

Options

The following options are available to pgcopydb restore schema:

--source

Connection string to the source Postgres instance. See the Postgres documentation for connection strings for the details. In short both the quoted form "host=... dbname=..." and the URI form postgres://user@host:5432/dbname are supported.

--target

Connection string to the target Postgres instance.

--dir

During its normal operations pgcopydb creates a lot of temporary files to track sub-process progress. Temporary files are created in the directory specified by this option; when the option is not used, they are created in ${TMPDIR}/pgcopydb when that environment variable is set, or in /tmp/pgcopydb otherwise.

--restore-jobs

How many threads or processes can be used during pg_restore. A good starting point is the number of CPU cores available on the target Postgres system.

If this value is not set, the --index-jobs value is reused. If that value is not set either, the default value for --index-jobs is used.
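
As a sketch, the following hedged example sizes the restore jobs from the CPU count of the host running pg_restore (nproc stands in for however that count is obtained; paths and connection strings are hypothetical):

$ pgcopydb restore schema --restore-jobs $(nproc) --dir /tmp/target --target "port=5502 dbname=demo"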

--drop-if-exists

When restoring the schema on the target Postgres instance, pgcopydb actually uses pg_restore. When this option is specified, the following pg_restore options are also used: --clean --if-exists.

This option is useful when the same command is run several times in a row, either to fix a previous mistake or for instance when used in a continuous integration system.

This option causes DROP TABLE and DROP INDEX and other DROP commands to be used. Make sure you understand what you’re doing here!
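
A sketch of a re-runnable invocation, with hypothetical paths and connection strings; the second form relies on the PGCOPYDB_DROP_IF_EXISTS environment variable documented below and is equivalent to the first one:

$ pgcopydb restore schema --drop-if-exists --dir /tmp/target --target "port=5502 dbname=demo"
$ PGCOPYDB_DROP_IF_EXISTS=on pgcopydb restore schema --dir /tmp/target --target "port=5502 dbname=demo"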

--no-owner

Do not output commands to set ownership of objects to match the original database. By default, pg_restore issues ALTER OWNER or SET SESSION AUTHORIZATION statements to set ownership of created schema elements. These statements will fail unless the initial connection to the database is made by a superuser (or the same user that owns all of the objects in the script). With --no-owner, any user name can be used for the initial connection, and this user will own all the created objects.

--filters <filename>

This option allows excluding tables and indexes from the copy operations. See Filtering for details about the expected file format and the filtering options available.

--skip-extensions

Skip copying extensions from the source database to the target database.

When used, schemas that extensions depend on are also skipped: creating the needed extensions on the target system is then expected to be the responsibility of another command (such as pgcopydb copy extensions), and the schemas that those extensions depend on are part of that responsibility.

Because creating extensions requires superuser privileges, this allows a multi-step approach where extensions are dealt with using superuser privileges, and then the rest of the pgcopydb operations are done without superuser privileges.
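
A hedged sketch of that multi-step approach, with hypothetical connection strings: a superuser connection first deals with extensions (here with pgcopydb copy extensions, as mentioned above, assuming PGCOPYDB_SOURCE_PGURI is already exported), and the schema restore then runs without superuser privileges:

$ PGCOPYDB_TARGET_PGURI="postgres://postgres@target.host/dbname" pgcopydb copy extensions
$ PGCOPYDB_TARGET_PGURI="postgres://appuser@target.host/dbname" pgcopydb restore schema --skip-extensions --dir /tmp/target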

--skip-ext-comments

Skip copying COMMENT ON EXTENSION commands. This is implicit when using --skip-extensions.

--restart

When running the pgcopydb command again, if the work directory already contains information from a previous run, then the command refuses to proceed, so as not to delete information that might be used for diagnostics and forensics.

In that case, the --restart option can be used to allow pgcopydb to delete traces from a previous run.

--resume

When the pgcopydb command was terminated before completion, either by an interrupt signal (such as C-c or SIGTERM) or because it crashed, it is possible to resume the database migration.

When resuming activity from a previous run, table data that was fully copied over to the target server is not sent again. Table data that was interrupted during the COPY has to be started from scratch even when using --resume: the COPY command in Postgres is transactional and was rolled back.

The same reasoning applies to the CREATE INDEX and ALTER TABLE commands that pgcopydb issues; those commands are skipped on a --resume run only if they are known to have run through to completion on the previous one.

Finally, using --resume requires the use of --not-consistent.
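
For instance, a sketch of resuming an interrupted schema restore, with hypothetical paths and connection strings:

$ pgcopydb restore schema --resume --not-consistent --dir /tmp/target --target "port=5502 dbname=demo"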

--not-consistent

In order to be consistent, pgcopydb exports a Postgres snapshot by calling the pg_export_snapshot() function on the source database server. The snapshot is then re-used in all the connections to the source database server by using the SET TRANSACTION SNAPSHOT command.

Per the Postgres documentation about pg_export_snapshot: Saves the transaction’s current snapshot and returns a text string identifying the snapshot. This string must be passed (outside the database) to clients that want to import the snapshot. The snapshot is available for import only until the end of the transaction that exported it.

Now, when the pgcopydb process was interrupted (or crashed) on a previous run, it is possible to resume operations, but the snapshot that was exported does not exist anymore. The pgcopydb command can only resume operations with a new snapshot, and thus cannot ensure consistency of the whole data set, because each run is now using its own snapshot.

--snapshot

Instead of exporting its own snapshot by calling the Tantor SE-1C function pg_export_snapshot(), it is possible for pgcopydb to re-use an already exported snapshot.
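
A sketch of re-using an externally exported snapshot; the identifier below is only an example of the text string returned by pg_export_snapshot(), and the other arguments are hypothetical:

$ pgcopydb restore schema --snapshot "00000003-0003209A-1" --dir /tmp/target --target "port=5502 dbname=demo"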

--verbose

Increase current verbosity. The default level of verbosity is INFO. In ascending order pgcopydb knows about the following verbosity levels: FATAL, ERROR, WARN, INFO, NOTICE, DEBUG, TRACE.

--debug

Set current verbosity to DEBUG level.

--trace

Set current verbosity to TRACE level.

--quiet

Set current verbosity to ERROR level.

Environment

PGCOPYDB_TARGET_PGURI

Connection string to the target Postgres instance. When --target is omitted from the command line, then this environment variable is used.

PGCOPYDB_DROP_IF_EXISTS

When true (or yes, or on, or 1, same input as a Postgres boolean) then pgcopydb uses the pg_restore options --clean --if-exists when creating the schema on the target Postgres instance.

Examples

First, using pgcopydb restore schema

$ PGCOPYDB_DROP_IF_EXISTS=on pgcopydb restore schema --source /tmp/target/ --target "port=54314 dbname=demo"
07:45:10.626 39254 INFO   Using work dir "/tmp/pgcopydb"
07:45:10.626 39254 INFO   Restoring database from existing files at "/tmp/pgcopydb"
07:45:10.720 39254 INFO   Found 2 indexes (supporting 2 constraints) in the target database
07:45:10.723 39254 INFO   Using pg_restore for Postgres "16.2" at "/usr/bin/pg_restore"
07:45:10.723 39254 INFO   [TARGET] Restoring database into "postgres://[email protected]:5435/demo?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
07:45:10.737 39254 INFO   Drop tables on the target database, per --drop-if-exists
07:45:10.750 39254 INFO    /usr/bin/pg_restore --dbname 'postgres://[email protected]:5435/demo?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60' --section pre-data --jobs 4 --clean --if-exists --use-list /tmp/pgcopydb/schema/pre-filtered.list /tmp/pgcopydb/schema/schema.dump
07:45:10.803 39254 INFO    /usr/bin/pg_restore --dbname 'postgres://[email protected]:5435/demo?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60' --section post-data --jobs 4 --clean --if-exists --use-list /tmp/pgcopydb/schema/post-filtered.list /tmp/pgcopydb/schema/schema.dump

The pgcopydb restore pre-data and pgcopydb restore post-data commands would then look the same, except with a single call to pg_restore instead of both of them.

Using pgcopydb restore parse-list it’s possible to review the filtering options and see how pg_restore catalog entries are being commented-out.

$ cat ./tests/filtering/include.ini
[include-only-table]
public.actor
public.category
public.film
public.film_actor
public.film_category
public.language
public.rental

[exclude-index]
public.idx_store_id_film_id

[exclude-table-data]
public.rental

$ pgcopydb restore parse-list --dir /tmp/pagila/pgcopydb --resume --not-consistent --filters ./tests/filtering/include.ini
11:41:22 75175 INFO  Running pgcopydb version 0.5.8.ge0d2038 from "/Users/dim/dev/PostgreSQL/pgcopydb/./src/bin/pgcopydb/pgcopydb"
11:41:22 75175 INFO  [SOURCE] Restoring database from "postgres://@:54311/pagila?"
11:41:22 75175 INFO  [TARGET] Restoring database into "postgres://@:54311/plop?"
11:41:22 75175 INFO  Using work dir "/tmp/pagila/pgcopydb"
11:41:22 75175 INFO  Removing the stale pid file "/tmp/pagila/pgcopydb/pgcopydb.pid"
11:41:22 75175 INFO  Work directory "/tmp/pagila/pgcopydb" already exists
11:41:22 75175 INFO  Schema dump for pre-data and post-data section have been done
11:41:22 75175 INFO  Restoring database from existing files at "/tmp/pagila/pgcopydb"
11:41:22 75175 INFO  Using pg_restore for Postgres "12.9" at "/Applications/Postgres.app/Contents/Versions/12/bin/pg_restore"
11:41:22 75175 INFO  Exported snapshot "00000003-0003209A-1" from the source database
3242; 2606 317973 CONSTRAINT public actor actor_pkey postgres
;3258; 2606 317975 CONSTRAINT public address address_pkey postgres
3245; 2606 317977 CONSTRAINT public category category_pkey postgres
;3261; 2606 317979 CONSTRAINT public city city_pkey postgres
;3264; 2606 317981 CONSTRAINT public country country_pkey postgres
;3237; 2606 317983 CONSTRAINT public customer customer_pkey postgres
3253; 2606 317985 CONSTRAINT public film_actor film_actor_pkey postgres
3256; 2606 317987 CONSTRAINT public film_category film_category_pkey postgres
3248; 2606 317989 CONSTRAINT public film film_pkey postgres
;3267; 2606 317991 CONSTRAINT public inventory inventory_pkey postgres
3269; 2606 317993 CONSTRAINT public language language_pkey postgres
3293; 2606 317995 CONSTRAINT public rental rental_pkey postgres
;3295; 2606 317997 CONSTRAINT public staff staff_pkey postgres
;3298; 2606 317999 CONSTRAINT public store store_pkey postgres
3246; 1259 318000 INDEX public film_fulltext_idx postgres
3243; 1259 318001 INDEX public idx_actor_last_name postgres
;3238; 1259 318002 INDEX public idx_fk_address_id postgres
;3259; 1259 318003 INDEX public idx_fk_city_id postgres
;3262; 1259 318004 INDEX public idx_fk_country_id postgres
;3270; 1259 318005 INDEX public idx_fk_customer_id postgres
3254; 1259 318006 INDEX public idx_fk_film_id postgres
3290; 1259 318007 INDEX public idx_fk_inventory_id postgres
3249; 1259 318008 INDEX public idx_fk_language_id postgres
3250; 1259 318009 INDEX public idx_fk_original_language_id postgres
;3272; 1259 318010 INDEX public idx_fk_payment_p2020_01_customer_id postgres
;3271; 1259 318011 INDEX public idx_fk_staff_id postgres
;3273; 1259 318012 INDEX public idx_fk_payment_p2020_01_staff_id postgres
;3275; 1259 318013 INDEX public idx_fk_payment_p2020_02_customer_id postgres
;3276; 1259 318014 INDEX public idx_fk_payment_p2020_02_staff_id postgres
;3278; 1259 318015 INDEX public idx_fk_payment_p2020_03_customer_id postgres
;3279; 1259 318016 INDEX public idx_fk_payment_p2020_03_staff_id postgres
;3281; 1259 318017 INDEX public idx_fk_payment_p2020_04_customer_id postgres
;3282; 1259 318018 INDEX public idx_fk_payment_p2020_04_staff_id postgres
;3284; 1259 318019 INDEX public idx_fk_payment_p2020_05_customer_id postgres
;3285; 1259 318020 INDEX public idx_fk_payment_p2020_05_staff_id postgres
;3287; 1259 318021 INDEX public idx_fk_payment_p2020_06_customer_id postgres
;3288; 1259 318022 INDEX public idx_fk_payment_p2020_06_staff_id postgres
;3239; 1259 318023 INDEX public idx_fk_store_id postgres
;3240; 1259 318024 INDEX public idx_last_name postgres
;3265; 1259 318025 INDEX public idx_store_id_film_id postgres
3251; 1259 318026 INDEX public idx_title postgres
;3296; 1259 318027 INDEX public idx_unq_manager_staff_id postgres
3291; 1259 318028 INDEX public idx_unq_rental_rental_date_inventory_id_customer_id postgres
;3274; 1259 318029 INDEX public payment_p2020_01_customer_id_idx postgres
;3277; 1259 318030 INDEX public payment_p2020_02_customer_id_idx postgres
;3280; 1259 318031 INDEX public payment_p2020_03_customer_id_idx postgres
;3283; 1259 318032 INDEX public payment_p2020_04_customer_id_idx postgres
;3286; 1259 318033 INDEX public payment_p2020_05_customer_id_idx postgres
;3289; 1259 318034 INDEX public payment_p2020_06_customer_id_idx postgres
;3299; 0 0 INDEX ATTACH public idx_fk_payment_p2020_01_staff_id postgres
;3301; 0 0 INDEX ATTACH public idx_fk_payment_p2020_02_staff_id postgres
;3303; 0 0 INDEX ATTACH public idx_fk_payment_p2020_03_staff_id postgres
;3305; 0 0 INDEX ATTACH public idx_fk_payment_p2020_04_staff_id postgres
;3307; 0 0 INDEX ATTACH public idx_fk_payment_p2020_05_staff_id postgres
;3309; 0 0 INDEX ATTACH public idx_fk_payment_p2020_06_staff_id postgres
;3300; 0 0 INDEX ATTACH public payment_p2020_01_customer_id_idx postgres
;3302; 0 0 INDEX ATTACH public payment_p2020_02_customer_id_idx postgres
;3304; 0 0 INDEX ATTACH public payment_p2020_03_customer_id_idx postgres
;3306; 0 0 INDEX ATTACH public payment_p2020_04_customer_id_idx postgres
;3308; 0 0 INDEX ATTACH public payment_p2020_05_customer_id_idx postgres
;3310; 0 0 INDEX ATTACH public payment_p2020_06_customer_id_idx postgres
3350; 2620 318035 TRIGGER public film film_fulltext_trigger postgres
3348; 2620 318036 TRIGGER public actor last_updated postgres
;3354; 2620 318037 TRIGGER public address last_updated postgres
3349; 2620 318038 TRIGGER public category last_updated postgres
;3355; 2620 318039 TRIGGER public city last_updated postgres
;3356; 2620 318040 TRIGGER public country last_updated postgres
;3347; 2620 318041 TRIGGER public customer last_updated postgres
3351; 2620 318042 TRIGGER public film last_updated postgres
3352; 2620 318043 TRIGGER public film_actor last_updated postgres
3353; 2620 318044 TRIGGER public film_category last_updated postgres
;3357; 2620 318045 TRIGGER public inventory last_updated postgres
3358; 2620 318046 TRIGGER public language last_updated postgres
3359; 2620 318047 TRIGGER public rental last_updated postgres
;3360; 2620 318048 TRIGGER public staff last_updated postgres
;3361; 2620 318049 TRIGGER public store last_updated postgres
;3319; 2606 318050 FK CONSTRAINT public address address_city_id_fkey postgres
;3320; 2606 318055 FK CONSTRAINT public city city_country_id_fkey postgres
;3311; 2606 318060 FK CONSTRAINT public customer customer_address_id_fkey postgres
;3312; 2606 318065 FK CONSTRAINT public customer customer_store_id_fkey postgres
3315; 2606 318070 FK CONSTRAINT public film_actor film_actor_actor_id_fkey postgres
3316; 2606 318075 FK CONSTRAINT public film_actor film_actor_film_id_fkey postgres
3317; 2606 318080 FK CONSTRAINT public film_category film_category_category_id_fkey postgres
3318; 2606 318085 FK CONSTRAINT public film_category film_category_film_id_fkey postgres
3313; 2606 318090 FK CONSTRAINT public film film_language_id_fkey postgres
3314; 2606 318095 FK CONSTRAINT public film film_original_language_id_fkey postgres
;3321; 2606 318100 FK CONSTRAINT public inventory inventory_film_id_fkey postgres
;3322; 2606 318105 FK CONSTRAINT public inventory inventory_store_id_fkey postgres
;3323; 2606 318110 FK CONSTRAINT public payment_p2020_01 payment_p2020_01_customer_id_fkey postgres
;3324; 2606 318115 FK CONSTRAINT public payment_p2020_01 payment_p2020_01_rental_id_fkey postgres
;3325; 2606 318120 FK CONSTRAINT public payment_p2020_01 payment_p2020_01_staff_id_fkey postgres
;3326; 2606 318125 FK CONSTRAINT public payment_p2020_02 payment_p2020_02_customer_id_fkey postgres
;3327; 2606 318130 FK CONSTRAINT public payment_p2020_02 payment_p2020_02_rental_id_fkey postgres
;3328; 2606 318135 FK CONSTRAINT public payment_p2020_02 payment_p2020_02_staff_id_fkey postgres
;3329; 2606 318140 FK CONSTRAINT public payment_p2020_03 payment_p2020_03_customer_id_fkey postgres
;3330; 2606 318145 FK CONSTRAINT public payment_p2020_03 payment_p2020_03_rental_id_fkey postgres
;3331; 2606 318150 FK CONSTRAINT public payment_p2020_03 payment_p2020_03_staff_id_fkey postgres
;3332; 2606 318155 FK CONSTRAINT public payment_p2020_04 payment_p2020_04_customer_id_fkey postgres
;3333; 2606 318160 FK CONSTRAINT public payment_p2020_04 payment_p2020_04_rental_id_fkey postgres
;3334; 2606 318165 FK CONSTRAINT public payment_p2020_04 payment_p2020_04_staff_id_fkey postgres
;3335; 2606 318170 FK CONSTRAINT public payment_p2020_05 payment_p2020_05_customer_id_fkey postgres
;3336; 2606 318175 FK CONSTRAINT public payment_p2020_05 payment_p2020_05_rental_id_fkey postgres
;3337; 2606 318180 FK CONSTRAINT public payment_p2020_05 payment_p2020_05_staff_id_fkey postgres
;3338; 2606 318185 FK CONSTRAINT public payment_p2020_06 payment_p2020_06_customer_id_fkey postgres
;3339; 2606 318190 FK CONSTRAINT public payment_p2020_06 payment_p2020_06_rental_id_fkey postgres
;3340; 2606 318195 FK CONSTRAINT public payment_p2020_06 payment_p2020_06_staff_id_fkey postgres
;3341; 2606 318200 FK CONSTRAINT public rental rental_customer_id_fkey postgres
;3342; 2606 318205 FK CONSTRAINT public rental rental_inventory_id_fkey postgres
;3343; 2606 318210 FK CONSTRAINT public rental rental_staff_id_fkey postgres
;3344; 2606 318215 FK CONSTRAINT public staff staff_address_id_fkey postgres
;3345; 2606 318220 FK CONSTRAINT public staff staff_store_id_fkey postgres
;3346; 2606 318225 FK CONSTRAINT public store store_address_id_fkey postgres

pgcopydb list

pgcopydb list - List database objects from a Postgres instance

This command prefixes the following sub-commands:

pgcopydb list: List database objects from a Postgres instance

Available commands:
  pgcopydb list
    databases    List databases
    extensions   List all the source extensions to copy
    collations   List all the source collations to copy
    tables       List all the source tables to copy data from
    table-parts  List a source table copy partitions
    sequences    List all the source sequences to copy data from
    indexes      List all the indexes to create again after copying the data
    depends      List all the dependencies to filter-out
    schema       List the schema to migrate, formatted in JSON
    progress     List the progress

pgcopydb list databases

pgcopydb list databases - List databases

The command pgcopydb list databases connects to the source database and executes a SQL query using the Postgres catalogs to get a list of all the databases there.

pgcopydb list databases: List databases
usage: pgcopydb list databases  --source ...

  --source            Postgres URI to the source database

pgcopydb list extensions

pgcopydb list extensions - List all the source extensions to copy

The command pgcopydb list extensions connects to the source database and executes a SQL query using the Postgres catalogs to get a list of all the extensions to COPY to the target database.

pgcopydb list extensions: List all the source extensions to copy
usage: pgcopydb list extensions  --source ...

  --source              Postgres URI to the source database
  --json                Format the output using JSON
  --available-versions  List available extension versions
  --requirements        List extensions requirements

The command pgcopydb list extensions --available-versions is typically used with the target database. If you’re using the connection string environment variables, that looks like the following:

$ pgcopydb list extensions --available-versions --source ${PGCOPYDB_TARGET_PGURI}

pgcopydb list collations

pgcopydb list collations - List all the source collations to copy

The command pgcopydb list collations connects to the source database and executes a SQL query using the Postgres catalogs to get a list of all the collations to COPY to the target database.

pgcopydb list collations: List all the source collations to copy
usage: pgcopydb list collations  --source ...

  --source            Postgres URI to the source database

The SQL query that is used lists the database collation, and then any non-default collation that’s used in a user column or a user index.

pgcopydb list tables

pgcopydb list tables - List all the source tables to copy data from

The command pgcopydb list tables connects to the source database and executes a SQL query using the Postgres catalogs to get a list of all the tables to COPY the data from.

pgcopydb list tables: List all the source tables to copy data from
usage: pgcopydb list tables  --source ...

  --source            Postgres URI to the source database
  --filter <filename> Use the filters defined in <filename>
  --force             Force fetching catalogs again
  --list-skipped      List only tables that are setup to be skipped
  --without-pkey      List only tables that have no primary key

pgcopydb list table-parts

pgcopydb list table-parts - List a source table copy partitions

The command pgcopydb list table-parts connects to the source database and executes a SQL query using the Postgres catalogs to get detailed information about the given source table, and then another SQL query to compute how to split this source table given the size threshold argument.

pgcopydb list table-parts: List a source table copy partitions
usage: pgcopydb list table-parts  --source ...

  --source                    Postgres URI to the source database
  --force                     Force fetching catalogs again
  --schema-name               Name of the schema where to find the table
  --table-name                Name of the target table
  --split-tables-larger-than  Size threshold to consider partitioning
  --split-max-parts           Maximum number of jobs for Same-table concurrency
  --skip-split-by-ctid        Skip the ctid split
  --estimate-table-sizes      Allow using estimates for relation sizes

pgcopydb list sequences

pgcopydb list sequences - List all the source sequences to copy data from

The command pgcopydb list sequences connects to the source database and executes a SQL query using the Postgres catalogs to get a list of all the sequences to COPY the data from.

pgcopydb list sequences: List all the source sequences to copy data from
usage: pgcopydb list sequences  --source ...

  --source            Postgres URI to the source database
  --force             Force fetching catalogs again
  --filter <filename> Use the filters defined in <filename>
  --list-skipped      List only tables that are setup to be skipped

pgcopydb list indexes

pgcopydb list indexes - List all the indexes to create again after copying the data

The command pgcopydb list indexes connects to the source database and executes a SQL query using the Postgres catalogs to get a list of all the indexes to create again on the target database after the data has been copied.

pgcopydb list indexes: List all the indexes to create again after copying the data
usage: pgcopydb list indexes  --source ... [ --schema-name [ --table-name ] ]

  --source            Postgres URI to the source database
  --force             Force fetching catalogs again
  --schema-name       Name of the schema where to find the table
  --table-name        Name of the target table
  --filter <filename> Use the filters defined in <filename>
  --list-skipped      List only tables that are setup to be skipped

pgcopydb list depends

pgcopydb list depends - List all the dependencies to filter-out

The command pgcopydb list depends connects to the source database and executes a SQL query using the Postgres catalogs to get a list of all the objects that depend on excluded objects from the filtering rules.

pgcopydb list depends: List all the dependencies to filter-out
usage: pgcopydb list depends  --source ... [ --schema-name [ --table-name ] ]

  --source            Postgres URI to the source database
  --force             Force fetching catalogs again
  --schema-name       Name of the schema where to find the table
  --table-name        Name of the target table
  --filter <filename> Use the filters defined in <filename>
  --list-skipped      List only tables that are setup to be skipped

pgcopydb list schema

pgcopydb list schema - List the schema to migrate, formatted in JSON

The command pgcopydb list schema connects to the source database and executes SQL queries using the Postgres catalogs to get a list of the tables, indexes, and sequences to migrate. The command then outputs a JSON formatted string that contains detailed information about all those objects.

pgcopydb list schema: List the schema to migrate, formatted in JSON
usage: pgcopydb list schema  --source ...

  --source            Postgres URI to the source database
  --force             Force fetching catalogs again
  --filter <filename> Use the filters defined in <filename>

pgcopydb list progress

pgcopydb list progress - List the progress

The command pgcopydb list progress reads the internal SQLite catalogs in the work directory, parses them, and then computes how many tables and indexes are planned to be copied and created on the target database, how many have been done already, and how many are in-progress.

The --summary option displays the top-level summary, and can be used while the command is running or after-the-fact.

When using the option --json the JSON formatted output also includes a list of all the tables and indexes that are currently being processed.

pgcopydb list progress: List the progress
usage: pgcopydb list progress  --source ...

  --source  Postgres URI to the source database
  --summary List the summary, requires --json
  --json    Format the output using JSON
  --dir     Work directory to use
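
As a sketch, assuming PGCOPYDB_SOURCE_PGURI is exported and that the clone being monitored uses the default work directory, the summary could be fetched with:

$ pgcopydb list progress --summary --json --dir /tmp/pgcopydb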

Options

The following options are available to the pgcopydb list sub-commands:

--source

Connection string to the source Postgres instance. See the Postgres documentation for connection strings for the details. In short both the quoted form "host=... dbname=..." and the URI form postgres://user@host:5432/dbname are supported.

--schema-name

Filter indexes from a given schema only.

--table-name

Filter indexes from a given table only (use --schema-name to fully qualify the table).

--without-pkey

List only the tables from the source database that have no primary key attached to their schema.

--filter <filename>

This option allows skipping objects in the list operations. See Filtering for details about the expected file format and the filtering options available.

--list-skipped

Instead of listing objects that are selected for copy by the filters installed with the --filter option, list the objects that are going to be skipped when using the filters.
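
As a sketch, assuming PGCOPYDB_SOURCE_PGURI is exported, --filter and --list-skipped can be combined to review what a filter file would exclude (the filter file name is hypothetical, and its format is the same as the include.ini example shown earlier):

$ pgcopydb list tables --filter ./filters.ini --list-skipped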

--summary

Instead of listing the current progress while the command is still running, list the summary with timing details for each step and for all tables, indexes, and constraints.

This option requires the --json option too: at the moment only that output format is supported.

--json

The output of the command is formatted in JSON, when supported. Ignored otherwise.

--verbose

Increase current verbosity. The default level of verbosity is INFO. In ascending order pgcopydb knows about the following verbosity levels: FATAL, ERROR, WARN, INFO, NOTICE, DEBUG, TRACE.

--debug

Set current verbosity to DEBUG level.

--trace

Set current verbosity to TRACE level.

--quiet

Set current verbosity to ERROR level.

Environment

PGCOPYDB_SOURCE_PGURI

Connection string to the source Postgres instance. When --source is omitted from the command line, then this environment variable is used.

Examples

Listing the tables:

$ pgcopydb list tables
14:35:18 13827 INFO  Listing ordinary tables in "port=54311 host=localhost dbname=pgloader"
14:35:19 13827 INFO  Fetched information for 56 tables
     OID |          Schema Name |           Table Name |  Est. Row Count |    On-disk size
---------+----------------------+----------------------+-----------------+----------------
   17085 |                  csv |                track |            3503 |          544 kB
   17098 |             expected |                track |            3503 |          544 kB
   17290 |             expected |           track_full |            3503 |          544 kB
   17276 |               public |           track_full |            3503 |          544 kB
   17016 |             expected |            districts |             440 |           72 kB
   17007 |               public |            districts |             440 |           72 kB
   16998 |                  csv |               blocks |             460 |           48 kB
   17003 |             expected |               blocks |             460 |           48 kB
   17405 |                  csv |              partial |               7 |           16 kB
   17323 |                  err |               errors |               0 |           16 kB
   16396 |             expected |              allcols |               0 |           16 kB
   17265 |             expected |                  csv |               0 |           16 kB
   17056 |             expected |      csv_escape_mode |               0 |           16 kB
   17331 |             expected |               errors |               0 |           16 kB
   17116 |             expected |                group |               0 |           16 kB
   17134 |             expected |                 json |               0 |           16 kB
   17074 |             expected |             matching |               0 |           16 kB
   17201 |             expected |               nullif |               0 |           16 kB
   17229 |             expected |                nulls |               0 |           16 kB
   17417 |             expected |              partial |               0 |           16 kB
   17313 |             expected |              reg2013 |               0 |           16 kB
   17437 |             expected |               serial |               0 |           16 kB
   17247 |             expected |                 sexp |               0 |           16 kB
   17378 |             expected |                test1 |               0 |           16 kB
   17454 |             expected |                  udc |               0 |           16 kB
   17471 |             expected |                xzero |               0 |           16 kB
   17372 |               nsitra |                test1 |               0 |           16 kB
   16388 |               public |              allcols |               0 |           16 kB
   17256 |               public |                  csv |               0 |           16 kB
   17047 |               public |      csv_escape_mode |               0 |           16 kB
   17107 |               public |                group |               0 |           16 kB
   17125 |               public |                 json |               0 |           16 kB
   17065 |               public |             matching |               0 |           16 kB
   17192 |               public |               nullif |               0 |           16 kB
   17219 |               public |                nulls |               0 |           16 kB
   17307 |               public |              reg2013 |               0 |           16 kB
   17428 |               public |               serial |               0 |           16 kB
   17238 |               public |                 sexp |               0 |           16 kB
   17446 |               public |                  udc |               0 |           16 kB
   17463 |               public |                xzero |               0 |           16 kB
   17303 |             expected |              copyhex |               0 |      8192 bytes
   17033 |             expected |           dateformat |               0 |      8192 bytes
   17366 |             expected |                fixed |               0 |      8192 bytes
   17041 |             expected |              jordane |               0 |      8192 bytes
   17173 |             expected |           missingcol |               0 |      8192 bytes
   17396 |             expected |             overflow |               0 |      8192 bytes
   17186 |             expected |              tab_csv |               0 |      8192 bytes
   17213 |             expected |                 temp |               0 |      8192 bytes
   17299 |               public |              copyhex |               0 |      8192 bytes
   17029 |               public |           dateformat |               0 |      8192 bytes
   17362 |               public |                fixed |               0 |      8192 bytes
   17037 |               public |              jordane |               0 |      8192 bytes
   17164 |               public |           missingcol |               0 |      8192 bytes
   17387 |               public |             overflow |               0 |      8192 bytes
   17182 |               public |              tab_csv |               0 |      8192 bytes
   17210 |               public |                 temp |               0 |      8192 bytes

Listing the COPY partitions for a table:

$ pgcopydb list table-parts --table-name rental --split-at 300kB
16:43:26 73794 INFO  Running pgcopydb version 0.8.8.g0838291.dirty from "/Users/dim/dev/PostgreSQL/pgcopydb/src/bin/pgcopydb/pgcopydb"
16:43:26 73794 INFO  Listing COPY partitions for table "public"."rental" in "postgres://@:/pagila?"
16:43:26 73794 INFO  Table "public"."rental" COPY will be split 5-ways
      Part |        Min |        Max |      Count
-----------+------------+------------+-----------
       1/5 |          1 |       3211 |       3211
       2/5 |       3212 |       6422 |       3211
       3/5 |       6423 |       9633 |       3211
       4/5 |       9634 |      12844 |       3211
       5/5 |      12845 |      16049 |       3205

Listing the indexes:

$ pgcopydb list indexes
14:35:07 13668 INFO  Listing indexes in "port=54311 host=localhost dbname=pgloader"
14:35:07 13668 INFO  Fetching all indexes in source database
14:35:07 13668 INFO  Fetched information for 12 indexes
     OID |     Schema |           Index Name |         conname |                Constraint | DDL
---------+------------+----------------------+-----------------+---------------------------+---------------------
   17002 |        csv |      blocks_ip4r_idx |                 |                           | CREATE INDEX blocks_ip4r_idx ON csv.blocks USING gist (iprange)
   17415 |        csv |        partial_b_idx |                 |                           | CREATE INDEX partial_b_idx ON csv.partial USING btree (b)
   17414 |        csv |        partial_a_key |   partial_a_key |                UNIQUE (a) | CREATE UNIQUE INDEX partial_a_key ON csv.partial USING btree (a)
   17092 |        csv |           track_pkey |      track_pkey |     PRIMARY KEY (trackid) | CREATE UNIQUE INDEX track_pkey ON csv.track USING btree (trackid)
   17329 |        err |          errors_pkey |     errors_pkey |           PRIMARY KEY (a) | CREATE UNIQUE INDEX errors_pkey ON err.errors USING btree (a)
   16394 |     public |         allcols_pkey |    allcols_pkey |           PRIMARY KEY (a) | CREATE UNIQUE INDEX allcols_pkey ON public.allcols USING btree (a)
   17054 |     public | csv_escape_mode_pkey | csv_escape_mode_pkey |          PRIMARY KEY (id) | CREATE UNIQUE INDEX csv_escape_mode_pkey ON public.csv_escape_mode USING btree (id)
   17199 |     public |          nullif_pkey |     nullif_pkey |          PRIMARY KEY (id) | CREATE UNIQUE INDEX nullif_pkey ON public."nullif" USING btree (id)
   17435 |     public |          serial_pkey |     serial_pkey |           PRIMARY KEY (a) | CREATE UNIQUE INDEX serial_pkey ON public.serial USING btree (a)
   17288 |     public |      track_full_pkey | track_full_pkey |     PRIMARY KEY (trackid) | CREATE UNIQUE INDEX track_full_pkey ON public.track_full USING btree (trackid)
   17452 |     public |             udc_pkey |        udc_pkey |           PRIMARY KEY (b) | CREATE UNIQUE INDEX udc_pkey ON public.udc USING btree (b)
   17469 |     public |           xzero_pkey |      xzero_pkey |           PRIMARY KEY (a) | CREATE UNIQUE INDEX xzero_pkey ON public.xzero USING btree (a)

Listing the schema in JSON:

$ pgcopydb list schema --split-at 200kB

This gives the following JSON output:

{
    "setup": {
        "snapshot": "00000003-00051AAE-1",
        "source_pguri": "postgres:\/\/@:\/pagila?",
        "target_pguri": "postgres:\/\/@:\/plop?",
        "table-jobs": 4,
        "index-jobs": 4,
        "split-tables-larger-than": 204800
    },
    "tables": [
        {
            "oid": 317934,
            "schema": "public",
            "name": "rental",
            "reltuples": 16044,
            "bytes": 1253376,
            "bytes-pretty": "1224 kB",
            "exclude-data": false,
            "restore-list-name": "public rental postgres",
            "part-key": "rental_id",
            "parts": [
                {
                    "number": 1,
                    "total": 7,
                    "min": 1,
                    "max": 2294,
                    "count": 2294
                },
                {
                    "number": 2,
                    "total": 7,
                    "min": 2295,
                    "max": 4588,
                    "count": 2294
                },
                {
                    "number": 3,
                    "total": 7,
                    "min": 4589,
                    "max": 6882,
                    "count": 2294
                },
                {
                    "number": 4,
                    "total": 7,
                    "min": 6883,
                    "max": 9176,
                    "count": 2294
                },
                {
                    "number": 5,
                    "total": 7,
                    "min": 9177,
                    "max": 11470,
                    "count": 2294
                },
                {
                    "number": 6,
                    "total": 7,
                    "min": 11471,
                    "max": 13764,
                    "count": 2294
                },
                {
                    "number": 7,
                    "total": 7,
                    "min": 13765,
                    "max": 16049,
                    "count": 2285
                }
            ]
        },
        {
            "oid": 317818,
            "schema": "public",
            "name": "film",
            "reltuples": 1000,
            "bytes": 483328,
            "bytes-pretty": "472 kB",
            "exclude-data": false,
            "restore-list-name": "public film postgres",
            "part-key": "film_id",
            "parts": [
                {
                    "number": 1,
                    "total": 3,
                    "min": 1,
                    "max": 334,
                    "count": 334
                },
                {
                    "number": 2,
                    "total": 3,
                    "min": 335,
                    "max": 668,
                    "count": 334
                },
                {
                    "number": 3,
                    "total": 3,
                    "min": 669,
                    "max": 1000,
                    "count": 332
                }
            ]
        },
        {
            "oid": 317920,
            "schema": "public",
            "name": "payment_p2020_04",
            "reltuples": 6754,
            "bytes": 434176,
            "bytes-pretty": "424 kB",
            "exclude-data": false,
            "restore-list-name": "public payment_p2020_04 postgres",
            "part-key": ""
        },
        {
            "oid": 317916,
            "schema": "public",
            "name": "payment_p2020_03",
            "reltuples": 5644,
            "bytes": 368640,
            "bytes-pretty": "360 kB",
            "exclude-data": false,
            "restore-list-name": "public payment_p2020_03 postgres",
            "part-key": ""
        },
        {
            "oid": 317830,
            "schema": "public",
            "name": "film_actor",
            "reltuples": 5462,
            "bytes": 270336,
            "bytes-pretty": "264 kB",
            "exclude-data": false,
            "restore-list-name": "public film_actor postgres",
            "part-key": ""
        },
        {
            "oid": 317885,
            "schema": "public",
            "name": "inventory",
            "reltuples": 4581,
            "bytes": 270336,
            "bytes-pretty": "264 kB",
            "exclude-data": false,
            "restore-list-name": "public inventory postgres",
            "part-key": "inventory_id",
            "parts": [
                {
                    "number": 1,
                    "total": 2,
                    "min": 1,
                    "max": 2291,
                    "count": 2291
                },
                {
                    "number": 2,
                    "total": 2,
                    "min": 2292,
                    "max": 4581,
                    "count": 2290
                }
            ]
        },
        {
            "oid": 317912,
            "schema": "public",
            "name": "payment_p2020_02",
            "reltuples": 2312,
            "bytes": 163840,
            "bytes-pretty": "160 kB",
            "exclude-data": false,
            "restore-list-name": "public payment_p2020_02 postgres",
            "part-key": ""
        },
        {
            "oid": 317784,
            "schema": "public",
            "name": "customer",
            "reltuples": 599,
            "bytes": 106496,
            "bytes-pretty": "104 kB",
            "exclude-data": false,
            "restore-list-name": "public customer postgres",
            "part-key": "customer_id"
        },
        {
            "oid": 317845,
            "schema": "public",
            "name": "address",
            "reltuples": 603,
            "bytes": 98304,
            "bytes-pretty": "96 kB",
            "exclude-data": false,
            "restore-list-name": "public address postgres",
            "part-key": "address_id"
        },
        {
            "oid": 317908,
            "schema": "public",
            "name": "payment_p2020_01",
            "reltuples": 1157,
            "bytes": 98304,
            "bytes-pretty": "96 kB",
            "exclude-data": false,
            "restore-list-name": "public payment_p2020_01 postgres",
            "part-key": ""
        },
        {
            "oid": 317855,
            "schema": "public",
            "name": "city",
            "reltuples": 600,
            "bytes": 73728,
            "bytes-pretty": "72 kB",
            "exclude-data": false,
            "restore-list-name": "public city postgres",
            "part-key": "city_id"
        },
        {
            "oid": 317834,
            "schema": "public",
            "name": "film_category",
            "reltuples": 1000,
            "bytes": 73728,
            "bytes-pretty": "72 kB",
            "exclude-data": false,
            "restore-list-name": "public film_category postgres",
            "part-key": ""
        },
        {
            "oid": 317798,
            "schema": "public",
            "name": "actor",
            "reltuples": 200,
            "bytes": 49152,
            "bytes-pretty": "48 kB",
            "exclude-data": false,
            "restore-list-name": "public actor postgres",
            "part-key": "actor_id"
        },
        {
            "oid": 317924,
            "schema": "public",
            "name": "payment_p2020_05",
            "reltuples": 182,
            "bytes": 40960,
            "bytes-pretty": "40 kB",
            "exclude-data": false,
            "restore-list-name": "public payment_p2020_05 postgres",
            "part-key": ""
        },
        {
            "oid": 317808,
            "schema": "public",
            "name": "category",
            "reltuples": 0,
            "bytes": 16384,
            "bytes-pretty": "16 kB",
            "exclude-data": false,
            "restore-list-name": "public category postgres",
            "part-key": "category_id"
        },
        {
            "oid": 317865,
            "schema": "public",
            "name": "country",
            "reltuples": 109,
            "bytes": 16384,
            "bytes-pretty": "16 kB",
            "exclude-data": false,
            "restore-list-name": "public country postgres",
            "part-key": "country_id"
        },
        {
            "oid": 317946,
            "schema": "public",
            "name": "staff",
            "reltuples": 0,
            "bytes": 16384,
            "bytes-pretty": "16 kB",
            "exclude-data": false,
            "restore-list-name": "public staff postgres",
            "part-key": "staff_id"
        },
        {
            "oid": 378280,
            "schema": "pgcopydb",
            "name": "sentinel",
            "reltuples": 1,
            "bytes": 8192,
            "bytes-pretty": "8192 bytes",
            "exclude-data": false,
            "restore-list-name": "pgcopydb sentinel dim",
            "part-key": ""
        },
        {
            "oid": 317892,
            "schema": "public",
            "name": "language",
            "reltuples": 0,
            "bytes": 8192,
            "bytes-pretty": "8192 bytes",
            "exclude-data": false,
            "restore-list-name": "public language postgres",
            "part-key": "language_id"
        },
        {
            "oid": 317928,
            "schema": "public",
            "name": "payment_p2020_06",
            "reltuples": 0,
            "bytes": 8192,
            "bytes-pretty": "8192 bytes",
            "exclude-data": false,
            "restore-list-name": "public payment_p2020_06 postgres",
            "part-key": ""
        },
        {
            "oid": 317957,
            "schema": "public",
            "name": "store",
            "reltuples": 0,
            "bytes": 8192,
            "bytes-pretty": "8192 bytes",
            "exclude-data": false,
            "restore-list-name": "public store postgres",
            "part-key": "store_id"
        }
    ],
    "indexes": [
        {
            "oid": 378283,
            "schema": "pgcopydb",
            "name": "sentinel_expr_idx",
            "isPrimary": false,
            "isUnique": true,
            "columns": "",
            "sql": "CREATE UNIQUE INDEX sentinel_expr_idx ON pgcopydb.sentinel USING btree ((1))",
            "restore-list-name": "pgcopydb sentinel_expr_idx dim",
            "table": {
                "oid": 378280,
                "schema": "pgcopydb",
                "name": "sentinel"
            }
        },
        {
            "oid": 318001,
            "schema": "public",
            "name": "idx_actor_last_name",
            "isPrimary": false,
            "isUnique": false,
            "columns": "last_name",
            "sql": "CREATE INDEX idx_actor_last_name ON public.actor USING btree (last_name)",
            "restore-list-name": "public idx_actor_last_name postgres",
            "table": {
                "oid": 317798,
                "schema": "public",
                "name": "actor"
            }
        },
        {
            "oid": 317972,
            "schema": "public",
            "name": "actor_pkey",
            "isPrimary": true,
            "isUnique": true,
            "columns": "actor_id",
            "sql": "CREATE UNIQUE INDEX actor_pkey ON public.actor USING btree (actor_id)",
            "restore-list-name": "",
            "table": {
                "oid": 317798,
                "schema": "public",
                "name": "actor"
            },
            "constraint": {
                "oid": 317973,
                "name": "actor_pkey",
                "sql": "PRIMARY KEY (actor_id)"
            }
        },
        {
            "oid": 317974,
            "schema": "public",
            "name": "address_pkey",
            "isPrimary": true,
            "isUnique": true,
            "columns": "address_id",
            "sql": "CREATE UNIQUE INDEX address_pkey ON public.address USING btree (address_id)",
            "restore-list-name": "",
            "table": {
                "oid": 317845,
                "schema": "public",
                "name": "address"
            },
            "constraint": {
                "oid": 317975,
                "name": "address_pkey",
                "sql": "PRIMARY KEY (address_id)"
            }
        },
        {
            "oid": 318003,
            "schema": "public",
            "name": "idx_fk_city_id",
            "isPrimary": false,
            "isUnique": false,
            "columns": "city_id",
            "sql": "CREATE INDEX idx_fk_city_id ON public.address USING btree (city_id)",
            "restore-list-name": "public idx_fk_city_id postgres",
            "table": {
                "oid": 317845,
                "schema": "public",
                "name": "address"
            }
        },
        {
            "oid": 317976,
            "schema": "public",
            "name": "category_pkey",
            "isPrimary": true,
            "isUnique": true,
            "columns": "category_id",
            "sql": "CREATE UNIQUE INDEX category_pkey ON public.category USING btree (category_id)",
            "restore-list-name": "",
            "table": {
                "oid": 317808,
                "schema": "public",
                "name": "category"
            },
            "constraint": {
                "oid": 317977,
                "name": "category_pkey",
                "sql": "PRIMARY KEY (category_id)"
            }
        },
        {
            "oid": 317978,
            "schema": "public",
            "name": "city_pkey",
            "isPrimary": true,
            "isUnique": true,
            "columns": "city_id",
            "sql": "CREATE UNIQUE INDEX city_pkey ON public.city USING btree (city_id)",
            "restore-list-name": "",
            "table": {
                "oid": 317855,
                "schema": "public",
                "name": "city"
            },
            "constraint": {
                "oid": 317979,
                "name": "city_pkey",
                "sql": "PRIMARY KEY (city_id)"
            }
        },
        {
            "oid": 318004,
            "schema": "public",
            "name": "idx_fk_country_id",
            "isPrimary": false,
            "isUnique": false,
            "columns": "country_id",
            "sql": "CREATE INDEX idx_fk_country_id ON public.city USING btree (country_id)",
            "restore-list-name": "public idx_fk_country_id postgres",
            "table": {
                "oid": 317855,
                "schema": "public",
                "name": "city"
            }
        },
        {
            "oid": 317980,
            "schema": "public",
            "name": "country_pkey",
            "isPrimary": true,
            "isUnique": true,
            "columns": "country_id",
            "sql": "CREATE UNIQUE INDEX country_pkey ON public.country USING btree (country_id)",
            "restore-list-name": "",
            "table": {
                "oid": 317865,
                "schema": "public",
                "name": "country"
            },
            "constraint": {
                "oid": 317981,
                "name": "country_pkey",
                "sql": "PRIMARY KEY (country_id)"
            }
        },
        {
            "oid": 318024,
            "schema": "public",
            "name": "idx_last_name",
            "isPrimary": false,
            "isUnique": false,
            "columns": "last_name",
            "sql": "CREATE INDEX idx_last_name ON public.customer USING btree (last_name)",
            "restore-list-name": "public idx_last_name postgres",
            "table": {
                "oid": 317784,
                "schema": "public",
                "name": "customer"
            }
        },
        {
            "oid": 318002,
            "schema": "public",
            "name": "idx_fk_address_id",
            "isPrimary": false,
            "isUnique": false,
            "columns": "address_id",
            "sql": "CREATE INDEX idx_fk_address_id ON public.customer USING btree (address_id)",
            "restore-list-name": "public idx_fk_address_id postgres",
            "table": {
                "oid": 317784,
                "schema": "public",
                "name": "customer"
            }
        },
        {
            "oid": 317982,
            "schema": "public",
            "name": "customer_pkey",
            "isPrimary": true,
            "isUnique": true,
            "columns": "customer_id",
            "sql": "CREATE UNIQUE INDEX customer_pkey ON public.customer USING btree (customer_id)",
            "restore-list-name": "",
            "table": {
                "oid": 317784,
                "schema": "public",
                "name": "customer"
            },
            "constraint": {
                "oid": 317983,
                "name": "customer_pkey",
                "sql": "PRIMARY KEY (customer_id)"
            }
        },
        {
            "oid": 318023,
            "schema": "public",
            "name": "idx_fk_store_id",
            "isPrimary": false,
            "isUnique": false,
            "columns": "store_id",
            "sql": "CREATE INDEX idx_fk_store_id ON public.customer USING btree (store_id)",
            "restore-list-name": "public idx_fk_store_id postgres",
            "table": {
                "oid": 317784,
                "schema": "public",
                "name": "customer"
            }
        },
        {
            "oid": 318009,
            "schema": "public",
            "name": "idx_fk_original_language_id",
            "isPrimary": false,
            "isUnique": false,
            "columns": "original_language_id",
            "sql": "CREATE INDEX idx_fk_original_language_id ON public.film USING btree (original_language_id)",
            "restore-list-name": "public idx_fk_original_language_id postgres",
            "table": {
                "oid": 317818,
                "schema": "public",
                "name": "film"
            }
        },
        {
            "oid": 318026,
            "schema": "public",
            "name": "idx_title",
            "isPrimary": false,
            "isUnique": false,
            "columns": "title",
            "sql": "CREATE INDEX idx_title ON public.film USING btree (title)",
            "restore-list-name": "public idx_title postgres",
            "table": {
                "oid": 317818,
                "schema": "public",
                "name": "film"
            }
        },
        {
            "oid": 318000,
            "schema": "public",
            "name": "film_fulltext_idx",
            "isPrimary": false,
            "isUnique": false,
            "columns": "fulltext",
            "sql": "CREATE INDEX film_fulltext_idx ON public.film USING gist (fulltext)",
            "restore-list-name": "public film_fulltext_idx postgres",
            "table": {
                "oid": 317818,
                "schema": "public",
                "name": "film"
            }
        },
        {
            "oid": 317988,
            "schema": "public",
            "name": "film_pkey",
            "isPrimary": true,
            "isUnique": true,
            "columns": "film_id",
            "sql": "CREATE UNIQUE INDEX film_pkey ON public.film USING btree (film_id)",
            "restore-list-name": "",
            "table": {
                "oid": 317818,
                "schema": "public",
                "name": "film"
            },
            "constraint": {
                "oid": 317989,
                "name": "film_pkey",
                "sql": "PRIMARY KEY (film_id)"
            }
        },
        {
            "oid": 318008,
            "schema": "public",
            "name": "idx_fk_language_id",
            "isPrimary": false,
            "isUnique": false,
            "columns": "language_id",
            "sql": "CREATE INDEX idx_fk_language_id ON public.film USING btree (language_id)",
            "restore-list-name": "public idx_fk_language_id postgres",
            "table": {
                "oid": 317818,
                "schema": "public",
                "name": "film"
            }
        },
        {
            "oid": 317984,
            "schema": "public",
            "name": "film_actor_pkey",
            "isPrimary": true,
            "isUnique": true,
            "columns": "actor_id,film_id",
            "sql": "CREATE UNIQUE INDEX film_actor_pkey ON public.film_actor USING btree (actor_id, film_id)",
            "restore-list-name": "",
            "table": {
                "oid": 317830,
                "schema": "public",
                "name": "film_actor"
            },
            "constraint": {
                "oid": 317985,
                "name": "film_actor_pkey",
                "sql": "PRIMARY KEY (actor_id, film_id)"
            }
        },
        {
            "oid": 318006,
            "schema": "public",
            "name": "idx_fk_film_id",
            "isPrimary": false,
            "isUnique": false,
            "columns": "film_id",
            "sql": "CREATE INDEX idx_fk_film_id ON public.film_actor USING btree (film_id)",
            "restore-list-name": "public idx_fk_film_id postgres",
            "table": {
                "oid": 317830,
                "schema": "public",
                "name": "film_actor"
            }
        },
        {
            "oid": 317986,
            "schema": "public",
            "name": "film_category_pkey",
            "isPrimary": true,
            "isUnique": true,
            "columns": "film_id,category_id",
            "sql": "CREATE UNIQUE INDEX film_category_pkey ON public.film_category USING btree (film_id, category_id)",
            "restore-list-name": "",
            "table": {
                "oid": 317834,
                "schema": "public",
                "name": "film_category"
            },
            "constraint": {
                "oid": 317987,
                "name": "film_category_pkey",
                "sql": "PRIMARY KEY (film_id, category_id)"
            }
        },
        {
            "oid": 318025,
            "schema": "public",
            "name": "idx_store_id_film_id",
            "isPrimary": false,
            "isUnique": false,
            "columns": "film_id,store_id",
            "sql": "CREATE INDEX idx_store_id_film_id ON public.inventory USING btree (store_id, film_id)",
            "restore-list-name": "public idx_store_id_film_id postgres",
            "table": {
                "oid": 317885,
                "schema": "public",
                "name": "inventory"
            }
        },
        {
            "oid": 317990,
            "schema": "public",
            "name": "inventory_pkey",
            "isPrimary": true,
            "isUnique": true,
            "columns": "inventory_id",
            "sql": "CREATE UNIQUE INDEX inventory_pkey ON public.inventory USING btree (inventory_id)",
            "restore-list-name": "",
            "table": {
                "oid": 317885,
                "schema": "public",
                "name": "inventory"
            },
            "constraint": {
                "oid": 317991,
                "name": "inventory_pkey",
                "sql": "PRIMARY KEY (inventory_id)"
            }
        },
        {
            "oid": 317992,
            "schema": "public",
            "name": "language_pkey",
            "isPrimary": true,
            "isUnique": true,
            "columns": "language_id",
            "sql": "CREATE UNIQUE INDEX language_pkey ON public.language USING btree (language_id)",
            "restore-list-name": "",
            "table": {
                "oid": 317892,
                "schema": "public",
                "name": "language"
            },
            "constraint": {
                "oid": 317993,
                "name": "language_pkey",
                "sql": "PRIMARY KEY (language_id)"
            }
        },
        {
            "oid": 318010,
            "schema": "public",
            "name": "idx_fk_payment_p2020_01_customer_id",
            "isPrimary": false,
            "isUnique": false,
            "columns": "customer_id",
            "sql": "CREATE INDEX idx_fk_payment_p2020_01_customer_id ON public.payment_p2020_01 USING btree stomer_id)",
            "restore-list-name": "public idx_fk_payment_p2020_01_customer_id postgres",
            "table": {
                "oid": 317908,
                "schema": "public",
                "name": "payment_p2020_01"
            }
        },
        {
            "oid": 318029,
            "schema": "public",
            "name": "payment_p2020_01_customer_id_idx",
            "isPrimary": false,
            "isUnique": false,
            "columns": "customer_id",
            "sql": "CREATE INDEX payment_p2020_01_customer_id_idx ON public.payment_p2020_01 USING btree (customer_id)
            "restore-list-name": "public payment_p2020_01_customer_id_idx postgres",
            "table": {
                "oid": 317908,
                "schema": "public",
                "name": "payment_p2020_01"
            }
        },
        {
            "oid": 318012,
            "schema": "public",
            "name": "idx_fk_payment_p2020_01_staff_id",
            "isPrimary": false,
            "isUnique": false,
            "columns": "staff_id",
            "sql": "CREATE INDEX idx_fk_payment_p2020_01_staff_id ON public.payment_p2020_01 USING btree (staff_id)",
            "restore-list-name": "public idx_fk_payment_p2020_01_staff_id postgres",
            "table": {
                "oid": 317908,
                "schema": "public",
                "name": "payment_p2020_01"
            }
        },
        {
            "oid": 318013,
            "schema": "public",
            "name": "idx_fk_payment_p2020_02_customer_id",
            "isPrimary": false,
            "isUnique": false,
            "columns": "customer_id",
            "sql": "CREATE INDEX idx_fk_payment_p2020_02_customer_id ON public.payment_p2020_02 USING btree stomer_id)",
            "restore-list-name": "public idx_fk_payment_p2020_02_customer_id postgres",
            "table": {
                "oid": 317912,
                "schema": "public",
                "name": "payment_p2020_02"
            }
        },
        {
            "oid": 318014,
            "schema": "public",
            "name": "idx_fk_payment_p2020_02_staff_id",
            "isPrimary": false,
            "isUnique": false,
            "columns": "staff_id",
            "sql": "CREATE INDEX idx_fk_payment_p2020_02_staff_id ON public.payment_p2020_02 USING btree (staff_id)",
            "restore-list-name": "public idx_fk_payment_p2020_02_staff_id postgres",
            "table": {
                "oid": 317912,
                "schema": "public",
                "name": "payment_p2020_02"
            }
        },
        {
            "oid": 318030,
            "schema": "public",
            "name": "payment_p2020_02_customer_id_idx",
            "isPrimary": false,
            "isUnique": false,
            "columns": "customer_id",
            "sql": "CREATE INDEX payment_p2020_02_customer_id_idx ON public.payment_p2020_02 USING btree (customer_id)
            "restore-list-name": "public payment_p2020_02_customer_id_idx postgres",
            "table": {
                "oid": 317912,
                "schema": "public",
                "name": "payment_p2020_02"
            }
        },
        {
            "oid": 318016,
            "schema": "public",
            "name": "idx_fk_payment_p2020_03_staff_id",
            "isPrimary": false,
            "isUnique": false,
            "columns": "staff_id",
            "sql": "CREATE INDEX idx_fk_payment_p2020_03_staff_id ON public.payment_p2020_03 USING btree (staff_id)",
            "restore-list-name": "public idx_fk_payment_p2020_03_staff_id postgres",
            "table": {
                "oid": 317916,
                "schema": "public",
                "name": "payment_p2020_03"
            }
        },
        {
            "oid": 318031,
            "schema": "public",
            "name": "payment_p2020_03_customer_id_idx",
            "isPrimary": false,
            "isUnique": false,
            "columns": "customer_id",
            "sql": "CREATE INDEX payment_p2020_03_customer_id_idx ON public.payment_p2020_03 USING btree (customer_id)
            "restore-list-name": "public payment_p2020_03_customer_id_idx postgres",
            "table": {
                "oid": 317916,
                "schema": "public",
                "name": "payment_p2020_03"
            }
        },
        {
            "oid": 318015,
            "schema": "public",
            "name": "idx_fk_payment_p2020_03_customer_id",
            "isPrimary": false,
            "isUnique": false,
            "columns": "customer_id",
            "sql": "CREATE INDEX idx_fk_payment_p2020_03_customer_id ON public.payment_p2020_03 USING btree stomer_id)",
            "restore-list-name": "public idx_fk_payment_p2020_03_customer_id postgres",
            "table": {
                "oid": 317916,
                "schema": "public",
                "name": "payment_p2020_03"
            }
        },
        {
            "oid": 318032,
            "schema": "public",
            "name": "payment_p2020_04_customer_id_idx",
            "isPrimary": false,
            "isUnique": false,
            "columns": "customer_id",
            "sql": "CREATE INDEX payment_p2020_04_customer_id_idx ON public.payment_p2020_04 USING btree (customer_id)
            "restore-list-name": "public payment_p2020_04_customer_id_idx postgres",
            "table": {
                "oid": 317920,
                "schema": "public",
                "name": "payment_p2020_04"
            }
        },
        {
            "oid": 318018,
            "schema": "public",
            "name": "idx_fk_payment_p2020_04_staff_id",
            "isPrimary": false,
            "isUnique": false,
            "columns": "staff_id",
            "sql": "CREATE INDEX idx_fk_payment_p2020_04_staff_id ON public.payment_p2020_04 USING btree (staff_id)",
            "restore-list-name": "public idx_fk_payment_p2020_04_staff_id postgres",
            "table": {
                "oid": 317920,
                "schema": "public",
                "name": "payment_p2020_04"
            }
        },
        {
            "oid": 318017,
            "schema": "public",
            "name": "idx_fk_payment_p2020_04_customer_id",
            "isPrimary": false,
            "isUnique": false,
            "columns": "customer_id",
            "sql": "CREATE INDEX idx_fk_payment_p2020_04_customer_id ON public.payment_p2020_04 USING btree stomer_id)",
            "restore-list-name": "public idx_fk_payment_p2020_04_customer_id postgres",
            "table": {
                "oid": 317920,
                "schema": "public",
                "name": "payment_p2020_04"
            }
        },
        {
            "oid": 318019,
            "schema": "public",
            "name": "idx_fk_payment_p2020_05_customer_id",
            "isPrimary": false,
            "isUnique": false,
            "columns": "customer_id",
            "sql": "CREATE INDEX idx_fk_payment_p2020_05_customer_id ON public.payment_p2020_05 USING btree stomer_id)",
            "restore-list-name": "public idx_fk_payment_p2020_05_customer_id postgres",
            "table": {
                "oid": 317924,
                "schema": "public",
                "name": "payment_p2020_05"
            }
        },
        {
            "oid": 318020,
            "schema": "public",
            "name": "idx_fk_payment_p2020_05_staff_id",
            "isPrimary": false,
            "isUnique": false,
            "columns": "staff_id",
            "sql": "CREATE INDEX idx_fk_payment_p2020_05_staff_id ON public.payment_p2020_05 USING btree (staff_id)",
            "restore-list-name": "public idx_fk_payment_p2020_05_staff_id postgres",
            "table": {
                "oid": 317924,
                "schema": "public",
                "name": "payment_p2020_05"
            }
        },
        {
            "oid": 318033,
            "schema": "public",
            "name": "payment_p2020_05_customer_id_idx",
            "isPrimary": false,
            "isUnique": false,
            "columns": "customer_id",
            "sql": "CREATE INDEX payment_p2020_05_customer_id_idx ON public.payment_p2020_05 USING btree (customer_id)
            "restore-list-name": "public payment_p2020_05_customer_id_idx postgres",
            "table": {
                "oid": 317924,
                "schema": "public",
                "name": "payment_p2020_05"
            }
        },
        {
            "oid": 318022,
            "schema": "public",
            "name": "idx_fk_payment_p2020_06_staff_id",
            "isPrimary": false,
            "isUnique": false,
            "columns": "staff_id",
            "sql": "CREATE INDEX idx_fk_payment_p2020_06_staff_id ON public.payment_p2020_06 USING btree (staff_id)",
            "restore-list-name": "public idx_fk_payment_p2020_06_staff_id postgres",
            "table": {
                "oid": 317928,
                "schema": "public",
                "name": "payment_p2020_06"
            }
        },
        {
            "oid": 318034,
            "schema": "public",
            "name": "payment_p2020_06_customer_id_idx",
            "isPrimary": false,
            "isUnique": false,
            "columns": "customer_id",
            "sql": "CREATE INDEX payment_p2020_06_customer_id_idx ON public.payment_p2020_06 USING btree (customer_id)
            "restore-list-name": "public payment_p2020_06_customer_id_idx postgres",
            "table": {
                "oid": 317928,
                "schema": "public",
                "name": "payment_p2020_06"
            }
        },
        {
            "oid": 318021,
            "schema": "public",
            "name": "idx_fk_payment_p2020_06_customer_id",
            "isPrimary": false,
            "isUnique": false,
            "columns": "customer_id",
            "sql": "CREATE INDEX idx_fk_payment_p2020_06_customer_id ON public.payment_p2020_06 USING btree tomer_id)",
            "restore-list-name": "public idx_fk_payment_p2020_06_customer_id postgres",
            "table": {
                "oid": 317928,
                "schema": "public",
                "name": "payment_p2020_06"
            }
        },
        {
            "oid": 318028,
            "schema": "public",
            "name": "idx_unq_rental_rental_date_inventory_id_customer_id",
            "isPrimary": false,
            "isUnique": true,
            "columns": "rental_date,inventory_id,customer_id",
            "sql": "CREATE UNIQUE INDEX idx_unq_rental_rental_date_inventory_id_customer_id ON public.rental USING e (rental_date, inventory_id, customer_id)",
            "restore-list-name": "public idx_unq_rental_rental_date_inventory_id_customer_id postgres",
            "table": {
                "oid": 317934,
                "schema": "public",
                "name": "rental"
            }
        },
        {
            "oid": 317994,
            "schema": "public",
            "name": "rental_pkey",
            "isPrimary": true,
            "isUnique": true,
            "columns": "rental_id",
            "sql": "CREATE UNIQUE INDEX rental_pkey ON public.rental USING btree (rental_id)",
            "restore-list-name": "",
            "table": {
                "oid": 317934,
                "schema": "public",
                "name": "rental"
            },
            "constraint": {
                "oid": 317995,
                "name": "rental_pkey",
                "sql": "PRIMARY KEY (rental_id)"
            }
        },
        {
            "oid": 318007,
            "schema": "public",
            "name": "idx_fk_inventory_id",
            "isPrimary": false,
            "isUnique": false,
            "columns": "inventory_id",
            "sql": "CREATE INDEX idx_fk_inventory_id ON public.rental USING btree (inventory_id)",
            "restore-list-name": "public idx_fk_inventory_id postgres",
            "table": {
                "oid": 317934,
                "schema": "public",
                "name": "rental"
            }
        },
        {
            "oid": 317996,
            "schema": "public",
            "name": "staff_pkey",
            "isPrimary": true,
            "isUnique": true,
            "columns": "staff_id",
            "sql": "CREATE UNIQUE INDEX staff_pkey ON public.staff USING btree (staff_id)",
            "restore-list-name": "",
            "table": {
                "oid": 317946,
                "schema": "public",
                "name": "staff"
            },
            "constraint": {
                "oid": 317997,
                "name": "staff_pkey",
                "sql": "PRIMARY KEY (staff_id)"
            }
        },
        {
            "oid": 318027,
            "schema": "public",
            "name": "idx_unq_manager_staff_id",
            "isPrimary": false,
            "isUnique": true,
            "columns": "manager_staff_id",
            "sql": "CREATE UNIQUE INDEX idx_unq_manager_staff_id ON public.store USING btree (manager_staff_id)",
            "restore-list-name": "public idx_unq_manager_staff_id postgres",
            "table": {
                "oid": 317957,
                "schema": "public",
                "name": "store"
            }
        },
        {
            "oid": 317998,
            "schema": "public",
            "name": "store_pkey",
            "isPrimary": true,
            "isUnique": true,
            "columns": "store_id",
            "sql": "CREATE UNIQUE INDEX store_pkey ON public.store USING btree (store_id)",
            "restore-list-name": "",
            "table": {
                "oid": 317957,
                "schema": "public",
                "name": "store"
            },
            "constraint": {
                "oid": 317999,
                "name": "store_pkey",
                "sql": "PRIMARY KEY (store_id)"
            }
        }
    ],
    "sequences": [
        {
            "oid": 317796,
            "schema": "public",
            "name": "actor_actor_id_seq",
            "last-value": 200,
            "is-called": true,
            "restore-list-name": "public actor_actor_id_seq postgres"
        },
        {
            "oid": 317843,
            "schema": "public",
            "name": "address_address_id_seq",
            "last-value": 605,
            "is-called": true,
            "restore-list-name": "public address_address_id_seq postgres"
        },
        {
            "oid": 317806,
            "schema": "public",
            "name": "category_category_id_seq",
            "last-value": 16,
            "is-called": true,
            "restore-list-name": "public category_category_id_seq postgres"
        },
        {
            "oid": 317853,
            "schema": "public",
            "name": "city_city_id_seq",
            "last-value": 600,
            "is-called": true,
            "restore-list-name": "public city_city_id_seq postgres"
        },
        {
            "oid": 317863,
            "schema": "public",
            "name": "country_country_id_seq",
            "last-value": 109,
            "is-called": true,
            "restore-list-name": "public country_country_id_seq postgres"
        },
        {
            "oid": 317782,
            "schema": "public",
            "name": "customer_customer_id_seq",
            "last-value": 599,
            "is-called": true,
            "restore-list-name": "public customer_customer_id_seq postgres"
        },
        {
            "oid": 317816,
            "schema": "public",
            "name": "film_film_id_seq",
            "last-value": 1000,
            "is-called": true,
            "restore-list-name": "public film_film_id_seq postgres"
        },
        {
            "oid": 317883,
            "schema": "public",
            "name": "inventory_inventory_id_seq",
            "last-value": 4581,
            "is-called": true,
            "restore-list-name": "public inventory_inventory_id_seq postgres"
        },
        {
            "oid": 317890,
            "schema": "public",
            "name": "language_language_id_seq",
            "last-value": 6,
            "is-called": true,
            "restore-list-name": "public language_language_id_seq postgres"
        },
        {
            "oid": 317902,
            "schema": "public",
            "name": "payment_payment_id_seq",
            "last-value": 32099,
            "is-called": true,
            "restore-list-name": "public payment_payment_id_seq postgres"
        },
        {
            "oid": 317932,
            "schema": "public",
            "name": "rental_rental_id_seq",
            "last-value": 16050,
            "is-called": true,
            "restore-list-name": "public rental_rental_id_seq postgres"
        },
        {
            "oid": 317944,
            "schema": "public",
            "name": "staff_staff_id_seq",
            "last-value": 2,
            "is-called": true,
            "restore-list-name": "public staff_staff_id_seq postgres"
        },
        {
            "oid": 317955,
            "schema": "public",
            "name": "store_store_id_seq",
            "last-value": 2,
            "is-called": true,
            "restore-list-name": "public store_store_id_seq postgres"
        }
    ]
}

Listing current progress (log lines removed):

$ pgcopydb list progress 2>/dev/null
             |  Total Count |  In Progress |         Done
-------------+--------------+--------------+-------------
      Tables |           21 |            4 |            7
     Indexes |           48 |           14 |            7

Listing current progress, in JSON:

$ pgcopydb list progress --json 2>/dev/null
{
    "table-jobs": 4,
    "index-jobs": 4,
    "tables": {
        "total": 21,
        "done": 9,
        "in-progress": [
            {
                "oid": 317908,
                "schema": "public",
                "name": "payment_p2020_01",
                "reltuples": 1157,
                "bytes": 98304,
                "bytes-pretty": "96 kB",
                "exclude-data": false,
                "restore-list-name": "public payment_p2020_01 postgres",
                "part-key": "",
                "process": {
                    "pid": 75159,
                    "start-time-epoch": 1662476249,
                    "start-time-string": "2022-09-06 16:57:29 CEST",
                    "command": "COPY \"public\".\"payment_p2020_01\""
                }
            },
            {
                "oid": 317855,
                "schema": "public",
                "name": "city",
                "reltuples": 600,
                "bytes": 73728,
                "bytes-pretty": "72 kB",
                "exclude-data": false,
                "restore-list-name": "public city postgres",
                "part-key": "city_id",
                "process": {
                    "pid": 75157,
                    "start-time-epoch": 1662476249,
                    "start-time-string": "2022-09-06 16:57:29 CEST",
                    "command": "COPY \"public\".\"city\""
                }
            }
        ]
    },
       "indexes": {
        "total": 48,
        "done": 39,
        "in-progress": [
            {
                "oid": 378283,
                "schema": "pgcopydb",
                "name": "sentinel_expr_idx",
                "isPrimary": false,
                "isUnique": true,
                "columns": "",
                "sql": "CREATE UNIQUE INDEX sentinel_expr_idx ON pgcopydb.sentinel USING btree ((1))",
                "restore-list-name": "pgcopydb sentinel_expr_idx dim",
                "table": {
                    "oid": 378280,
                    "schema": "pgcopydb",
                    "name": "sentinel"
                },
                "process": {
                    "pid": 74372,
                    "start-time-epoch": 1662476080,
                    "start-time-string": "2022-09-06 16:54:40 CEST"
                }
            },
            {
                "oid": 317980,
                "schema": "public",
                "name": "country_pkey",
                "isPrimary": true,
                "isUnique": true,
                "columns": "country_id",
                "sql": "CREATE UNIQUE INDEX country_pkey ON public.country USING btree (country_id)",
                "restore-list-name": "public country_pkey postgres",
                "table": {
                    "oid": 317865,
                    "schema": "public",
                    "name": "country"
                },
                "constraint": {
                    "oid": 317981,
                    "name": "country_pkey",
                    "sql": "PRIMARY KEY (country_id)",
                    "restore-list-name": ""
                },
                "process": {
                    "pid": 74358,
                    "start-time-epoch": 1662476080,
                    "start-time-string": "2022-09-06 16:54:40 CEST"
                }
            },
            {
                "oid": 317996,
                "schema": "public",
                "name": "staff_pkey",
                "isPrimary": true,
                "isUnique": true,
                "columns": "staff_id",
                "sql": "CREATE UNIQUE INDEX staff_pkey ON public.staff USING btree (staff_id)",
                "restore-list-name": "public staff_pkey postgres",
                "table": {
                    "oid": 317946,
                    "schema": "public",
                    "name": "staff"
                },
                "constraint": {
                    "oid": 317997,
                    "name": "staff_pkey",
                    "sql": "PRIMARY KEY (staff_id)",
                    "restore-list-name": ""
                },
                "process": {
                    "pid": 74368,
                    "start-time-epoch": 1662476080,
                    "start-time-string": "2022-09-06 16:54:40 CEST"
                }
            }
        ]
    }
}
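
The --json output being machine readable, the progress counters can also be extracted with standard tooling. Here is a minimal sketch, assuming the jq command is available:

$ pgcopydb list progress --json 2>/dev/null | jq '.tables.done, .indexes.done'
9
39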

pgcopydb stream

pgcopydb stream - Stream changes from source database

Warning

This mode of operation has been designed for unit testing only.

Consider using the pgcopydb clone (with the --follow option) or the pgcopydb follow command instead.

Note

Some pgcopydb stream commands are still designed for normal operations, rather than unit testing only.

The pgcopydb stream sentinel set startpos, pgcopydb stream sentinel set endpos, pgcopydb stream sentinel set apply, and pgcopydb stream sentinel set prefetch commands are necessary to communicate with the main pgcopydb clone --follow or pgcopydb follow process. See Change Data Capture Example 1 for a detailed example using pgcopydb stream sentinel set endpos.

Also the commands pgcopydb stream setup and pgcopydb stream cleanup might be used directly in normal operations. See Change Data Capture Example 2 for a detailed example.

This command prefixes the following sub-commands:

pgcopydb stream: Stream changes from the source database

Available commands:
  pgcopydb stream
    setup      Setup source and target systems for logical decoding
    cleanup    Cleanup source and target systems for logical decoding
    prefetch   Stream JSON changes from the source database and transform them to SQL
    catchup    Apply prefetched changes from SQL files to the target database
    replay     Replay changes from the source to the target database, live
  + sentinel   Maintain a sentinel table
    receive    Stream changes from the source database
    transform  Transform changes from the source database into SQL commands
    apply      Apply changes from the source database into the target database
pgcopydb stream sentinel: Maintain a sentinel table

Available commands:
  pgcopydb stream sentinel
    setup  Setup the sentinel table
    get    Get the sentinel table values
  + set    Set the sentinel table values
pgcopydb stream sentinel set: Set the sentinel table values

Available commands:
  pgcopydb stream sentinel set
    startpos  Set the sentinel start position LSN
    endpos    Set the sentinel end position LSN
    apply     Set the sentinel apply mode
    prefetch  Set the sentinel prefetch mode
pgcopydb stream sentinel setup: Setup the sentinel table
usage: pgcopydb stream sentinel setup <start lsn> <end lsn>

Those commands implement a part of the whole database replay operation as detailed in the pgcopydb follow section. Use those commands only to debug a specific part, or when you know that you just want to implement that step.

Note

The sub-commands stream setup, then stream prefetch and stream catchup, are higher-level commands that use internal information to know which files to process. Those commands also keep track of their progress; see the sketch after this note for the typical sequence.

The sub-commands stream receive, stream transform, and stream apply are a lower-level interface that works on given files. Those commands still keep track of their progress, but they have to be given more information to work with.
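
As an illustration, here is a minimal sketch of the higher-level sequence, assuming the replication slot test_slot has been created beforehand (see pgcopydb snapshot) and that PGCOPYDB_SOURCE_PGURI and PGCOPYDB_TARGET_PGURI are exported; the slot name and the endpos value are example values only:

$ pgcopydb stream setup --slot-name test_slot
$ pgcopydb stream prefetch --slot-name test_slot
$ pgcopydb stream catchup --slot-name test_slot --endpos 0/236D668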

pgcopydb stream setup

pgcopydb stream setup - Setup source and target systems for logical decoding

The command pgcopydb stream setup connects to the target database and creates a replication origin positioned at the LSN of the logical decoding replication slot, which must have been created already. See pgcopydb snapshot to create the replication slot and export a snapshot.

pgcopydb stream setup: Setup source and target systems for logical decoding
usage: pgcopydb stream setup

  --source                      Postgres URI to the source database
  --target                      Postgres URI to the target database
  --dir                         Work directory to use
  --restart                     Allow restarting when temp files exist already
  --resume                      Allow resuming operations after a failure
  --not-consistent              Allow taking a new snapshot on the source database
  --snapshot                    Use snapshot obtained with pg_export_snapshot
  --plugin                      Output plugin to use (test_decoding, wal2json)
  --wal2json-numeric-as-string  Print numeric data type as string when using wal2json output plugin
  --slot-name                   Stream changes recorded by this slot
  --origin                      Name of the Postgres replication origin

pgcopydb stream cleanup

pgcopydb stream cleanup - cleanup source and target systems for logical decoding

The command pgcopydb stream cleanup connects to the source and target databases to delete the objects created in the pgcopydb stream setup step.

pgcopydb stream cleanup: Cleanup source and target systems for logical decoding
usage: pgcopydb stream cleanup

  --source         Postgres URI to the source database
  --target         Postgres URI to the target database
  --restart        Allow restarting when temp files exist already
  --resume         Allow resuming operations after a failure
  --not-consistent Allow taking a new snapshot on the source database
  --snapshot       Use snapshot obtained with pg_export_snapshot
  --slot-name      Stream changes recorded by this slot
  --origin         Name of the Postgres replication origin

pgcopydb stream prefetch

pgcopydb stream prefetch - Stream JSON changes from the source database and transform them to SQL

The command pgcopydb stream prefetch connects to the source database using the logical replication protocol and the given replication slot.

The prefetch command receives the changes from the source database in a streaming fashion, and writes them in a series of JSON files named the same as their origin WAL filename (with the .json extension). Each time a JSON file is closed, a subprocess is started to transform the JSON into an SQL file.

pgcopydb stream prefetch: Stream JSON changes from the source database and transform them to SQL
usage: pgcopydb stream prefetch

  --source         Postgres URI to the source database
  --dir            Work directory to use
  --restart        Allow restarting when temp files exist already
  --resume         Allow resuming operations after a failure
  --not-consistent Allow taking a new snapshot on the source database
  --slot-name      Stream changes recorded by this slot
  --endpos         LSN position where to stop receiving changes

pgcopydb stream catchup

pgcopydb stream catchup - Apply prefetched changes from SQL files to the target database

The command pgcopydb stream catchup connects to the target database and applies changes from the SQL files that have been prepared with the pgcopydb stream prefetch command.

pgcopydb stream catchup: Apply prefetched changes from SQL files to the target database
usage: pgcopydb stream catchup

  --source         Postgres URI to the source database
  --target         Postgres URI to the target database
  --dir            Work directory to use
  --restart        Allow restarting when temp files exist already
  --resume         Allow resuming operations after a failure
  --not-consistent Allow taking a new snapshot on the source database
  --slot-name      Stream changes recorded by this slot
  --endpos         LSN position where to stop receiving changes
  --origin         Name of the Postgres replication origin

pgcopydb stream replay

pgcopydb stream replay - Replay changes from the source to the target database, live

The command pgcopydb stream replay connects to the source database and streams changes using the logical decoding protocol. Internally it streams those changes to a transform process and then to a replay process, which connects to the target database and applies the SQL changes.

pgcopydb stream replay: Replay changes from the source to the target database, live
usage: pgcopydb stream replay

  --source         Postgres URI to the source database
  --target         Postgres URI to the target database
  --dir            Work directory to use
  --restart        Allow restarting when temp files exist already
  --resume         Allow resuming operations after a failure
  --not-consistent Allow taking a new snapshot on the source database
  --slot-name      Stream changes recorded by this slot
  --endpos         LSN position where to stop receiving changes
  --origin         Name of the Postgres replication origin

This command is equivalent to running the following script:

pgcopydb stream receive --to-stdout
| pgcopydb stream transform - -
| pgcopydb stream apply -

pgcopydb stream sentinel get

pgcopydb stream sentinel get - Get the sentinel table values

pgcopydb stream sentinel get: Get the sentinel table values
usage: pgcopydb stream sentinel get

  --json           Format the output using JSON
  --startpos       Get only the startpos value
  --endpos         Get only the endpos value
  --apply          Get only the apply value
  --write-lsn      Get only the write LSN value
  --flush-lsn      Get only the flush LSN value
  --replay-lsn     Get only the replay LSN value

pgcopydb stream sentinel set startpos

pgcopydb stream sentinel set startpos - Set the sentinel start position LSN

pgcopydb stream sentinel set startpos: Set the sentinel start position LSN
usage: pgcopydb stream sentinel set startpos <start lsn>

This is an advanced API used for unit-testing and debugging; the operation is covered automatically in normal pgcopydb operations.

The logical replication target system registers progress by assigning a current LSN to the --origin node name. When creating an origin on the target database system, it is required to provide the current LSN from the source database system, in order to properly bootstrap pgcopydb logical decoding.
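
For example, with an LSN value obtained from the source system (the value below is only illustrative):

$ pgcopydb stream sentinel set startpos 0/236D668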

pgcopydb stream sentinel set endpos

pgcopydb stream sentinel set endpos - Set the sentinel end position LSN

pgcopydb stream sentinel set endpos: Set the sentinel end position LSN
usage: pgcopydb stream sentinel set endpos [ --source ... ] [ <end lsn> | --current ]

  --source      Postgres URI to the source database
  --current     Use pg_current_wal_flush_lsn() as the endpos

Logical replication target LSN to use. Automatically stop replication and exit with normal exit status 0 when receiving reaches the specified LSN. If there’s a record with LSN exactly equal to lsn, the record will be output.

The --endpos option is not aware of transaction boundaries and may truncate output partway through a transaction. Any partially output transaction will not be consumed and will be replayed again when the slot is next read from. Individual messages are never truncated.

See also documentation for pg_recvlogical.
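
For example, either an explicit LSN or the current flush LSN of the source server can be used (the explicit value below is only illustrative):

$ pgcopydb stream sentinel set endpos --current
$ pgcopydb stream sentinel set endpos 0/236D668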

pgcopydb stream sentinel set apply

pgcopydb stream sentinel set apply - Set the sentinel apply mode

pgcopydb stream sentinel set apply: Set the sentinel apply mode
usage: pgcopydb stream sentinel set apply

pgcopydb stream sentinel set prefetch

pgcopydb stream sentinel set prefetch - Set the sentinel prefetch mode

pgcopydb stream sentinel set prefetch: Set the sentinel prefetch mode
usage: pgcopydb stream sentinel set prefetch

pgcopydb stream receive

pgcopydb stream receive - Stream changes from the source database

The command pgcopydb stream receive connects to the source database using the logical replication protocol and the given replication slot.

The receive command receives the changes from the source database in a streaming fashion, and writes them in a series of JSON files named the same as their origin WAL filename (with the .json extension).

pgcopydb stream receive: Stream changes from the source database
usage: pgcopydb stream receive

  --source         Postgres URI to the source database
  --dir            Work directory to use
  --to-stdout      Stream logical decoding messages to stdout
  --restart        Allow restarting when temp files exist already
  --resume         Allow resuming operations after a failure
  --not-consistent Allow taking a new snapshot on the source database
  --slot-name      Stream changes recorded by this slot
  --endpos         LSN position where to stop receiving changes

pgcopydb stream transform

pgcopydb stream transform - Transform changes from the source database into SQL commands

The command pgcopydb stream transform transforms a JSON file as received by the pgcopydb stream receive command into an SQL file with one query per line.

pgcopydb stream transform: Transform changes from the source database into SQL commands
usage: pgcopydb stream transform  <json filename> <sql filename>

  --target         Postgres URI to the target database
  --dir            Work directory to use
  --restart        Allow restarting when temp files exist already
  --resume         Allow resuming operations after a failure
  --not-consistent Allow taking a new snapshot on the source database

The command supports using - as the filename for either the JSON input or the SQL output, or both. In that case reading from standard input and/or writing to standard output is implemented, in a streaming fashion. A classic use case is to use Unix Pipes, see pgcopydb stream replay too.
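
As a sketch of that pipeline usage, the JSON file shown later in this page could be read from standard input (the file names are only illustrative):

$ cat /var/lib/postgres/.local/share/pgcopydb/000000010000000000000002.json \
    | pgcopydb stream transform - /tmp/000000010000000000000002.sql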

pgcopydb stream apply

pgcopydb stream apply - Apply changes from the source database into the target database

The command pgcopydb stream apply applies an SQL file as prepared by the pgcopydb stream transform command in the target database. The apply process tracks progress thanks to the Postgres API for Replication Progress Tracking.

pgcopydb stream apply: Apply changes from the source database into the target database
usage: pgcopydb stream apply  <sql filename>

  --target         Postgres URI to the target database
  --dir            Work directory to use
  --restart        Allow restarting when temp files exist already
  --resume         Allow resuming operations after a failure
  --not-consistent Allow taking a new snapshot on the source database
  --origin         Name of the Postgres replication origin

This command supports using - as the filename to read from, and in that case reads from the standard input in a streaming fashion instead.
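
For example, a previously transformed SQL file can be piped into the apply command (the file name is only illustrative; --origin defaults to pgcopydb):

$ cat /tmp/000000010000000000000002.sql | pgcopydb stream apply --origin pgcopydb -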

Options

The following options are available to pgcopydb stream sub-commands:

--source

Connection string to the source Postgres instance. See the Postgres documentation for connection strings for the details. In short both the quoted form "host=... dbname=..." and the URI form postgres://user@host:5432/dbname are supported.

--target

Connection string to the target Postgres instance.

--dir

During its normal operations pgcopydb creates a lot of temporary files to track sub-process progress. Temporary files are created in the directory specified by this option; when it is not provided, the location defaults to ${TMPDIR}/pgcopydb when that environment variable is set, or otherwise to /tmp/pgcopydb.

Change Data Capture files are stored in the cdc sub-directory of the --dir option when provided, otherwise see XDG_DATA_HOME environment variable below.
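
For example, with a dedicated work directory (the path, slot name, and endpos below are only illustrative), the JSON files produced by the streaming commands are then found under its cdc sub-directory:

$ pgcopydb stream receive --dir /var/lib/pgcopydb --slot-name test_slot --endpos 0/236D668
$ ls /var/lib/pgcopydb/cdc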

--restart

When running the pgcopydb command again, if the work directory already contains information from a previous run, then the command refuses to proceed rather than delete information that might be used for diagnostics and forensics.

In that case, the --restart option can be used to allow pgcopydb to delete traces from a previous run.

--resume

When the pgcopydb command was terminated before completion, either by an interrupt signal (such as C-c or SIGTERM) or because it crashed, it is possible to resume the database migration.

To be able to resume a streaming operation in a consistent way, all that’s required is re-using the same replication slot as in previous run(s).

--plugin

Logical decoding output plugin to use. The default is test_decoding, which ships with Postgres core itself, so it is probably already available on your source server.

It is possible to use wal2json instead. The support for wal2json in pgcopydb is mostly historical; it should not make a user-visible difference whether you use the default test_decoding or wal2json.

--wal2json-numeric-as-string

When using the wal2json output plugin, it is possible to use the --wal2json-numeric-as-string option to instruct wal2json to output numeric values as strings and thus prevent some precision loss.

To use this option, the wal2json plugin version installed on the source database must support the --numeric-data-types-as-string option.

See also the documentation for wal2json regarding this option for details.
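
A sketch combining both options at setup time, assuming the wal2json extension is available on the source server (the slot name is only illustrative):

$ pgcopydb stream setup --plugin wal2json --wal2json-numeric-as-string --slot-name test_slot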

--slot-name

Logical decoding slot name to use.

--origin

The logical replication target system needs to track the transactions that have been applied already, so that in case we get disconnected or need to resume operations we can skip already replayed transactions.

Postgres uses a notion of an origin node name as documented in Replication Progress Tracking. This option allows picking your own node name; it defaults to “pgcopydb”. Picking a different name is useful in some advanced scenarios, like migrating several sources into the same target, where each source should have its own unique origin node name.
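
As a sketch of such an advanced scenario, two sources could be followed into the same target using distinct origin node names and replication slots; the connection strings, slot names, and origin names below are only illustrative, and PGCOPYDB_TARGET_PGURI is assumed to be exported:

$ PGCOPYDB_SOURCE_PGURI="postgres://user@sales-host/sales" \
    pgcopydb stream setup --origin pgcopydb_sales --slot-name sales_slot
$ PGCOPYDB_SOURCE_PGURI="postgres://user@billing-host/billing" \
    pgcopydb stream setup --origin pgcopydb_billing --slot-name billing_slot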

--verbose

Increase current verbosity. The default level of verbosity is INFO. In ascending order pgcopydb knows about the following verbosity levels: FATAL, ERROR, WARN, INFO, NOTICE, DEBUG, TRACE.

--debug

Set current verbosity to DEBUG level.

--trace

Set current verbosity to TRACE level.

--quiet

Set current verbosity to ERROR level.

Environment

PGCOPYDB_SOURCE_PGURI

Connection string to the source Postgres instance. When --source is omitted from the command line, then this environment variable is used.

PGCOPYDB_TARGET_PGURI

Connection string to the target Postgres instance. When --target is omitted from the command line, then this environment variable is used.

PGCOPYDB_OUTPUT_PLUGIN

Logical decoding output plugin to use. When --plugin is omitted from the command line, then this environment variable is used.

PGCOPYDB_WAL2JSON_NUMERIC_AS_STRING

When true (or yes, or on, or 1, same input as a Postgres boolean) then pgcopydb uses the wal2json option --numeric-data-types-as-string when using the wal2json output plugin.

When --wal2json-numeric-as-string is omitted from the command line then this environment variable is used.
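
As a sketch, the plugin-related settings can be provided entirely through the environment (the values and slot name below are only illustrative):

$ export PGCOPYDB_OUTPUT_PLUGIN=wal2json
$ export PGCOPYDB_WAL2JSON_NUMERIC_AS_STRING=on
$ pgcopydb stream setup --slot-name test_slot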

TMPDIR

The pgcopydb command creates all its work files and directories in ${TMPDIR}/pgcopydb, falling back to /tmp/pgcopydb when TMPDIR is not set.

XDG_DATA_HOME

The pgcopydb command creates Change Data Capture files in the standard place XDG_DATA_HOME, which defaults to ~/.local/share. See the XDG Base Directory Specification.

Examples

As an example here is the output generated from running the cdc test case, where a replication slot is created before the initial copy of the data, and then the following INSERT statement is executed:

begin;

with r as
 (
   insert into rental(rental_date, inventory_id, customer_id, staff_id, last_update)
        select '2022-06-01', 371, 291, 1, '2022-06-01'
     returning rental_id, customer_id, staff_id
 )
 insert into payment(customer_id, staff_id, rental_id, amount, payment_date)
       select customer_id, staff_id, rental_id, 5.99, '2020-06-01'
         from r;

 commit;

The command then looks like the following, where the --endpos has been extracted by calling the pg_current_wal_lsn() SQL function:

$ pgcopydb stream receive --slot-name test_slot --restart --endpos 0/236D668 -vv
16:01:57 157 INFO  Running pgcopydb version 0.7 from "/usr/local/bin/pgcopydb"
16:01:57 157 DEBUG copydb.c:406 Change Data Capture data is managed at "/var/lib/postgres/.local/share/pgcopydb"
16:01:57 157 INFO  copydb.c:73 Using work dir "/tmp/pgcopydb"
16:01:57 157 DEBUG pidfile.c:143 Failed to signal pid 34: No such process
16:01:57 157 DEBUG pidfile.c:146 Found a stale pidfile at "/tmp/pgcopydb/pgcopydb.pid"
16:01:57 157 INFO  pidfile.c:147 Removing the stale pid file "/tmp/pgcopydb/pgcopydb.pid"
16:01:57 157 INFO  copydb.c:254 Work directory "/tmp/pgcopydb" already exists
16:01:57 157 INFO  copydb.c:258 A previous run has run through completion
16:01:57 157 INFO  copydb.c:151 Removing directory "/tmp/pgcopydb"
16:01:57 157 DEBUG copydb.c:445 rm -rf "/tmp/pgcopydb" && mkdir -p "/tmp/pgcopydb"
16:01:57 157 DEBUG copydb.c:445 rm -rf "/tmp/pgcopydb/schema" && mkdir -p "/tmp/pgcopydb/schema"
16:01:57 157 DEBUG copydb.c:445 rm -rf "/tmp/pgcopydb/run" && mkdir -p "/tmp/pgcopydb/run"
16:01:57 157 DEBUG copydb.c:445 rm -rf "/tmp/pgcopydb/run/tables" && mkdir -p "/tmp/pgcopydb/run/tables"
16:01:57 157 DEBUG copydb.c:445 rm -rf "/tmp/pgcopydb/run/indexes" && mkdir -p "/tmp/pgcopydb/run/indexes"
16:01:57 157 DEBUG copydb.c:445 rm -rf "/var/lib/postgres/.local/share/pgcopydb" && mkdir -p "/var/lib/postgres/.local/share/pgcopydb"
16:01:57 157 DEBUG pgsql.c:2476 starting log streaming at 0/0 (slot test_slot)
16:01:57 157 DEBUG pgsql.c:485 Connecting to [source] "postgres://postgres@source:/postgres?password=****&replication=database"
16:01:57 157 DEBUG pgsql.c:2009 IDENTIFY_SYSTEM: timeline 1, xlogpos 0/236D668, systemid 7104302452422938663
16:01:57 157 DEBUG pgsql.c:3188 RetrieveWalSegSize: 16777216
16:01:57 157 DEBUG pgsql.c:2547 streaming initiated
16:01:57 157 INFO  stream.c:237 Now streaming changes to "/var/lib/postgres/.local/share/pgcopydb/000000010000000000000002.json"
16:01:57 157 DEBUG stream.c:341 Received action B for XID 488 in LSN 0/236D638
16:01:57 157 DEBUG stream.c:341 Received action I for XID 488 in LSN 0/236D178
16:01:57 157 DEBUG stream.c:341 Received action I for XID 488 in LSN 0/236D308
16:01:57 157 DEBUG stream.c:341 Received action C for XID 488 in LSN 0/236D638
16:01:57 157 DEBUG pgsql.c:2867 pgsql_stream_logical: endpos reached at 0/236D668
16:01:57 157 DEBUG stream.c:382 Flushed up to 0/236D668 in file "/var/lib/postgres/.local/share/pgcopydb/000000010000000000000002.json"
16:01:57 157 INFO  pgsql.c:3030 Report write_lsn 0/236D668, flush_lsn 0/236D668
16:01:57 157 DEBUG pgsql.c:3107 end position 0/236D668 reached by WAL record at 0/236D668
16:01:57 157 DEBUG pgsql.c:408 Disconnecting from [source] "postgres://postgres@source:/postgres?password=****&replication=database"
16:01:57 157 DEBUG stream.c:414 streamClose: closing file "/var/lib/postgres/.local/share/pgcopydb/000000010000000000000002.json"
16:01:57 157 INFO  stream.c:171 Streaming is now finished after processing 4 messages

The JSON file then contains the following content, from the wal2json logical replication plugin. Note that you’re seeing different LSNs here because each run produces different ones, and the captures have not all been made from the same run.

$ cat /var/lib/postgres/.local/share/pgcopydb/000000010000000000000002.json
{"action":"B","xid":489,"timestamp":"2022-06-27 13:24:31.460822+00","lsn":"0/236F5A8","nextlsn":"0/236F5D8"}
{"action":"I","xid":489,"timestamp":"2022-06-27 13:24:31.460822+00","lsn":"0/236F0E8","schema":"public","table":"rental","columns":[{"name":"rental_id","type":"integer","value":16050},{"name":"rental_date","type":"timestamp with time zone","value":"2022-06-01 00:00:00+00"},{"name":"inventory_id","type":"integer","value":371},{"name":"customer_id","type":"integer","value":291},{"name":"return_date","type":"timestamp with time zone","value":null},{"name":"staff_id","type":"integer","value":1},{"name":"last_update","type":"timestamp with time zone","value":"2022-06-01 00:00:00+00"}]}
{"action":"I","xid":489,"timestamp":"2022-06-27 13:24:31.460822+00","lsn":"0/236F278","schema":"public","table":"payment_p2020_06","columns":[{"name":"payment_id","type":"integer","value":32099},{"name":"customer_id","type":"integer","value":291},{"name":"staff_id","type":"integer","value":1},{"name":"rental_id","type":"integer","value":16050},{"name":"amount","type":"numeric(5,2)","value":5.99},{"name":"payment_date","type":"timestamp with time zone","value":"2020-06-01 00:00:00+00"}]}
{"action":"C","xid":489,"timestamp":"2022-06-27 13:24:31.460822+00","lsn":"0/236F5A8","nextlsn":"0/236F5D8"}

It’s then possible to transform the JSON into SQL:

$ pgcopydb stream transform  ./tests/cdc/000000010000000000000002.json /tmp/000000010000000000000002.sql

And the SQL file obtained looks like this:

$ cat /tmp/000000010000000000000002.sql
BEGIN; -- {"xid":489,"lsn":"0/236F5A8"}
INSERT INTO "public"."rental" (rental_id, rental_date, inventory_id, customer_id, return_date, staff_id, last_update) VALUES (16050, '2022-06-01 00:00:00+00', 371, 291, NULL, 1, '2022-06-01 00:00:00+00');
INSERT INTO "public"."payment_p2020_06" (payment_id, customer_id, staff_id, rental_id, amount, payment_date) VALUES (32099, 291, 1, 16050, 5.99, '2020-06-01 00:00:00+00');
COMMIT; -- {"xid": 489,"lsn":"0/236F5A8"}

pgcopydb configuration

Manual page for the configuration of pgcopydb. The pgcopydb command accepts sub-commands and command line options, see the manual for those commands for details. The only setup that pgcopydb commands accept is the filtering.

Filtering

Filtering allows skipping some object definitions and data when copying from the source to the target database. The pgcopydb commands that accept the option --filter (or --filters) expect an existing filename as the option argument. The given filename is read in the INI file format, but only uses sections and option keys. Option values are not used.

Here is an inclusion based filter configuration example:

[include-only-table]
public.allcols
public.csv
public.serial
public.xzero

[exclude-index]
public.foo_gin_tsvector

[exclude-table-data]
public.csv

Here is an exclusion based filter configuration example:

[exclude-schema]
foo
bar
expected

[exclude-table]
"schema"."name"
schema.othername
err.errors
public.serial

[exclude-index]
schema.indexname

[exclude-table-data]
public.bar
nsitra.test1

Filtering can be done with pgcopydb by using the following rules, which are also the name of the sections of the INI file.

include-only-table

This section provides the exclusive list of the source tables to copy to the target database. No other table will be processed by pgcopydb.

Each line in that section should be a schema-qualified table name. Postgres identifier quoting rules can be used to avoid ambiguity.

When the section include-only-table is used in the filtering configuration then the sections exclude-schema and exclude-table are disallowed. We would not know how to handle tables that exist on the source database and are not part of any filter.

NOTE: Materialized views are also considered as tables during the filtering.
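
As a sketch of the quoting rules, a filter file can mix quoted and unquoted identifiers; here it is written with a shell here-document and then passed to a command that accepts --filters, such as pgcopydb clone (the table names are only illustrative):

$ cat > /tmp/include.ini <<'EOF'
[include-only-table]
public.customer
"Sales"."DailyOrders"
EOF

$ pgcopydb clone --filters /tmp/include.ini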

exclude-schema

This section allows adding schemas (Postgres namespaces) to the exclusion filters. All the tables that belong to any listed schema in this section are going to be ignored by the pgcopydb command.

This section is not allowed when the section include-only-table is used.

include-only-schema

This section adds schemas (Postgres namespaces) to the exclusion filters by listing the schemas that are not going to be excluded. This is a syntactic sugar facility that helps avoid entering a long list of schemas to exclude when only a single schema is to be selected.

Despite the name, this section is an exclusion filter.

This section is not allowed when the section exclude-schema is used.

exclude-table

This section allows adding a list of qualified table names to the exclusion filters. All the tables listed in the exclude-table section are going to be ignored by the pgcopydb command.

This section is not allowed when the section include-only-table is used.

NOTE: Materialized views are also considered as tables during the filtering.

exclude-index

This section allows adding a list of qualified index names to the exclusion filters. It is then possible for pgcopydb to operate on a table and skip a single index definition that belongs to a table that is otherwise still processed.

exclude-table-data

This section allows skipping the data copy for a list of qualified table names. The schema, indexes, constraints, etc. of the table are still copied over.

NOTE: Materialized views are also considered as tables during the filtering.

Reviewing and Debugging the filters

Filtering a pg_restore archive file is done through rewriting the archive catalog obtained with pg_restore --list. That’s a little hackish at times, and we also have to deal with dependencies in pgcopydb itself.

The following commands can be used to explore a set of filtering rules: