I.3. pg_anon #

pg_anon — anonymization tool for Tantor SE

I.3.1. Overview #

Version: 1.9.1

Download from nexus-public

pg_anon helps you safely clone your production database for testing or development. During the process, all sensitive fields are replaced with realistic but fake values — keeping your data structure intact and your privacy protected.

I.3.2. Requirements #

  • Python: 3.11+

  • Tantor SE client utilities (must match the server’s major version):

    • pg_dump – used to export the database schema

    • pg_restore – used to restore the database schema into the target database

For details, see Installation & Configuration.

I.3.3. Terminology #

Personal (sensitive) data: Data that must not be shared with third parties. Includes personal or confidential business information.
Source database: The original database that contains sensitive data.
Target database: An empty database where anonymized data will be restored.
Meta-dictionary: A Python file describing rules for detecting sensitive data. Created manually and used as the basis for generating the sensitive dictionary during scanning. See more
Prepared sensitive dictionary: A Python file that defines which tables and fields contain sensitive data and how to anonymize them. Created automatically or manually. See more
Prepared non-sensitive dictionary: A Python file listing schemas, tables, and fields without sensitive data. Used to speed up repeated scans. See more
Table dictionary: A Python file listing tables. Used to include or exclude tables from dump & restore operations. See more
Create-dict (scan): The process of scanning the source database to detect sensitive fields and create dictionary files. See more
Dump: Exporting data from the source database into files using a dictionary. This is where anonymization occurs. See more
Restore: Importing anonymized data from files into the target database. See more
Anonymization (masking): The full process of cloning and sanitizing data (dump → restore), replacing sensitive values with random or hashed ones.
Anonymization function: A Tantor SE function (built-in or from the anon_funcs schema) that replaces sensitive values with random or hashed data. New functions can be added to extend anonymization logic.

I.3.4. Quick Start #

I.3.4.1. Before you start #

In this guide, a privileged user will be created and test databases with data will be set up.

It is recommended to follow this quick start guide in a non-production environment.

I.3.4.1.1. Prerequisites #
  • A working Tantor SE instance

  • PostgreSQL client utilities installed

Note

If you are not running as a superuser, ensure you can install the pgcrypto extension in your quick start source database.
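
If the extension is missing, a user with sufficient privileges can install it manually (standard PostgreSQL command):

CREATE EXTENSION IF NOT EXISTS pgcrypto;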

I.3.4.2. Preparing pg_anon #

git clone https://github.com/TantorLabs/pg_anon.git pg_anon
cd pg_anon
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
python3 -m pg_anon --version

I.3.4.3. Preparing DB environment #

Create a DB user for the quick start guide

sudo su - postgres -c "psql -p 5432 -U postgres -c \"CREATE USER anon_test_user WITH PASSWORD 'mYy5RexGsZ' SUPERUSER;\""

Prepare the SQL script to initialize the databases

cat > /tmp/db_env.sql << 'EOL'
DROP DATABASE IF EXISTS pg_anon_quick_start_source_db;
CREATE DATABASE pg_anon_quick_start_source_db
WITH
OWNER = anon_test_user
ENCODING = 'UTF8'
LC_COLLATE = 'en_US.UTF-8'
LC_CTYPE = 'en_US.UTF-8'
template = template0;

DROP DATABASE IF EXISTS pg_anon_quick_start_target_db;
CREATE DATABASE pg_anon_quick_start_target_db
WITH
OWNER = anon_test_user
ENCODING = 'UTF8'
LC_COLLATE = 'en_US.UTF-8'
LC_CTYPE = 'en_US.UTF-8'
template = template0;
EOL

Initialize source and target databases

sudo chown postgres:postgres /tmp/db_env.sql
sudo su - postgres -c "psql -p 5432 -U postgres -f /tmp/db_env.sql"

Load test environment into source DB

cp $(pwd)/tests/sql/init_env.sql /tmp/init_env.sql
sudo chown postgres:postgres /tmp/init_env.sql
sudo su - postgres -c "psql -p 5432 -d pg_anon_quick_start_source_db -U postgres -f /tmp/init_env.sql"

I.3.4.4. Initializing the service schema for pg_anon #

python3 -m pg_anon init \
    --db-user=anon_test_user \
    --db-user-password=mYy5RexGsZ \
    --db-name=pg_anon_quick_start_source_db \
    --db-host=127.0.0.1 \
    --db-port=5432

I.3.4.5. Scan your source database #

python3 -m pg_anon create-dict \
    --db-user=anon_test_user \
    --db-user-password=mYy5RexGsZ \
    --db-name=pg_anon_quick_start_source_db \
    --db-host=127.0.0.1 \
    --db-port=5432 \
    --meta-dict-file=tests/input_dict/test_meta_dict.py \
    --output-sens-dict-file=test_sens_dict_output.py \
    --output-no-sens-dict-file=test_no_sens_dict_output.py \
    --processes=2

I.3.4.6. Visualizing anonymization rules #

Run pg_anon in view-fields mode to see which fields will be anonymized and which fields will be dumped as-is.

python3 -m pg_anon view-fields \
    --db-host=127.0.0.1 \
    --db-user=anon_test_user \
    --db-user-password=mYy5RexGsZ \
    --db-name=pg_anon_quick_start_source_db \
    --db-port=5432 \
    --prepared-sens-dict-file=test_sens_dict_output.py \
    --fields-count=20

Run pg_anon in view-data mode to preview anonymized data in a specific table.

python3 -m pg_anon view-data \
    --db-host=127.0.0.1 \
    --db-user=anon_test_user \
    --db-user-password=mYy5RexGsZ \
    --db-name=pg_anon_quick_start_source_db \
    --db-port=5432 \
    --prepared-sens-dict-file=test_sens_dict_output.py \
    --schema-name=public \
    --table-name=contracts \
    --limit=10 \
    --offset=0

I.3.4.7. Create an anonymized backup #

python3 -m pg_anon dump \
    --db-host=127.0.0.1 \
    --db-user=anon_test_user \
    --db-user-password=mYy5RexGsZ \
    --db-name=pg_anon_quick_start_source_db \
    --db-port=5432 \
    --prepared-sens-dict-file=test_sens_dict_output.py \
    --output-dir=/tmp/quick_start_dump \
    --clear-output-dir

I.3.4.8. Restore the anonymized backup into the target DB #

python3 -m pg_anon restore \
    --db-host=127.0.0.1 \
    --db-port=5432 \
    --db-user=anon_test_user \
    --db-user-password=mYy5RexGsZ \
    --db-name=pg_anon_quick_start_target_db \
    --input-dir=/tmp/quick_start_dump \
    --drop-custom-check-constr \
    --verbose=debug

I.3.5. Installation & Configuration #

There are several ways to install pg_anon; they are described below.

I.3.5.1. Before you install #

pg_anon can be run in two ways: via the CLI or via the REST API.

The REST API service is optional to install. It is designed to integrate pg_anon functionality into other systems or pipelines via HTTP requests, and works as a thin wrapper around the CLI version of pg_anon: REST API calls prepare CLI parameters and run the CLI version of pg_anon in the background.

It doesn’t keep state or store data in a database, so it can be scaled easily without extra setup.

However, this means that the system that integrates pg_anon must implement its own storage for dictionaries, dump tasks, and restore tasks.

Note

Not suitable for fully autonomous operation.

Logs and run information for all operations are stored in /path_to_pg_anon/runs. All dumps are stored in /path_to_pg_anon/output. If the REST API service is scaled, you must replace this directory with a symlink to a shared disk, because restore operations also read dumps from /path_to_pg_anon/output.
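
A minimal sketch of such a layout, assuming a shared mount at /shared_disk (hypothetical path):

# hypothetical shared mount point
mkdir -p /shared_disk/pg_anon_output
rm -rf /path_to_pg_anon/output
ln -s /shared_disk/pg_anon_output /path_to_pg_anon/output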

I.3.5.2. Using db_extension_installer.sh locally from the downloaded package #

  1. Download the pg_anon package suitable for your OS from nexus-public.

  2. Download the installation script db_extension_installer.sh and make it executable:

    wget https://public.tantorlabs.ru/db_extension_installer.sh && \
    chmod +x db_extension_installer.sh
    
  3. Run the pg_anon installation using db_extension_installer.sh from the downloaded package:

    ./db_extension_installer.sh \
      --from-file=<path to pg_anon package>
    

I.3.5.3. Using db_extension_installer.sh automatically from the package repository #

  1. Download the installation script db_extension_installer.sh and make it executable:

    wget https://public.tantorlabs.ru/db_extension_installer.sh && \
    chmod +x db_extension_installer.sh
    
  2. Set the value of the environment variable NEXUS_URL:

    export NEXUS_URL="nexus-public.tantorlabs.ru"
    
  3. Run the pg_anon installation using db_extension_installer.sh with the following parameters:

    ./db_extension_installer.sh \
      --database-type=tantor \
      --edition=all \
      --extension=pg_anon
    

I.3.5.4. From the GitHub repository (Python) #

  1. Install Python 3 if it is not installed: sudo apt-get install python3.11 (for Ubuntu)

  2. Clone the repository: git clone https://github.com/TantorLabs/pg_anon.git

  3. Go to the project directory: cd pg_anon

  4. Set up a virtual environment:

    • Create the virtual environment: python3 -m venv venv

    • Activate the virtual environment: source venv/bin/activate

  5. Install the dependencies: pip install -r requirements.txt

  6. (Optional) To use the REST API service, install its dependencies: pip install -r rest_api/requirements.txt

I.3.5.5. Configuring pg_anon #

To specify custom pg_dump and pg_restore utilities, use the --pg-dump and --pg-restore parameters.
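
For example, a dump run pointing at a specific pg_dump binary (illustrative values; the binary path is taken from the config example below):

python3 -m pg_anon dump \
    --db-host=127.0.0.1 \
    --db-port=5432 \
    --db-user=anon_test_user \
    --db-user-password=mYy5RexGsZ \
    --db-name=pg_anon_quick_start_source_db \
    --prepared-sens-dict-file=test_sens_dict_output.py \
    --output-dir=/tmp/quick_start_dump \
    --pg-dump=/opt/tantor/db/17/bin/pg_dump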

Advanced configuration is also available:

  • CLI: use the --config run parameter

  • REST API: the config file must be placed at /path_to_pg_anon/config.yml

pg-utils-versions:
  <postgres_major_version>:
    pg_dump: "/path/to/<postgres_major_version>/pg_dump"
    pg_restore: "/path/to/<postgres_major_version>/pg_restore"
  <another_postgres_major_version>:
    pg_dump: "/path/to/<another_postgres_major_version>/pg_dump"
    pg_restore: "/path/to/<another_postgres_major_version>/pg_restore"
  default:
    pg_dump: "/path/to/default_postgres_version/pg_dump"
    pg_restore: "/path/to/default_postgres_version/pg_restore"

For example, you can specify a configuration for Tantor SE 15 and 17:

pg-utils-versions:
  15:
    pg_dump: "/opt/tantor/db/15/bin/pg_dump"
    pg_restore: "/opt/tantor/db/15/bin/pg_restore"
  17:
    pg_dump: "/opt/tantor/db/17/bin/pg_dump"
    pg_restore: "/opt/tantor/db/17/bin/pg_restore"
  default:
    pg_dump: "/opt/tantor/db/17/bin/pg_dump"
    pg_restore: "/opt/tantor/db/17/bin/pg_restore"

If the current Tantor SE version does not match any version in this config, the utilities from the default section will be used. For example, pg_anon can be run with this config on Tantor SE 16. In this case, pg_dump 17 and pg_restore 17 will be used.

I.3.5.6. Running REST API #

Run service command:

python -m uvicorn rest_api.api:app --host 0.0.0.0 --port 8000 --workers=3

  • Recommended worker count: 2 * CPU_CORES + 1

  • The service OpenAPI documentation will be available at http://0.0.0.0:8000/docs#/

  • See also the API documentation

I.3.6. How it works #

I.3.6.1. Anonymization (masking) #

The diagram below illustrates how data is transferred from the source DB to the target DB.

The source database contains sensitive information and is typically located in a production environment with strictly limited access.

Figure I.1. Dump-Restore-Terms.drawio.png

A trusted administrator runs pg_anon with credentials for the source DB. Using the prepared and approved sensitive dictionary, pg_anon creates an anonymized dump in the specified directory. The dictionary must be created in advance and validated by the security team.

The resulting dump directory is then transferred to the host of the target database. Compression during transfer is unnecessary because the dump files are already compressed.

Once the directory is placed on the target host, the restore process is started using target database credentials. The target database must be created beforehand using CREATE DATABASE and must be empty.
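
For example (hypothetical database name and owner):

CREATE DATABASE target_db
    WITH OWNER = anon_test_user
    TEMPLATE = template0;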

After a successful restore, the anonymized database is ready for development or testing. Any number of employees can safely use it without risking exposure of sensitive data.

I.3.6.2. What pg_anon does during dump and restore: a simplified representation #

I.3.6.2.1. Example: data we want to anonymize #
  1. Create the source table:

create table users (
    id bigserial,
    email text,
    login text
);

-- Checking the contents of the source table
select * from users;
>>
    id |  email  | login 
   ----+---------+-------
  2. Populate the source table:

insert into users (email, login)
select
 'user' || generate_series(1001, 1020) || '@example.com',
 'user' || generate_series(1001, 1020);

-- Checking the contents of the source table
select * from users;
>>
    id |        email         |  login   
   ----+----------------------+----------
     1 | user1001@example.com | user1001
     2 | user1002@example.com | user1002
    ...

The ‘email’ field contains sensitive data. We need to anonymize it.

I.3.6.2.2. What is the process of creating a dump with masking? #
  1. Data dump from the source table to a CSV file (without masking):

copy (
    select *
    from users
) to '/tmp/users.csv' with csv;
cat /tmp/users.csv
>>
   1,user1001@example.com,user1001
   2,user1002@example.com,user1002
   ...
  2. Masking the contents of the source table:

select
   id,
   md5(email) || '@abc.com' as email, -- hashing the email (masking rule in prepared sens dict file)
   login
from users;
>>
    id |                email                     |  login   
   ----+------------------------------------------+----------
     1 | 385513d80895c4c5e19c91d1df9eacae@abc.com | user1001
     2 | 9f4c0c30f85b0353c4d5fe3c9cc633e3@abc.com | user1002
    ...
  3. Data dump from the source table to a CSV file (with masking):

copy (
  select
    id,
    md5(email) || '@abc.com' as email, -- hashing the email (masking rule in prepared sens dict file)
    login
  from users
) to '/tmp/users_anonymized.csv' with csv;
cat /tmp/users_anonymized.csv
>>
   1,385513d80895c4c5e19c91d1df9eacae@abc.com,user1001
   2,9f4c0c30f85b0353c4d5fe3c9cc633e3@abc.com,user1002
   ...

The prepared sens dict file contains masking rules such as this hashing rule.
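
For reference, a minimal prepared sens dict file expressing this rule might look as follows (a sketch based on the sensitive dictionary schema described in Dictionary Schemas; the rule string is the SQL expression applied to the field):

{
    "dictionary": [
        {
            "schema": "public",
            "table": "users",
            "fields": {
                "email": "md5(\"email\") || '@abc.com'"  # masking rule for the email field
            }
        }
    ]
}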

I.3.6.2.3. What is the process for restoring a masked dump? #
  1. Reproduce the structure by creating the target table:

create table users_anonymized (
    id bigserial,
    email text,
    login text
);

-- Checking the contents of the target table
select * from users_anonymized;
>>
    id |  email  | login 
   ----+---------+-------
  2. Load the data from the masked dump (CSV file) into the target table:

copy users_anonymized
from '/tmp/users_anonymized.csv'
with csv;

-- Checking the contents of the target table
select * from users_anonymized;
>>
    id |                email                     |  login   
   ----+------------------------------------------+----------
     1 | 385513d80895c4c5e19c91d1df9eacae@abc.com | user1001
     2 | 9f4c0c30f85b0353c4d5fe3c9cc633e3@abc.com | user1002
    ...
I.3.6.2.4. Differences between original work of pg_anon and that representation #
  • pg_anon operates on the entire database, not just one table

  • pg_anon saves data in .bin.gz files rather than CSV

  • Masking rules are provided to pg_anon via a prepared sens dict file

I.3.7. Operations #

I.3.7.1. Init #

This mode creates the anon_funcs schema in the source database and loads the predefined SQL functions from init.sql. These functions are required for processing data in the source database.

I.3.7.1.1. Run example #
python -m pg_anon init \
                  --db-user=postgres \
                  --db-user-password=postgres \
                  --db-name=source_db
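
Anonymization functions can also be extended after init. Below is a minimal sketch of a custom function added to the anon_funcs schema (the name mask_phone is hypothetical); a sensitive dictionary could then reference it as anon_funcs.mask_phone("phone"):

-- Hypothetical helper: keep the last two characters, mask the rest
CREATE OR REPLACE FUNCTION anon_funcs.mask_phone(value text)
RETURNS text AS $$
BEGIN
  RETURN repeat('*', greatest(length(value) - 2, 0)) || right(value, 2);
END;
$$ LANGUAGE plpgsql;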
I.3.7.1.2. Options #

Common pg_anon options

Option Required Description
--verbose No Sets the log verbosity level: info, debug, error. (default: info)
--debug No Enables debug mode (equivalent to --verbose=debug) and adds extra debug logs. (default: false)

Database configuration options

Option Required Description
--db-host Yes Database host.
--db-port No Database port.
--db-name Yes Database name.
--db-user Yes Database user.
--db-user-password No Database user password.
--db-passfile No Path to a file containing the password used for authentication.
--db-ssl-key-file No Path to the client SSL key file for secure connections.
--db-ssl-cert-file No Path to the client SSL certificate file.
--db-ssl-ca-file No Path to the CA certificate used to verify the server’s certificate.

I.3.7.2. Scan #

The scan operation analyzes your Tantor SE database to detect potentially sensitive data and generate dictionary files. The generated dictionaries are used for dump operations and for repeated scans.

I.3.7.2.1. Prerequisites #
  • Manually created meta-dictionary

  • Already run init mode for source database

I.3.7.2.2. Usage #

To scan the source database and create a dictionary for dumping, run pg_anon in create-dict mode. You need:

  • a meta-dictionary file with scan rules.

python pg_anon.py create-dict \
                  --db-user=postgres \
                  --db-user-password=postgres \
                  --db-name=test_source_db \
                  --meta-dict-file=test_meta_dict.py \
                  --prepared-sens-dict-file=test_sens_dict_output_previous_use.py \
                  --prepared-no-sens-dict-file=test_no_sens_dict_output_previous_use.py \
                  --output-sens-dict-file=test_sens_dict_output.py \
                  --output-no-sens-dict-file=test_no_sens_dict_output.py \
                  --processes=2
I.3.7.2.3. Options #

Common pg_anon options

Option Required Description
--config No Path to the config file that can specify pg_dump and pg_restore utilities. (default: none)
--processes No Number of processes used for multiprocessing operations. (default: 4)
--db-connections-per-process No Number of database connections per process for I/O operations. (default: 4)
--verbose No Sets the log verbosity level: info, debug, error. (default: info)
--debug No Enables debug mode (equivalent to --verbose=debug) and adds extra debug logs. (default: false)

Database configuration options

Option Required Description
--db-host Yes Database host.
--db-port No Database port.
--db-name Yes Database name.
--db-user Yes Database user.
--db-user-password No Database user password.
--db-passfile No Path to a file containing the password used for authentication.
--db-ssl-key-file No Path to the client SSL key file for secure connections.
--db-ssl-cert-file No Path to the client SSL certificate file.
--db-ssl-ca-file No Path to the CA certificate used to verify the server’s certificate.

Create-dict (scan) mode options

Option Required Description
--meta-dict-file Yes Input file, or list of files, containing the manually prepared meta-dictionary. If rules collide, the rules from the last file in the list take priority
--prepared-sens-dict-file No Input file, or list of files, containing a sensitive dictionary obtained from a previous run via --output-sens-dict-file or prepared manually. If rules collide, the rules from the last file in the list take priority
--prepared-no-sens-dict-file No Input file, or list of files, containing a non-sensitive dictionary obtained from a previous run via --output-no-sens-dict-file or prepared manually. If rules collide, the rules from the last file in the list take priority
--output-sens-dict-file Yes Output file path for saving the sensitive dictionary
--output-no-sens-dict-file No Output file path for saving the non-sensitive dictionary
--scan-mode No Defines whether to scan all data or only part of it: full or partial (default: partial)
--scan-partial-rows No In --scan-mode partial, defines the number of rows to scan (default: 10000). The actual row count can be smaller after selecting unique values
--save-dicts No Copies all input and output dictionaries to the runs directory. Useful for debugging or integration purposes.

I.3.7.3. Dump #

This mode creates an anonymized backup using rules from the sensitive dictionary.

Note

This backup can only be restored using pg_anon; it cannot be restored with pg_restore.

I.3.7.3.1. Prerequisites #
  • The anon_funcs schema with anonymization functions must already exist. See init mode.

  • A sensitive dictionary containing data about database fields and their anonymization rules must be prepared beforehand. See create-dict (scan) mode.

I.3.7.3.2. Full dump (dump) mode #

Creates a backup containing both the database structure and anonymized data.

This backup can be restored using the full restore (restore) mode.

Run example

python pg_anon.py dump \
                 --db-host=127.0.0.1 \
                 --db-user=postgres \
                 --db-user-password=postgres \
                 --db-name=source_db \
                 --prepared-sens-dict-file=sens_dict.py
I.3.7.3.3. Structure dump (sync-struct-dump) mode #

Creates a backup containing only the database structure without anonymized data.

This backup can be restored using the structure restore (sync-struct-restore) mode.

This mode is useful when used together with the data dump (sync-data-dump) mode.

Run example

python pg_anon.py sync-struct-dump \
                 --db-host=127.0.0.1 \
                 --db-user=postgres \
                 --db-user-password=postgres \
                 --db-name=source_db \
                 --output-dir=test_sync_struct_dump \
                 --prepared-sens-dict-file=sens_dict.py
I.3.7.3.4. Data dump (sync-data-dump) mode #

Creates a backup containing only anonymized data, without the database structure.

This backup can be restored using the data restore (sync-data-restore) mode.

This mode can be useful for scheduling database synchronization, for example using cron.

Run example

python pg_anon.py sync-data-dump \
                 --db-host=127.0.0.1 \
                 --db-user=postgres \
                 --db-user-password=postgres \
                 --db-name=source_db \
                 --output-dir=test_sync_data_dump \
                 --prepared-sens-dict-file=sens_dict.py
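
For example, a crontab entry (illustrative schedule and paths) that refreshes the anonymized data nightly at 02:00:

0 2 * * * cd /path_to_pg_anon && python pg_anon.py sync-data-dump --db-host=127.0.0.1 --db-user=postgres --db-user-password=postgres --db-name=source_db --output-dir=test_sync_data_dump --prepared-sens-dict-file=sens_dict.py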
I.3.7.3.5. Create partial dump #

Partial dumps are used to create a backup that includes or excludes specific tables of the source database.

Partial dumps can be run in all dump modes: dump, sync-struct-dump, and sync-data-dump.

Partial dumps use a tables dictionary containing a list of tables. This dictionary can act as either a whitelist or a blacklist. See tables dictionary.

Run example

Dump only the needed tables (whitelist)

python pg_anon.py dump \
                 --db-host=127.0.0.1 \
                 --db-user=postgres \
                 --db-user-password=postgres \
                 --db-name=source_db \
                 --output-dir=partial_dump_white_list \
                 --prepared-sens-dict-file=sens_dict.py \
                 --partial-tables-dict-file=include_tables.py

Dump all tables without some specified tables (blacklist)

python pg_anon.py dump \
                 --db-host=127.0.0.1 \
                 --db-user=postgres \
                 --db-user-password=postgres \
                 --db-name=source_db \
                 --output-dir=partial_dump_black_list \
                 --prepared-sens-dict-file=sens_dict.py \
                 --partial-tables-exclude-dict-file=exclude_tables.py

Dump only specified tables with excluding some of them (whitelist + blacklist)

python pg_anon.py dump \
                 --db-host=127.0.0.1 \
                 --db-user=postgres \
                 --db-user-password=postgres \
                 --db-name=source_db \
                 --output-dir=partial_dump_white_list_and_black_list \
                 --prepared-sens-dict-file=sens_dict.py \
                 --partial-tables-dict-file=include_tables.py \
                 --partial-tables-exclude-dict-file=exclude_tables.py
I.3.7.3.6. Options #

Common pg_anon options

Option Required Description
--config No Path to the config file that can specify pg_dump and pg_restore utilities. (default: none)
--db-connections-per-process No Number of database connections per process for I/O operations. (default: 4)
--verbose No Sets the log verbosity level: info, debug, error. (default: info)
--debug No Enables debug mode (equivalent to --verbose=debug) and adds extra debug logs. (default: false)

Database configuration options

Option Required Description
--db-host Yes Database host.
--db-port No Database port.
--db-name Yes Database name.
--db-user Yes Database user.
--db-user-password No Database user password.
--db-passfile No Path to a file containing the password used for authentication.
--db-ssl-key-file No Path to the client SSL key file for secure connections.
--db-ssl-cert-file No Path to the client SSL certificate file.
--db-ssl-ca-file No Path to the CA certificate used to verify the server’s certificate.

Dump mode options

Option Required Description
--prepared-sens-dict-file Yes Input file, or list of files, containing the sensitive dictionary generated by the create-dict (scan) mode or created manually. If rules collide, the rules from the last file in the list take priority
--partial-tables-dict-file No Input file, or list of files, containing a tables dictionary that lists specific tables to include in the dump. All tables not listed in these files will be excluded. These files must be prepared manually (acts as a whitelist)
--partial-tables-exclude-dict-file No Input file, or list of files, containing a tables dictionary that lists specific tables to exclude from the dump. All tables listed in these files will be excluded. These files must be prepared manually (acts as a blacklist)
--dbg-stage-1-validate-dict No Validates the dictionary: shows the tables and runs the SQL queries without exporting data (default: false)
--dbg-stage-2-validate-data No Validates the data: shows the tables and runs the SQL queries with data export into a prepared database (default: false)
--dbg-stage-3-validate-full No Runs the full logic with a LIMIT applied to the SQL queries (default: false)
--clear-output-dir No Clears the output directory of previous dumps and other files. (default: false)
--pg-dump No Path to the pg_dump Postgres tool. (default: /usr/bin/pg_dump)
--pg-dump-options No Additional options passed directly to the pg_dump utility. Example: "--no-comments --encoding=LATIN1".
--output-dir No Output directory for dump files. (default: "")
--ignore-privileges No Ignore privileges from the source database.
--save-dicts No Copies all input dictionaries to the runs directory. Useful for debugging or integration purposes.

I.3.7.4. Restore #

This mode restores an anonymized backup created using pg_anon in the dump mode.

Note

Only backups created with pg_anon can be restored. Backups created with pg_dump cannot be restored.

I.3.7.4.1. Full restore (restore) mode #

Restores both the database structure and data.

Prerequisites

  • A backup created using the full dump (dump) mode

Run example

python pg_anon.py restore \
                  --db-host=127.0.0.1 \
                  --db-user=postgres \
                  --db-user-password=postgres \
                  --db-name=target_db \
                  --input-dir=path/to/my_full_dump \
                  --verbose=debug
I.3.7.4.2. Structure restore (sync-struct-restore) mode #

Restores only the database structure.

Prerequisites

  • A backup created using the structure dump (sync-struct-dump) mode

Run example

python pg_anon.py sync-struct-restore \
                  --db-host=127.0.0.1 \
                  --db-user=postgres \
                  --db-user-password=postgres \
                  --db-name=target_db \
                  --input-dir=path/to/my_sync_struct_dump \
                  --verbose=debug
I.3.7.4.3. Data restore (sync-data-restore) mode #

Restores data only.

Prerequisites

  • A backup created using the data dump (sync-data-dump) mode

Run example

python pg_anon.py sync-data-restore \
                  --db-host=127.0.0.1 \
                  --db-user=postgres \
                  --db-user-password=postgres \
                  --db-name=target_db \
                  --input-dir=path/to/my_sync_data_dump \
                  --verbose=debug
I.3.7.4.4. Create partial restore #

Partial restores are used to restore only part of a backup, excluding certain tables if needed.

Partial restores can be run in all restore modes: restore, sync-struct-restore, and sync-data-restore.

Partial restores use a tables dictionary containing a list of tables. This dictionary can act as either a whitelist (only listed tables are restored) or a blacklist (listed tables are excluded). See tables dictionary.

Run example

Restore only the needed tables (whitelist)

python pg_anon.py restore \
                 --db-host=127.0.0.1 \
                 --db-user=postgres \
                 --db-user-password=postgres \
                 --db-name=target_db \
                 --input-dir=partial_dump_white_list \
                 --partial-tables-dict-file=include_tables.py

Restore all tables except the specified ones (blacklist)

python pg_anon.py restore \
                 --db-host=127.0.0.1 \
                 --db-user=postgres \
                 --db-user-password=postgres \
                 --db-name=target_db \
                 --input-dir=partial_dump_black_list \
                 --partial-tables-exclude-dict-file=exclude_tables.py

Restore only the specified tables, excluding some of them (whitelist + blacklist)

python pg_anon.py restore \
                 --db-host=127.0.0.1 \
                 --db-user=postgres \
                 --db-user-password=postgres \
                 --db-name=target_db \
                 --input-dir=partial_dump_white_list_and_black_list \
                 --partial-tables-dict-file=include_tables.py \
                 --partial-tables-exclude-dict-file=exclude_tables.py
I.3.7.4.5. Options #

Common pg_anon options

Option Required Description
--config No Path to the config file that can specify pg_dump and pg_restore utilities. (default: none)
--processes No Number of processes used for multiprocessing operations. (default: 4)
--db-connections-per-process No Number of database connections per process for I/O operations. (default: 4)
--verbose No Sets the log verbosity level: info, debug, error. (default: info)
--debug No Enables debug mode (equivalent to --verbose=debug) and adds extra debug logs. (default: false)

Database configuration options

Option Required Description
--db-host Yes Database host.
--db-port No Database port.
--db-name Yes Database name.
--db-user Yes Database user.
--db-user-password No Database user password.
--db-passfile No Path to a file containing the password used for authentication.
--db-ssl-key-file No Path to the client SSL key file for secure connections.
--db-ssl-cert-file No Path to the client SSL certificate file.
--db-ssl-ca-file No Path to the CA certificate used to verify the server’s certificate.

Restore mode options

Option Required Description
--input-dir Yes Path to the directory containing dump files created in dump mode.
--partial-tables-dict-file No Input file, or list of files, containing a tables dictionary that lists specific tables to include in the restore. All tables not listed in these files will be excluded. These files must be prepared manually (acts as a whitelist).
--partial-tables-exclude-dict-file No Input file, or list of files, containing a tables dictionary that lists specific tables to exclude from the restore. All tables listed in these files will be excluded. These files must be prepared manually (acts as a blacklist)
--disable-checks No Disable checks of disk space and Tantor SE version (default false)
--seq-init-by-max-value No Initialize sequences based on maximum values. Otherwise, the sequences will be initialized based on the values of the source database.
--drop-custom-check-constr No Drops all CHECK constraints that contain user-defined procedures to avoid performance degradation during data loading.
--pg-restore No Path to the pg_restore Postgres tool.
--pg-restore-options No Additional options passed directly to the pg_restore utility. Example: "--no-comments --no-table-access-method".
--clean-db No Cleans the database objects before restoring (if they exist in the dump). Mutually exclusive with --drop-db.
--drop-db No Drop target database before restore. Mutually exclusive with --clean-db.
--ignore-privileges No Ignore privileges from source db.
--save-dicts No Duplicate all input dictionaries to dir runs. It can be useful for debugging or integration purposes.

I.3.7.5. View Fields #

This mode displays how database fields match the anonymization rules.

I.3.7.5.1. Prerequisites #
  • The anon_funcs schema with anonymization functions must already exist. See init mode.

  • A sensitive dictionary containing data about database fields and their anonymization rules must be prepared beforehand. See create-dict (scan) mode.

I.3.7.5.2. Run example #
   python pg_anon.py view-fields \
                     --db-host=127.0.0.1 \
                     --db-user=postgres \
                     --db-user-password=postgres \
                     --db-name=source_db \
                     --prepared-sens-dict-file=sens_dict.py

Note

This mode can process only a limited number of fields when no filters are applied, for performance reasons.

This limit is controlled by the --fields-count option (default: 5000 fields). To avoid hitting this limit, increase the --fields-count value or use filter options: --schema-name, --schema-mask, --table-name, --table-mask.
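
For example, a run restricted to tables matching a mask, which keeps the number of processed fields well under the limit (illustrative values):

python pg_anon.py view-fields \
                  --db-host=127.0.0.1 \
                  --db-user=postgres \
                  --db-user-password=postgres \
                  --db-name=source_db \
                  --prepared-sens-dict-file=sens_dict.py \
                  --schema-name=public \
                  --table-mask="^client_.*"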

I.3.7.5.3. Options #

Common pg_anon options

Option Required Description
--config No Path to the config file that can specify pg_dump and pg_restore utilities. (default: none)
--processes No Number of processes used for multiprocessing operations. (default: 4)
--db-connections-per-process No Number of database connections per process for I/O operations. (default: 4)
--verbose No Sets the log verbosity level: info, debug, error. (default: info)
--debug No Enables debug mode (equivalent to --verbose=debug) and adds extra debug logs. (default: false)

Database configuration options

Option Required Description
--db-host Yes Database host.
--db-port No Database port.
--db-name Yes Database name.
--db-user Yes Database user.
--db-user-password No Database user password.
--db-passfile No Path to a file containing the password used for authentication.
--db-ssl-key-file No Path to the client SSL key file for secure connections.
--db-ssl-cert-file No Path to the client SSL certificate file.
--db-ssl-ca-file No Path to the CA certificate used to verify the server’s certificate.

View-fields mode options

Option Required Description
--prepared-sens-dict-file Yes Input file, or list of files, containing the sensitive dictionary generated by the create-dict (scan) mode or created manually. If rules collide, the rules from the last file in the list take priority.
--view-only-sensitive-fields No Displays only sensitive fields. (default: all fields)
--fields-count No Maximum number of fields to process for output (default: 5000)
--schema-name No Filter by schema name
--schema-mask No Filter by schema name using a regular expression
--table-name No Filter by table name
--table-mask No Filter by table name using a regular expression
--json No Outputs results in JSON format instead of a table.

I.3.7.6. View Data #

This mode displays anonymized table data without creating a dump.

I.3.7.6.1. Prerequisites #
  • The anon_funcs schema with anonymization functions must already exist. See init mode.

  • A sensitive dictionary containing data about database fields and their anonymization rules must be prepared beforehand. See create-dict (scan) mode.

I.3.7.6.2. Run example #
   python pg_anon.py view-data \
                     --db-host=127.0.0.1 \
                     --db-user=postgres \
                     --db-user-password=postgres \
                     --db-name=source_db \
                     --prepared-sens-dict-file=sens_dict.py \
                     --schema-name=public \
                     --table-name=users \
                     --limit=10 \
                     --offset=0
I.3.7.6.3. Options #

Common pg_anon options

Option Required Description
--config No Path to the config file that can specify pg_dump and pg_restore utilities. (default: none)
--processes No Number of processes used for multiprocessing operations. (default: 4)
--db-connections-per-process No Number of database connections per process for I/O operations. (default: 4)
--verbose No Sets the log verbosity level: info, debug, error. (default: info)
--debug No Enables debug mode (equivalent to --verbose=debug) and adds extra debug logs. (default: false)

Database configuration options

Option Required Description
--db-host Yes Database host.
--db-port No Database port.
--db-name Yes Database name.
--db-user Yes Database user.
--db-user-password No Database user password.
--db-passfile No Path to a file containing the password used for authentication.
--db-ssl-key-file No Path to the client SSL key file for secure connections.
--db-ssl-cert-file No Path to the client SSL certificate file.
--db-ssl-ca-file No Path to the CA certificate used to verify the server’s certificate.

View-data mode options

Option Required Description
--prepared-sens-dict-file Yes Input file, or list of files, containing the sensitive dictionary generated by the create-dict (scan) mode or created manually. If rules collide, the rules from the last file in the list take priority.
--schema-name Yes Schema name.
--table-name Yes Table name.
--limit No Number of rows to display. (default: 100)
--offset No Row offset for pagination. (default: 0)
--json No Outputs results in JSON format instead of a table.

I.3.8. Dictionary Schemas #

I.3.8.1. Meta Dictionary #

The meta-dictionary defines the rules used by pg_anon to detect sensitive fields during the create-dict (scan) mode.

This dictionary is not generated automatically — it must be created manually. It provides powerful configuration options that let you fine-tune scanning behavior.

During scanning, pg_anon analyzes all database schemas, tables, and fields.

The meta-dictionary allows you to:

  • include or exclude specific objects

  • detect sensitive fields by:

    • field names

    • data patterns (regex)

    • exact or partial constants

    • custom SQL scan functions

    • SQL filtering conditions

  • select which PostgreSQL types should be scanned

  • specify anonymization functions per data type

To make it easier to navigate, each section of the meta-dictionary is described separately below, with purpose, schema, and example.

At the end, a complete combined schema is shown.

I.3.8.1.1. Scan priority #
Priority Meta-dictionary section Required Description
1 skip_rules + include_rules No Narrow the set of fields to be scanned
2 field section + fields from the sensitive and non-sensitive dictionaries No Match by field names. Does not inspect data inside tables
3 sens_pg_types No Match by field type. Types not included in this section are ignored
4 data_sql_condition No Narrow the dataset to be scanned for specific tables using SQL conditions
5 data_func No Match data by custom data-scanning functions
6 data_const No Match data by constant values
7 data_regex No Match data by regular expressions
8 funcs No Assign anonymization functions according to detected data type

Figure I.2. scan_workflow.png

I.3.8.1.2. Section: skip_rules #

Purpose

Defines what parts of the database should be excluded from scanning. You can skip entire schemas, specific tables, or individual fields. Useful for large schemas without sensitive data or for reducing processing time.

Schema

{
    "skip_rules": [
        {
            "schema": "<schema_name: string>",             # Skip this schema from scanning
            "schema_mask": "<schema_regex_mask: string>",  # Or skip from scanning schemas matching regex pattern
            "table": "<table_name: string>",               # Skip from scanning only this table
            "table_mask": "<table_regex_mask: string>",    # Or skip from scanning tables matching regex pattern
            "fields": [  # Skip from scanning fields of tables
                "<field_name: string>"
            ]
        }
    ]
}

Rule Combinations

  1. schema or schema_mask — required

    • You must specify one of schema or schema_mask

    • This defines where the rule applies.

  2. table or table_mask — optional

    • You may (but don’t have to) narrow the rule to specific tables.

    • If table or table_mask is specified → rule applies only to those tables.

    • If both are omitted → the rule applies to the entire schema.

  3. fields — optional

    • You may also specify particular fields to skip.

    • If fields is specified → only these fields are skipped.

    • If fields is omitted → the entire table is skipped.

    • And if no table is specified → the entire schema is skipped.

Using this section

Example meta-dictionary

{
    "skip_rules": [
        {
          "schema_msk": "^tmp.*"
        },
      {
        "schema": "ecommerce",
        "table_mask": "^client_.*"
      },
      {
        "schema": "public",
        "fields": ["currency", "id", "employee_id"]
      },
      {
        "schema": "ecommerce",
        "table": "orders",
        "fields": ["count"]
      }
    ]
}

Example table structure with the corresponding dictionary matches

Schema Table Field Is skipped
public employees id Yes
public employees full_name No
public employees email No
public employees password No
public employees phone No
public employees hire_date No
public salaries employee_id No
public salaries monthly_salary No
public salaries currency Yes
ecommerce orders product_id No
ecommerce orders client_name Yes
ecommerce orders client_phone Yes
ecommerce orders client_delivery_address Yes
ecommerce orders count Yes
ecommerce orders created No
ecommerce orders status No
tmp_some_1 tbl_1 content Yes
tmp_some_2 tbl_2 content Yes
I.3.8.1.3. Section: include_rules #

Purpose

Restrict scanning to specific schemas/tables/fields. Useful when:

  • scanning only part of a large database

  • debugging detection rules

  • running focused scans

Schema

{
    "include_rules": [
        {
            "schema": "<schema_name: string>",             # Include only this schema
            "schema_mask": "<schema_regex_mask: string>",  # Or include schemas matching regex pattern
            "table": "<table_name: string>",               # Include only this table
            "table_mask": "<table_regex_mask: string>",    # Or include tables matching regex pattern
            "fields": [  # Include only this fields of tables for scanning
                "<field_name: string>"
            ]
        }
    ]
}

Combination rules

  1. schema or schema_mask — required

    • You must specify one of schema or schema_mask

    • This defines where the rule applies.

  2. table or table_mask — optional

    • You may (but don’t have to) narrow the rule to specific tables.

    • If table or table_mask is specified → rule applies only to those tables.

    • If both are omitted → the rule applies to the entire schema.

  3. fields — optional

    • You may also specify particular fields to include.

    • If fields is specified → only these fields are scanned.

    • If fields is omitted → the entire table is scanned.

    • And if no table is specified → the entire schema is scanned.

Using this section

Example meta-dictionary

{
    "include_rules": [
        {
          "schema_msk": "^tmp.*"
        },
      {
        "schema": "ecommerce",
        "table_mask": "^client_.*"
      },
      {
        "schema": "public",
        "fields": ["currency", "id", "employee_id"]
      },
      {
        "schema": "ecommerce",
        "table": "orders",
        "fields": ["count"]
      }
    ]
}

Example table structure with the corresponding dictionary matches

Schema Table Field Is skipped
public employees id No
public employees full_name Yes
public employees email Yes
public employees password Yes
public employees phone Yes
public employees hire_date Yes
public salaries employee_id Yes
public salaries monthly_salary Yes
public salaries currency No
ecommerce orders product_id Yes
ecommerce orders client_name No
ecommerce orders client_phone No
ecommerce orders client_delivery_address No
ecommerce orders count No
ecommerce orders created Yes
ecommerce orders status Yes
tmp_some_1 tbl_1 content No
tmp_some_2 tbl_2 content No
I.3.8.1.4. Section: field #

Purpose

Detect sensitive fields based solely on field name, without scanning the data inside.

Useful when fields are already known to contain sensitive data (e.g., email, password).

Schema

{
    "field": {
        "rules": [
            "<field_name_regex: string>",
        ],
        "constants": [
            "<field_name: string>",
        ]
    }
}
Key Meaning
rules List of regex patterns. Field names matching these patterns are always considered sensitive.
constants Exact field names to be treated as sensitive.

Using this section

Example meta-dictionary

{
    "field": {
        "rules": [
            "^client_",
            ".*phone.*"
        ],
        "constants": [
            "password",
            "email"
        ]
    }
}

Example table structure with the corresponding dictionary matches

Schema Table Field Is sensitive Rule
public employees id No -
public employees full_name No -
public employees email Yes email
public employees password Yes password
public employees phone Yes .*phone.*
public employees hire_date No -
public salaries employee_id No -
public salaries monthly_salary No -
public salaries currency No -
ecommerce orders product_id No -
ecommerce orders client_name Yes ^client_
ecommerce orders client_phone Yes .*phone.*
ecommerce orders client_delivery_address Yes ^client_
ecommerce orders count No -
ecommerce orders created No -
ecommerce orders status No -
I.3.8.1.5. Section: sens_pg_types #

Purpose

Define which PostgreSQL types are scanned. Field types not included in this list are not scanned.

If omitted or empty → default types are used: text, character, varchar, mvarchar, json, integer, bigint

Schema

{
    "sens_pg_types": [
        "<field_type: string>",
    ]
}

Using this section

Example meta-dictionary

{
    "sens_pg_types": [
        "text",
        "integer",
        "bigint",
        "numeric(10,2)",
        "varchar",
        "mchar",
        "json"
    ]
}

Field Type Resolution Priorities:

  1. Exact type match (varchar(20), numeric(10,2))

  2. Base type match (varchar, numeric)

  3. Fields with other types are not scanned.

I.3.8.1.6. Section: data_sql_condition #

Purpose

Specify custom SQL WHERE conditions to sample the data instead of scanning the whole table.

Schema

{
    "data_sql_condition": [
        {
            "schema": "<schema_name: string>",             # Check only this schema
            "schema_mask": "<schema_regex_mask: string>",  # Or check schemas matching regex pattern
            "table": "<table_name: string>",               # Check only this table
            "table_mask": "<table_regex_mask: string>",    # Or check tables matching regex pattern
            "sql_condition": # Condition in raw SQL format for filtering the data to scan
                """
                <raw_SQL_WHERE_condition: string>
                """
        }
    ]
}

Rules

  1. schema or schema_mask — required

    • You must specify one of schema or schema_mask

  2. table or table_mask — required

    • You must specify one of table or table_mask

  3. sql_condition — required

    • Must specify SQL condition for WHERE section

    • The WHERE keyword in sql_condition is optional

Using this section

Example meta-dictionary

{
    "data_sql_condition": [
        {
            "schema": "public",
            "table": "salaries",
            "sql_condition": """
                WHERE hire_date >= '2024-01-01' and hire_date <= '2024-02-01'
            """
        }
    ]
}

Result

Only data from January 2024 will be used when scanning the table public.salaries.

I.3.8.1.7. Section: data_func #

Purpose

Use custom SQL functions to detect sensitive fields and assign the appropriate anonymization function.

Schema

{
    "data_func": {
        "<field_type: string>": [
            {
                "scan_func_per_field": "<scan_function_for_whole_field: string>",  
                "anon_func": "<anonymization_rule_template_for_field: string>", 
            },   
            {
                "scan_func": "<scan_function_for_field_per_row: string>",
                "anon_func": "<anonymization_rule_template_for_field: string>",
                "n_count": "<how_many_checks_must_be_passed: integer>",
            },
        ],
    }
}
Key Meaning
field_type PostgreSQL type (or custom type). "anyelement" applies to all types.
scan_func_per_field SQL function called once for the whole field. Must return boolean.
scan_func SQL function called for each field value. Must return boolean.
anon_func Anonymization rule template. Must contain a %s placeholder for the field name.
n_count The field is considered sensitive if the scan function returned true at least n_count times for field values. Used only with scan_func. (default: 1)
I.3.8.1.8. Rule Combinations #
  1. scan_func or scan_func_per_field — required

    • You must specify one of scan_func or scan_func_per_field

    • This defines where the rule applies.

Note

Functions for scan_func_per_field must follow this template:

CREATE OR REPLACE FUNCTION <schema>.<function_name>(
  schema_name TEXT,
  table_name TEXT,
  field_name TEXT,
  field_type TEXT
)
RETURNS boolean AS $$
BEGIN
  <function_logic>;
END;
$$ LANGUAGE plpgsql;

Functions for scan_func must follow this template:

CREATE OR REPLACE FUNCTION <schema>.<function_name>(
  value TEXT,
  schema_name TEXT,
  table_name TEXT,
  field_name TEXT
)
RETURNS boolean AS $$
BEGIN
  <function_logic>;
END;
$$ LANGUAGE plpgsql;

Using this section

Example meta-dictionary

{
    "data_func": {
        "anyelement": [
            {
                "scan_func_per_field": "custom_funcs.fulltext_search_by_organizations",
                "anon_func": "anon_funcs.digest(\"%s\", 'salt', 'md5')",
            },
            {
                "scan_func": "custom_funcs.is_employee_email",
                "anon_func": "anon_funcs.digest(\"%s\", 'salt', 'md5')",
                "n_count": 5
            }
        ]
    }
}

Field Type Resolution Priorities:

  1. Exact type match (varchar(20), numeric(10,2))

  2. Base type match (varchar, numeric)

  3. anyelement

Example table structure with the corresponding dictionary matches

Schema Table Field Is sensitive Rule
public employees id No -
public employees full_name No -
public employees email Yes custom_funcs.is_employee_email
public employees password No -
public employees phone No -
public employees hire_date No -
public salaries employee_id No -
public salaries monthly_salary No -
public salaries currency No -
ecommerce orders product_id No -
ecommerce orders client_name No -
ecommerce orders client_phone No -
ecommerce orders client_delivery_address No -
ecommerce orders count No -
ecommerce orders created No -
ecommerce orders status No -
I.3.8.1.9. Section: data_const #

Purpose

Detect sensitive fields by matching full or partial constants.

Schema

{
    "data_const": {
        "constants": [
            "<field_value_full: string>",
        ],
        "partial_constants": [
            "<field_value_partial: string>",
        ]
    }
}
Key Meaning
constants Matches the full field value.
partial_constants Matches a substring anywhere in the field value.

Note

One of constants or partial_constants must be specified.

Using this section

Example meta-dictionary

{
    "data_const": {
        "constants": [
            "done",
        ],
        "partial_constants": [
            "@example.com",
        ]
    }
}

Example table structure with the corresponding dictionary matches

Schema Table Field Is sensitive Rule
public employees id No -
public employees full_name No -
public employees email Yes @example.com
public employees password No -
public employees phone No -
public employees hire_date No -
public salaries employee_id No -
public salaries monthly_salary No -
public salaries currency No -
ecommerce orders product_id No -
ecommerce orders client_name No -
ecommerce orders client_phone No -
ecommerce orders client_delivery_address No -
ecommerce orders count No -
ecommerce orders created No -
ecommerce orders status Yes done
I.3.8.1.10. Section: data_regex #

Purpose

Detect sensitive data by scanning field values using regular expressions.

Schema

{
    "data_regex": {
        "rules": [
            "<regex_rule: string>",
        ]
    }
}

Using this section

Example meta-dictionary

{
    "data_regex": {
        "rules": [
            """[A-Za-z0-9]+([._-][A-Za-z0-9]+)*@[A-Za-z0-9-]+(\.[A-Za-z]{2,})+""",  # email
            "^(7?\d{10})$",             # phone 7XXXXXXXXXX
        ]
    }
}

Example table structure with the corresponding dictionary matches

Schema Table Field Is sensitive Rule
public employees id No -
public employees full_name No -
public employees email Yes [A-Za-z0-9]+([._-][A-Za-z0-9]+)*@[A-Za-z0-9-]+(\.[A-Za-z]{2,})+
public employees password No -
public employees phone Yes ^(7?\d{10})$
public employees hire_date No -
public salaries employee_id No -
public salaries monthly_salary No -
public salaries currency No -
ecommerce orders product_id No -
ecommerce orders client_name No -
ecommerce orders client_phone Yes ^(7?\d{10})$
ecommerce orders client_delivery_address No -
ecommerce orders count No -
ecommerce orders created No -
ecommerce orders status No -
I.3.8.1.11. Section: funcs #

Purpose

Configure anonymization functions per PostgreSQL type.

Schema

{
    "funcs": {
        "<field_type: string>": "<anonymization_function_for_field_type: string>", 
        "default": "<universal_anonymization_function_for_all_field_types: string>"
    }
}

Using this section

Example meta-dictionary

{
    "funcs": {
        "text": "anon_funcs.digest(\"%s\", 'salt_word', 'md5')",
        "numeric(10,2)": "anon_funcs.noise(\"%s\", 10)",
        "numeric": "anon_funcs.noise(\"%s\", 30)",
        "varchar(10)": "anon_funcs.random_string(10)",
        "varchar(20)": "anon_funcs.random_string(20)",
        "varchar": "anon_funcs.digest(\"%s\", 'varchar_salt_word', 'sha256')",
        "timestamp": "anon_funcs.dnoise(\"%s\",  interval '6 month')",
        "default": "anon_funcs.digest(\"%s\", 'MySecretSaltWord', 'sha256')"
    }
}

Field Type Resolution Priorities:

  1. Exact type match (varchar(20), numeric(10,2))

  2. Base type match (varchar, numeric)

  3. Function from default

  4. Built-in fallback function: anon_funcs.digest("%s", 'salt_word', 'md5')
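
As an illustration, for a hypothetical column note of type varchar(25) there is no exact varchar(25) rule in the example above, so the base-type varchar rule applies, and the applied rule corresponds to a dump-time expression roughly like:

SELECT anon_funcs.digest("note", 'varchar_salt_word', 'sha256') AS note
FROM public.some_table;  -- hypothetical table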

I.3.8.1.12. General meta-dict schema #
{
    "skip_rules": [
        {
            "schema": "<schema_name: string>",             # Skip this schema from scanning
            "schema_mask": "<schema_regex_mask: string>",  # Or skip from scanning schemas matching regex pattern
            "table": "<table_name: string>",               # Skip from scanning only this table
            "table_mask": "<table_regex_mask: string>",    # Or skip from scanning tables matching regex pattern
            "fields": [  # Skip from scanning fields of tables
                "<field_name: string>"
            ]
        }
    ],
    "include_rules": [
        {
            "schema": "<schema_name: string>",             # Include only this schema
            "schema_mask": "<schema_regex_mask: string>",  # Or include schemas matching regex pattern
            "table": "<table_name: string>",               # Include only this table
            "table_mask": "<table_regex_mask: string>",    # Or include tables matching regex pattern
            "fields": [  # Include only this fields of tables for scanning
                "<field_name: string>"
            ]
        }
    ],
    "field": {
        "rules": [
            "<field_name_regex: string>",
        ],
        "constants": [
            "<field_name: string>",
        ]
    },
    "sens_pg_types": [
        "<field_type: string>",
    ],
    "data_sql_condition": [
        {
            "schema": "<schema_name: string>",             # Check only this schema
            "schema_mask": "<schema_regex_mask: string>",  # Or check schemas matching regex pattern
            "table": "<table_name: string>",               # Check only this table
            "table_mask": "<table_regex_mask: string>",    # Or check tables matching regex pattern
            "sql_condition": # Condition in raw SQL format for filtering the data to scan
                """
                <raw_SQL_WHERE_condition: string>
                """
        }
    ],
    "data_func": {  
        "<field_type: string>": [ 
            {
                "scan_func": "<scan_function_for_field: string>",  
                "anon_func": "<anonymization_rule_template_for_field: string>", 
                "n_count": "<how_many_checks_must_be_passed: integer>", 
            },
        ],
    },
    "data_const": {
        "constants": [
            "<field_value_full: string>",
        ],
        "partial_constants": [
            "<field_value_partial: string>",
        ]
    },
    "data_regex": {
        "rules": [
            "<regex_rule: string>",
        ]
    },
    "funcs": {
        "<field_type: string>": "<anonymization_function_for_field_type: string>", 
        "default": "<universal_anonymization_function_for_all_field_types: string>"
    }
}

I.3.8.2. Sensitive Dictionary #

The sensitive dictionary defines explicit anonymization rules for fields. It is used in four operation modes, and its behavior differs slightly across them:

  1. Dump mode

    Fields listed in the dictionary are anonymized using the defined rules. All other fields are dumped as-is.

  2. Create-dict (scan) mode

    Fields listed in the sensitive dictionary are treated as known sensitive fields, which skips sensitivity detection for them. This speeds up the scanning process.

  3. View fields mode

    Shows which anonymization rules would be applied to fields.

  4. View data mode

    Shows how the rules would affect sample data, without performing a dump.

This dictionary can be created manually or generated automatically using create-dict (scan) mode.
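
One possible way to generate it is the create-dict mode; the command below is a sketch following the style of the examples in this guide (the --meta-dict-file and --output-sens-dict-file option names are assumptions here, while --output-no-sens-dict-file is the one referenced in the non-sensitive dictionary section):

   python pg_anon.py create-dict \
                     --db-host=127.0.0.1 \
                     --db-user=postgres \
                     --db-user-password=postgres \
                     --db-name=test_source_db \
                     --meta-dict-file=meta_dict.py \
                     --output-sens-dict-file=sens_dict.py \
                     --output-no-sens-dict-file=no_sens_dict.py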

Note

If a field appears both in the sensitive dictionary and the non-sensitive dictionary, the sensitive dictionary takes priority.

I.3.8.2.1. Schema #
{    
    "dictionary": [
        {
            "schema": "<schema_name: string>",
            "table": "<table_name: string>",
            "fields": {
                "<field_name: string>": "<anonymization_rule_for_field: string>",
            },
            "sql_condition": # Optional. Condition in raw SQL format for filtering the data to dump. (This section ignored for create-dict (scan) mode
                """
                <raw_SQL_WHERE_condition: string>
                """
        }
    ],
    # Optional section. It is used to exclude schemas and tables from the data dump.  
    "dictionary_exclude": [
        {
            "schema": "<schema_name: string>",             # Exclude only this schema
            "schema_mask": "<schema_regex_mask: string>",  # Or exclude schemas matching regex pattern
            "table": "<table_name: string>",               # Exclude only this table
            "table_mask": "<table_regex_mask: string>",    # Or exclude tables matching regex pattern
        }
    ]
}

Note

sql_condition in the dictionary section is optional. It can be used to dump only a subset of the data, for example, only the rows created in the last week.

  • dictionary_exclude is an optional section. If a table appears in both the dictionary_exclude and dictionary sections, the table will be dumped. This is useful for partial dumps and for debugging the anonymization process.

  • In dictionary_exclude, you must use either schema or schema_mask → not both.

  • In dictionary_exclude, you must use either table or table_mask → not both.

I.3.8.2.2. Using the Dictionary #

Example Database Structure

Schema Table Field
public employees id
public employees full_name
public employees email
public employees hire_date
public salaries employee_id
public salaries monthly_salary
public salaries currency
ecommerce orders product_id
ecommerce orders count
ecommerce orders client_name
ecommerce orders delivery_address
ecommerce orders created
ecommerce orders status
tenant_a projects title
tenant_a projects description
tenant_b projects title
tenant_b projects description
tenant_c projects title
tenant_c projects description

Example Sensitive Dictionary

{    
    "dictionary": [
        {
            "schema": "public",
            "table": "employees",
            "fields": {
                "full_name": "anon_funcs.digest(\"full_name\", 'salt_word', 'sha256')",  # hashing employees names 
                "email": "md5(\"email\") || @abc.com",  # hashing employee emails while preserving email format
            },
        },
        {
            "schema": "public",
            "table": "salaries",
            "fields": {
                "monthly_salary": "10000",  # just defines one value for the field for all rows
            },
        },
        {
            "schema": "ecommerce",
            "table": "orders",
            "fields": {
                "client_name": "anon_funcs.digest(\"client_name\", 'salt_word', 'sha256')",
                "delivery_address": "anon_funcs.digest(\"delivery_address\", 'salt_word', 'sha256')",
            },
            "sql_condition":  # Dumping only the orders completed within the last week
                """
                WHERE created > NOW() - '7 days'::interval
                AND status = 'done'
                """
        }
    ],
    # Excluding all tables from schemas `tenant_a`, `tenant_b`, `tenant_c` 
    "dictionary_exclude": [
        {
            "schema_mask": "tenant_.*",
            "table_mask": "*",
        }
    ]
}

This dictionary matches the following table fields:

Schema Table Field Used in dump mode Used in create-dict (scan) mode
public employees id Dumped as is Fields scanned using meta-dictionary rules
public employees full_name Dumped with anonymization Excluded from sensitivity checks as a sensitive field
public employees email Dumped with anonymization Excluded from sensitivity checks as a sensitive field
public employees hire_date Dumped as is Fields scanned using meta-dictionary rules
public salaries employee_id Dumped as is Fields scanned using meta-dictionary rules
public salaries monthly_salary Dumped with anonymization Excluded from sensitivity checks as a sensitive field
public salaries currency Dumped as is Fields scanned using meta-dictionary rules
ecommerce orders product_id Dumped as is Fields scanned using meta-dictionary rules
ecommerce orders client_name Dumped with anonymization Excluded from sensitivity checks as a sensitive field
ecommerce orders delivery_address Dumped with anonymization Excluded from sensitivity checks as a sensitive field
ecommerce orders count Dumped as is Fields scanned using meta-dictionary rules
ecommerce orders created Dumped as is Fields scanned using meta-dictionary rules
ecommerce orders status Dumped as is Fields scanned using meta-dictionary rules

I.3.8.3. Non-Sensitive Dictionary #

The non-sensitive dictionary is used only during the create-dict (scan) mode to speed up processing. It defines which fields should be treated as non-sensitive. Fields listed here are excluded from all sensitivity checks according to meta-dictionary rules.

This dictionary can be created manually or generated automatically using create-dict (scan) mode with --output-no-sens-dict-file option.

Note

If a field appears both in the sensitive dictionary and the non-sensitive dictionary, the sensitive dictionary takes priority.

I.3.8.3.1. Schema #
{    
    "no_sens_dictionary": [
        {
            "schema": "<schema_name: string>",
            "table": "<table_name: string>",
            "fields": [
                "<field_name: string>",
            ]
        },
    ]
}
I.3.8.3.2. Using the Dictionary #

Example Tables Structure

Schema Table Field
public employees id
public employees full_name
public employees email
public employees hire_date
public salaries employee_id
public salaries monthly_salary
public salaries currency

Example Non-Sensitive Dictionary

{
    "no_sens_dictionary": [
        {
            "schema": "public",
            "table": "employees",
            "fields": [
                "id",
                "hire_date",
            ]
        },
        {
            "schema": "public",
            "table": "salaries",
            "fields": [
                "employee_id",
                "currency",
            ]
        },
    ]
}

This dictionary matches the following table fields:

Schema Table Field Used in create-dict (scan) mode
public employees id Excluded from sensitivity checks as a non-sensitive field
public employees full_name Fields scanned using meta-dictionary rules
public employees email Fields scanned using meta-dictionary rules
public employees hire_date Excluded from sensitivity checks as a non-sensitive field
public salaries employee_id Excluded from sensitivity checks as a non-sensitive field
public salaries monthly_salary Fields scanned using meta-dictionary rules
public salaries currency Excluded from sensitivity checks as a non-sensitive field

I.3.8.4. Tables dictionary #

The tables dictionary defines which tables participate in the partial dump and partial restore operations. It can act as either a whitelist (include-only) or a blacklist (exclude-only).

Use this dictionary when you need to:

  • dump or restore only specific tables

  • exclude unwanted tables from the dump or restore

I.3.8.4.1. Schema #
{
    "tables": [
        {
            "schema": "<schema_name: string>",             # Include only this schema
            "schema_mask": "<schema_regex_mask: string>",  # Or include schemas matching regex pattern
            "table": "<table_name: string>",               # Include only this table
            "table_mask": "<table_regex_mask: string>",    # Or include tables matching regex pattern
        }
    ]
}

Note

  • You must use either schema or schema_mask → not both.

  • You must use either table or table_mask → not both.

I.3.8.4.2. Using the Dictionary #

You can use the same dictionary in two different roles:

  • Whitelist — dump/restore only the matched tables

  • Blacklist — dump/restore all tables except the matched ones

Example Database Structure

Schema Table
public employees
public departments
public positions
public salaries
public users
ecommerce products
ecommerce categories
ecommerce orders
ecommerce order_items
tenant_a users
tenant_a projects
tenant_a tasks
tenant_a comments
tenant_b users
tenant_b projects
tenant_b tasks
tenant_b comments
tenant_c users
tenant_c projects
tenant_c tasks
tenant_c comments

Example Tables Dictionary

{
    "tables": [  
        {
            "schema": "public",
            "table": "employees"
        },
        {
            "schema": "ecommerce",
            "table_mask": "^orders"
        },
        {
            "schema_mask": "_a$",
            "table": "projects"
        },
        {
            "schema_mask": "*",
            "table_mask": "users"
        },
    ]
}

This dictionary matches the following tables:

Schema Table Matched by rule
ecommerce orders schema="ecommerce", table_mask="^orders"
ecommerce order_items schema="ecommerce", table_mask="^orders"
tenant_a projects schema_mask="_a$", table="projects"
tenant_a users schema_mask="*", table_mask="users"
tenant_b users schema_mask="*", table_mask="users"
tenant_c users schema_mask="*", table_mask="users"
public users schema_mask="*", table_mask="users"
public employees schema="public", table="employees"

I.3.9. Debug stages for anonymization process #

The debug stages allow you to test and troubleshoot the anonymization workflow without performing a full dump or restore, saving significant time and resources.

Each stage emulates a specific part of the anonymization pipeline:

  • Stage 1 — Validate Dict

    Validates the sensitive dictionary and checks SQL logic without exporting any data.

  • Stage 2 — Validate Data

    Performs anonymization checks on real data with a limited sample (LIMIT 100) using a prepared database schema.

  • Stage 3 — Validate Full

    Executes the full anonymization logic with data sampling (LIMIT 100), but without requiring a prepared database.

These stages help you quickly debug rules, anonymization functions, SQL conditions, and dictionary configuration before running a full anonymized dump/restore process.

I.3.9.1. Stage 1: Validate dict #

This stage validates the dictionary, shows the affected tables, and runs the SQL queries without exporting any data to disk or to a database. If the program finishes without errors, the stage is passed.

Figure I.3. dbg-stage-1.png

   python pg_anon.py dump \
                     --db-host=127.0.0.1 \
                     --db-user=postgres \
                     --db-user-password=postgres \
                     --db-name=test_source_db \
                     --output-dir=test_dbg_stages \
                     --prepared-sens-dict-file=test_dbg_stages.py \
                     --clear-output-dir \
                     --verbose=debug \
                     --debug \
                     --dbg-stage-1-validate-dict

I.3.9.2. Stage 2: Validate data #

This stage validates the data, shows the affected tables, and runs SQL queries with data export limited to 100 rows per table into a prepared database. It requires a target database that already contains the table structure (the pre-data section) for the tables described in --prepared-sens-dict-file.

  • If you want to create a database with the required structure, run:

One-time structure dump:

   python pg_anon.py sync-struct-dump \
                     --db-host=127.0.0.1 \
                     --db-user=postgres \
                     --db-user-password=postgres \
                     --db-name=test_source_db \
                     --output-dir=test_stage_2 \
                     --prepared-sens-dict-file=test_dbg_stages.py \
                     --clear-output-dir \
                     --verbose=debug \
                     --debug \
                     --dbg-stage-3-validate-full

Then restore the structure as many times as you need:

   su - postgres -c "psql -U postgres -d postgres -c \"DROP DATABASE IF EXISTS test_target_db_7\""
   su - postgres -c "psql -U postgres -d postgres -c \"CREATE DATABASE test_target_db_7\""
   python pg_anon.py sync-struct-restore \
                     --db-host=127.0.0.1 \
                     --db-user=postgres \
                     --db-user-password=postgres \
                     --db-name=test_target_db_7 \
                     --input-dir=test_stage_2 \
                     --verbose=debug \
                     --debug 
  • Validate data stage in dump:

Figure I.4. dbg-stage-2.png

   python pg_anon.py dump \
                     --db-host=127.0.0.1 \
                     --db-user=postgres \
                     --db-user-password=postgres \
                     --db-name=test_source_db \
                     --output-dir=test_dbg_stages \
                     --prepared-sens-dict-file=test_dbg_stages.py \
                     --clear-output-dir \
                     --verbose=debug \
                     --debug \
                     --dbg-stage-2-validate-data
  • Validate data stage in data-restore:

   python pg_anon.py sync-data-restore \
                     --db-host=127.0.0.1 \
                     --db-user=postgres \
                     --db-user-password=postgres \
                     --db-name=test_target_db_7 \
                     --input-dir=test_dbg_stages \
                     --verbose=debug \
                     --debug 
   
   # For example, view the restored data in one of the tables:
   su - postgres -c "psql -U postgres -d test_target_db_7 -c \"SELECT * FROM public.contracts\""

I.3.9.3. Stage 3: Validate full #

Figure I.5. dbg-stage-3.png

This stage runs the full anonymization logic with LIMIT 100 in the SQL queries. It does not require a prepared database; just run:

   su - postgres -c "psql -U postgres -d postgres -c \"DROP DATABASE IF EXISTS test_target_db_8\""
   su - postgres -c "psql -U postgres -d postgres -c \"CREATE DATABASE test_target_db_8\""
  • Validate full stage in dump:

   python pg_anon.py dump \
                     --db-host=127.0.0.1 \
                     --db-user=postgres \
                     --db-user-password=postgres \
                     --db-name=test_source_db \
                     --output-dir=test_dbg_stages \
                     --prepared-sens-dict-file=test_dbg_stages.py \
                     --clear-output-dir \
                     --verbose=debug \
                     --debug \
                     --dbg-stage-3-validate-full
  • Validate full stage in restore:

   python pg_anon.py restore \
                     --db-host=127.0.0.1 \
                     --db-user=postgres \
                     --db-user-password=postgres \
                     --db-name=test_target_db_8 \
                     --input-dir=test_dbg_stages \
                     --verbose=debug \
                     --debug 
   
   # For example, view the restored data in one of the tables:
   su - postgres -c "psql -U postgres -d test_target_db_8 -c \"SELECT * FROM public.contracts\""

I.3.10. FAQ #

I.3.10.1. Where can I find operation logs and launch parameters? #

All run data is stored in the /path_to_pg_anon/runs directory. Inside, the structure is: <year>/<month>/<day>/<operation_id>.

Each operation folder contains:

  • a logs directory with all log files

  • a run_options.json file with all parameters used to run pg_anon

If the --save-dicts option was used, the folders input and output will also appear. They contain all input and output dictionaries for that run.
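
For example, a run executed on 14 July 2025 with --save-dicts enabled would produce a layout like this (the operation ID is hypothetical):

runs/
  2025/
    07/
      14/
        c6c98133-856f-46b3-ba9e-3a0092b8d9aa/
          logs/               # all log files
          input/              # input dictionaries (only with --save-dicts)
          output/             # output dictionaries (only with --save-dicts)
          run_options.json    # all parameters used to run pg_anon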

I.3.10.2. Can I restore a pg_anon dump using pg_dump? #

No. The pg_anon dump format is not compatible with pg_dump due to the specifics of anonymization.

For the same reason, a regular backup created with pg_dump cannot be restored using pg_anon.

I.3.10.3. Does pg_anon modify the structure or data of the source database during scan, dump, view-data, or view-fields? #

pg_anon does not modify either the structure or the data of the source database.

The only thing pg_anon adds is the anon_funcs schema, which is required for its internal operations.

I.3.10.4. Can I use custom functions for scanning? #

Yes. The meta-dictionary has a data_func section. In this section, you can use any custom SQL function for sensitivity validation.

This allows you to implement checks using full-text search or any other SQL capabilities.

Such functions must follow this template:

CREATE OR REPLACE FUNCTION <schema>.<function_name>(
  value TEXT,
  schema_name TEXT,
  table_name TEXT,
  field_name TEXT
)
RETURNS boolean AS $$
BEGIN
  <function_logic>;
END;
$$ LANGUAGE plpgsql;

I.3.10.5. Can I use custom functions for anonymization? #

Yes. You can use any functions and values available in the source database.

You must ensure that anonymized values match the field format. For example, if the field type is varchar(15), you must manually ensure the generated value does not exceed 15 characters.

If the format is violated, the dump may be created successfully, but restoring it may fail.
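
For instance, for a hypothetical phone field of type varchar(15), a rule based on the library function anon_funcs.random_string keeps the generated value within the limit:

{
    "dictionary": [
        {
            "schema": "public",
            "table": "clients",  # hypothetical table
            "fields": {
                "phone": "anon_funcs.random_string(15)"  # never longer than varchar(15)
            }
        }
    ]
}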

Alternatively, for such cases you can use the data_func section: a scan_func that checks the field length, paired with an anon_func that produces values of the required length.

For example, the scan function below matches only fields declared with a length of exactly 20 characters whose values look like email addresses:

CREATE OR REPLACE FUNCTION my_scan_funcs.is_email_field_with_len_20_chars(
  value TEXT,
  schema_name TEXT,
  table_name TEXT,
  field_name TEXT
)
RETURNS boolean AS $$
DECLARE
    max_len integer;
BEGIN
    SELECT c.character_maximum_length
    INTO max_len
    FROM information_schema.columns c
    WHERE c.table_schema = $2
      AND c.table_name = $3
      AND c.column_name = $4;

    -- the field length must be exactly 20 characters
    IF max_len != 20 THEN
        RETURN false;
    END IF;

    -- the value must not be NULL for the comparison
    IF $1 IS NULL THEN
        RETURN false;
    END IF;

    -- check the email format with a regular expression
    RETURN $1 ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$';
END;
$$ LANGUAGE plpgsql;

The meta-dict rule below can be used to detect email fields with a length of 20 characters and anonymize them while preserving both format and length.

{
    "data_func": {
        "varchar": [
            {
                "scan_func": "my_scan_funcs.is_email_field_with_len_20_chars",
                "anon_func": "lower(anon_funcs.random_string(9)) || '@secret.com'",
                "n_count": 10
            }
        ]
    }
}

I.3.10.6. Is the scanning stage required? #

No. You can create all required dictionaries manually or reuse previously generated dictionaries.

I.3.10.7. Why load sensitive and non-sensitive dictionaries during scanning? #

They are used only to speed up scanning.

These dictionaries act as a cache, allowing pg_anon to immediately know which fields are sensitive and which are not.

This way, repeated scans of the same database will run very quickly.

If new fields appear that are not present in the dictionaries, pg_anon will evaluate them using the rules from the meta-dictionary.

I.3.10.8. When should I use --config with a configuration file? #

If you plan to use pg_anon with different Tantor SE major versions, you should define a config file.

It is much easier to configure this once rather than repeatedly passing paths to pg_dump and pg_restore.

If you always use a single Tantor SE version, the system pg_dump and pg_restore will be used, and a config file is unnecessary.

I.3.10.9. Can I split one large dictionary into multiple smaller ones? #

Yes. All dictionary-related parameters accept lists of files.

At startup, pg_anon merges them into a single dictionary internally.

This makes it easy to separate different groups of rules into different files and combine them as needed. This is especially helpful for the meta-dictionary, which contains many optional sections.
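
For instance, the two hypothetical files below are merged at startup as if their dictionary lists were written in a single file:

# employees_rules.py
{
    "dictionary": [
        {"schema": "public", "table": "employees",
         "fields": {"email": "anon_funcs.partial_email(\"email\")"}}
    ]
}

# orders_rules.py
{
    "dictionary": [
        {"schema": "ecommerce", "table": "orders",
         "fields": {"client_name": "anon_funcs.random_string(10)"}}
    ]
}

Passing both files to --prepared-sens-dict-file is then equivalent to using one dictionary that contains both entries.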

I.3.10.10. Restore error: Database is not empty #

Restore mode checks that the target database is empty.

This is done to prevent accidental data loss in the target database.

If needed, use the --drop-db or --clean-db options during restore.
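
For example, a restore into a recreated target database might look like this (database and directory names are illustrative):

   python pg_anon.py restore \
                     --db-host=127.0.0.1 \
                     --db-user=postgres \
                     --db-user-password=postgres \
                     --db-name=test_target_db \
                     --input-dir=test_dump \
                     --drop-db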

I.3.10.11. Restore error: Database is being accessed by other users #

When using the --drop-db option, the target database will be recreated using DROP DATABASE and CREATE DATABASE.

If there are active connections, the DROP DATABASE command cannot be executed.

You must terminate all active sessions and run the restore operation again.
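
One way to terminate the remaining sessions is PostgreSQL's standard pg_terminate_backend function, for example (database name taken from the earlier examples):

   su - postgres -c "psql -U postgres -d postgres -c \"SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'test_target_db_7' AND pid <> pg_backend_pid();\""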

I.3.10.12. Difference between options --drop-db and --clean-db for restore mode #

  • --drop-db — recreates the target database using the DROP DATABASE and CREATE DATABASE commands, then runs the restore process on the empty database.

  • --clean-db — performs a restore similar to pg_restore --clean --if-exists. It creates missing tables from the backup in the target database and preserves extra tables that exist in the target DB but are not contained in the backup being restored. This option does not require an empty target database.

I.3.10.13. Determining Optimal Process and Connection Counts #

To configure optimal values, first identify these system parameters:

  • max_connections — maximum connections allowed by your Tantor SE database

  • CPU core count

  • Reserved connections (typically 3-10 for maintenance/admin connections)

Important Considerations:

  • Exceeding max_connections may cause pg_anon failures and affect other database applications

  • Ensure sufficient connection headroom for other services

I.3.10.13.2. Example Calculation #
  • CPU cores: 4

  • max_connections: 100

  • reserved_connections: 5

  • --processes: 4

  • --db-connections-per-process: (100 - 5) / 4 ≈ 23.75 → 23

  • Verification: 4 processes × 23 connections = 92 total connections (within 100 limit)
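
The same calculation as a short Python sketch (illustrative only, not part of pg_anon):

cpu_cores = 4
max_connections = 100
reserved_connections = 5

processes = cpu_cores  # value for --processes
# value for --db-connections-per-process, rounded down to stay under the limit
db_connections_per_process = (max_connections - reserved_connections) // processes

total = processes * db_connections_per_process
assert total + reserved_connections <= max_connections
print(processes, db_connections_per_process, total)  # 4 23 92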

I.3.11. SQL Functions Library #

All functions are contained in the init.sql file. After running pg_anon in init mode, they reside in the anon_funcs schema of the source database. If you want to write a new function, simply create it in the anon_funcs schema of your source database.

List of some functions available for use in dictionaries:

I.3.11.1. Functions list #

I.3.11.1.1. noise #

Add noise to a real number:

SELECT anon_funcs.noise(100, 1.2);
>> 123
I.3.11.1.2. dnoise #

Add noise to a date or timestamp:

SELECT anon_funcs.dnoise('2020-02-02 10:10:10'::timestamp, interval '1 month');
>> 2020-03-02 10:10:10
I.3.11.1.3. digest #

Hash a string value with a specified hash function:

SELECT anon_funcs.digest('text', 'salt', 'sha256');
>> '3353e....'
I.3.11.1.4. partial #

Keep the first few characters (2nd argument) and the last few characters (4th argument) of the specified string, adding a constant (3rd argument) in between:

SELECT anon_funcs.partial('123456789', 1, '***', 3);
>> 1***789
I.3.11.1.5. partial_email #

Mask an email address:

SELECT anon_funcs.partial_email('example@gmail.com');
>> ex*****@gm*****.com
I.3.11.1.6. random_string #

Generate a random string of specified length:

SELECT anon_funcs.random_string(7);
>> H3ZVL5P
I.3.11.1.7. random_zip #

Generate a random ZIP code:

SELECT anon_funcs.random_zip();
>> 851467
I.3.11.1.8. random_date_between #

Generate a random date and time within a specified range:

SELECT anon_funcs.random_date_between(
   '2020-02-02 10:10:10'::timestamp,
   '2022-02-05 10:10:10'::timestamp
);
>> 2021-11-08 06:47:48.057
I.3.11.1.9. random_date #

Generate a random date and time:

SELECT anon_funcs.random_date();
>> 1911-04-18 21:54:13.139
I.3.11.1.10. random_int_between #

Generate a random integer within a specified range:

SELECT anon_funcs.random_int_between(100, 200);
>> 159
I.3.11.1.11. random_bigint_between #

Generate a random bigint within a specified range:

SELECT anon_funcs.random_bigint_between(6000000000, 7000000000);
>> 6268278565
I.3.11.1.12. random_phone #

Generate a random phone number:

SELECT anon_funcs.random_phone('+7');
>> +7297479867
I.3.11.1.13. random_hash #

Generate a random hash using the specified function:

SELECT anon_funcs.random_hash('seed', 'sha512');
>> b972f895ebea9cf2f65e19abc151b8031926c4a332471dc5c40fab608950870d6dbddcd18c7e467563f9b527e63d4d13870e4961c0ff2a62f021827654ae51fd
I.3.11.1.14. random_in #

Select a random element from an array:

SELECT anon_funcs.random_in(array['a', 'b', 'c']);
>> a
I.3.11.1.15. hex_to_int #

Convert a hexadecimal value to decimal:

SELECT anon_funcs.hex_to_int('8AB');
>> 2219

I.3.11.2. pgcrypto #

In addition to the existing functions in the anon_funcs schema, functions from the pgcrypto extension can also be used.

CREATE EXTENSION IF NOT EXISTS pgcrypto;

Example of using encryption with base64 encoding to store the encrypted value in a text field:

SELECT encode((SELECT encrypt('data', 'password', 'bf')), 'base64');
>> cSMq9gb1vOw=

SELECT decrypt(
(
SELECT decode('cSMq9gb1vOw=', 'base64')
), 'password', 'bf');
>> data

I.3.11.3. How to add your own functions #

New anonymization functions can also be added by placing them in the init.sql file and then running pg_anon in init mode.
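
As a minimal sketch, a custom wrapper around the existing anon_funcs.partial function could look like this (the mask_phone name is hypothetical):

CREATE OR REPLACE FUNCTION anon_funcs.mask_phone(value TEXT)
RETURNS TEXT AS $$
  -- keep the first and last two characters, mask the middle
  SELECT anon_funcs.partial(value, 2, '*****', 2);
$$ LANGUAGE sql;

SELECT anon_funcs.mask_phone('+7297479867');
-- >> +7*****67

Once created in the anon_funcs schema (directly or via init.sql), it can be referenced in dictionaries like any built-in function.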

I.3.12. API #

I.3.12.1. Operation Endpoints #

I.3.12.1.1. Check DB connection #

POST /api/stateless/check_db_connection

Description

Checks whether pg_anon can connect to the specified database.

If the connection is successful, the endpoint returns status code 200.

Check DB connection request body schema

Field Type Required Description
host string Yes Database host.
port integer Yes Database port.
db_name string Yes Database name.
user_login string Yes Database username.
user_password string Yes Database user password.

Example

curl -X POST http://127.0.0.1:8000/api/stateless/check_db_connection \
-H "Content-Type: application/json" \
-d '{
  "host": "localhost",
  "port": "5432",
  "db_name":  "source_db",
  "user_login": "postgres",
  "user_password":  "postgres"
}'

Responses

Status Code Description Component
200 Database is reachable -
400 Bad Request ErrorResponse
500 Internal Server Error ErrorResponse
422 Validation Error HTTPValidationError
I.3.12.1.2. Run create-dict (scan) operation #

POST /api/stateless/scan

Description

Runs pg_anon in create-dict (scan) mode in the background.

Operation lifecycle:

  1. The client calls this endpoint.

  2. The API returns one of the following status codes:

    • 201 — the scan operation has been successfully started.

    • 400 or 422 — the request is invalid; the operation is not started.

  3. The service sends a webhook request with status in_progress to the webhook_status_url. The payload format is described in the scan webhook request schema.

  4. The operation executes in the background.

  5. If an error occurs during processing, the service sends a webhook request with status error.

  6. If the operation completes successfully, the service sends a webhook request with status success.

Scan request body schema

Field Type Required Description
operation_id string Yes External operation ID. Accepts any string. Returned unchanged in webhook requests.
db_connection_params db connection params Yes Source database connection credentials.
webhook_status_url string Yes Callback URL that receives POST requests in the format described in the scan webhook request schema.
webhook_metadata Any No Arbitrary metadata payload. Sent unchanged in webhook requests.
webhook_extra_headers JSON No Additional HTTP headers added to webhook requests. Useful for integration, e.g., to include an Authorization header.
webhook_verify_ssl boolean No Enables or disables SSL certificate verification for webhook requests.
save_dicts boolean No Saves all input and output dictionaries into the runs directory. Useful for debugging or integration. Default: false.
type string No Defines the scan mode: full or partial. Default: partial.
depth integer No Maximum number of table rows used for partial scan. Applies only when type = partial. Default: 10000.
meta_dict_contents array of dictionary content Yes Contents of the meta dictionary, defining rules for scanning fields.
sens_dict_contents array of dictionary content No Contents of the sensitive dictionary. Used to improve scan performance.
no_sens_dict_contents array of dictionary content No Contents of the non-sensitive dictionary. Used to improve scan performance.
need_no_sens_dict boolean No If true, generates a non-sensitive dictionary and returns it in the no_sens_dict_contents field of the webhook payload.
proc_count integer No Number of processes used for multiprocessing. Default: 4.
proc_conn_count integer No Number of database connections allocated per process for I/O operations. Default: 4.

Example

curl -X POST http://127.0.0.1:8000/api/stateless/scan \
-H "Content-Type: application/json" \
-d '{
  "operation_id": "my-uniq-scan-id-0001",
  "db_connection_params": {
     "host": "localhost",
     "port": "5432",
     "db_name":  "source_db",
     "user_login": "postgres",
     "user_password":  "postgres"
  },
  "webhook_status_url": "https://my-service/pg-anon-result-processor",
  "webhook_metadata": {"extra_field_1": "extra_data", "extra_field_2": {"fld1": [1,2,3], "fld2": 123}},
  "webhook_extra_headers": {"Authorization": "Api-key my_super_secret_api_key", "X-my-service-extra-header": "header value"},
  "webhook_verify_ssl": true,
  "save_dicts": true,
  "type": "partial",
  "depth": 10000,
  "meta_dict_contents": [{
    "name": "my simple meta dict with email scan rule",
    "content": "{\"data_regex\": {\"rules\": [\".*@.*\"]}, \"funcs\": {\"text\": \"md5(%s)\"}}"
  }],
  "sens_dict_contents": [{
    "name": "sens dict for email anonymization",
    "content": "{\"dictionary\": [{\"schema\": \"public\", \"table\": \"users\", \"fields\": {\"email\": \"md5(email)\"}}, {\"schema\": \"public\", \"table\": \"clients\", \"fields\": {\"email\": \"md5(email)\"}}]}"
  }],
  "no_sens_dict_contents": [{
    "name": "non-sens dict example",
    "content": "{\"no_sens_dictionary\": [{\"schema\": \"public\", \"table\": \"users\", \"fields\": [\"id\", \"created\"]}, {\"schema\": \"public\", \"table\": \"clients\", \"fields\": [\"id\", \"registered\"]}]}"
  }],
  "need_no_sens_dict": true,
  "proc_count": 4,
  "proc_conn_count": 4
}'

Scan webhook request schema

Field Type Required Description
operation_id string Yes External operation ID. Copied from the operation_id of the original scan request.
internal_operation_id string No Internal pg_anon operation ID generated automatically. Present only when status is success or error. Can be used to correlate the webhook event with operation data and logs stored in the /runs directory. Also used in the Operation endpoints.
status_id integer Yes Numeric status code. Possible values: 2 — success, 3 — error, 4 — in_progress.
status string Yes Human-readable status. Possible values: success, error, in_progress.
webhook_metadata Any No Metadata payload passed “as is”. Copied from the webhook_metadata field of the scan request.
started string No Operation start timestamp in ISO8601 format (UTC+0). Present only for statuses success or error.
ended string No Operation end timestamp in ISO8601 format (UTC+0). Present only for statuses success or error.
error string No Error message. Present only when status = error.
error_code string No Machine-readable error code for i18n mapping. Present only when status = error.
run_options JSON No Snapshot of the operation’s runtime options. Useful for analysis, debugging, and rerunning the operation. Present only when status = success or error.
sens_dict_content dictionary content No Resulting Sensitive dictionary. Returned only when the scan completes successfully. Used for dump operations or repeated scans.
no_sens_dict_content dictionary content No Resulting Non-Sensitive dictionary. Present only when the scan completes successfully and the original request specified need_no_sens_dict = true.
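
For illustration, a webhook request for a successfully completed scan could look like this (all values, including run_options, are hypothetical):

{
  "operation_id": "my-uniq-scan-id-0001",
  "internal_operation_id": "c6c98133-856f-46b3-ba9e-3a0092b8d9aa",
  "status_id": 2,
  "status": "success",
  "webhook_metadata": {"extra_field_1": "extra_data"},
  "started": "2025-07-14T10:00:00+00:00",
  "ended": "2025-07-14T10:03:21+00:00",
  "run_options": {"mode": "create-dict"},
  "sens_dict_content": {
    "name": "output_sens_dict",
    "content": "{\"dictionary\": [{\"schema\": \"public\", \"table\": \"users\", \"fields\": {\"email\": \"md5(email)\"}}]}"
  }
}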

Responses

Status Code Description Component
201 Operation successfully started -
400 Bad Request ErrorResponse
500 Internal Server Error ErrorResponse
422 Validation Error HTTPValidationError
I.3.12.1.3. Display database fields with anonymization rules #

POST /api/stateless/view-fields

Description

Runs pg_anon in view-fields mode and returns the result in the response.

View-fields request body schema

Field Type Required Description
db_connection_params db connection params Yes Source database credentials.
sens_dict_contents array of dictionary content No Sensitive dictionary content that defines rules for sensitive fields.
schema_name string No Filter by schema name.
schema_mask string No Filter by schema name using a regular expression.
table_name string No Filter by table name.
table_mask string No Filter by table name using a regular expression.
view_only_sensitive_fields boolean No Displays only sensitive fields (default: all fields).
fields_limit_count integer No Maximum number of fields to include for output (default: 5000).

Example

curl -X POST http://127.0.0.1:8000/api/stateless/view-fields \
-H "Content-Type: application/json" \
-d '{
  "db_connection_params": {
     "host": "localhost",
     "port": "5432",
     "db_name":  "source_db",
     "user_login": "postgres",
     "user_password":  "postgres"
  },
  "sens_dict_contents": [{
    "name": "sens dict for email anonymization",
    "content": "{\"dictionary\": [{\"schema\": \"public\", \"table\": \"users\", \"fields\": {\"email\": \"md5(email)\"}}, {\"schema\": \"public\", \"table\": \"clients\", \"fields\": {\"email\": \"md5(email)\"}}]}"
  }],
  "schema_name": "public",
  "table_mask": "^client",
  "view_only_sensitive_fields": true,
  "fields_limit_count": 1000
}'

Responses

Status Code Description Component
200 Successful Response ViewFieldsResponse
400 Bad Request ErrorResponse
500 Internal Server Error ErrorResponse
422 Validation Error HTTPValidationError
I.3.12.1.4. Display table with anonymization data #

POST /api/stateless/view-data

Description

Displays table data in original and anonymized variants for comparison. Runs pg_anon in view-data mode and returns the result in the response.

View-data request body schema

Field Type Required Description
db_connection_params db connection params Yes Source database credentials.
sens_dict_contents array of dictionary content No Sensitive dictionary content defining rules for sensitive fields.
schema_name string Yes Schema name.
table_name string Yes Table name.
limit integer No Number of rows to display (default: 100).
offset integer No Row offset for pagination (default: 0).

Example

curl -X POST http://127.0.0.1:8000/api/stateless/view-data \
-H "Content-Type: application/json" \
-d '{
  "db_connection_params": {
     "host": "localhost",
     "port": "5432",
     "db_name":  "source_db",
     "user_login": "postgres",
     "user_password":  "postgres"
  },
  "sens_dict_contents": [{
    "name": "sens dict for email anonymization",
    "content": "{\"dictionary\": [{\"schema\": \"public\", \"table\": \"users\", \"fields\": {\"email\": \"md5(email)\"}}, {\"schema\": \"public\", \"table\": \"clients\", \"fields\": {\"email\": \"md5(email)\"}}]}"
  }],
  "schema_name": "public",
  "table_name": "clients",
  "limit": 10,
  "offset": 20
}'

Responses

Status Code Description Component
200 Successful Response ViewDataResponse
400 Bad Request ErrorResponse
500 Internal Server Error ErrorResponse
422 Validation Error HTTPValidationError
I.3.12.1.5. Display database schemas only for preview #

POST /api/stateless/preview

Description

Returns a list of database schemas. Can be filtered by schema name substring.

Preview schemas request body schema

Field Type Required Description
db_connection_params db connection params Yes Source database credentials.
schema_filter string No Filter schemas by name substring (case-sensitive).

Example

curl -X POST http://127.0.0.1:8000/api/stateless/preview \
-H "Content-Type: application/json" \
-d '{
  "db_connection_params": {
     "host": "localhost",
     "port": "5432",
     "db_name":  "source_db",
     "user_login": "postgres",
     "user_password":  "postgres"
  },
  "schema_filter": "public"
}'

Responses

Status Code Description Component
200 Successful Response PreviewSchemasResponse
400 Bad Request ErrorResponse
500 Internal Server Error ErrorResponse
422 Validation Error HTTPValidationError
I.3.12.1.6. Display database tables with fields for preview #

POST /api/stateless/preview/{schema}

Description

Returns a list of tables in the specified schema with their fields and sensitivity/exclusion status based on the provided sensitive dictionary.

Preview schema tables request body schema

Field Type Required Description
db_connection_params db connection params Yes Source database credentials.
sens_dict_contents array of dictionary content Yes Sensitive dictionary content that defines rules for sensitive fields.
limit integer No Maximum number of tables to return (default: 20).
offset integer No Number of tables to skip for pagination (default: 0).
table_filter string No Filter tables by name substring (case-sensitive).
view_only_sensitive_tables boolean No If true, returns only tables that match the sensitive dictionary (default: false).

Path parameters

Field Type Required Description
schema string Yes Schema name

Example

curl -X POST http://127.0.0.1:8000/api/stateless/preview/public \
-H "Content-Type: application/json" \
-d '{
  "db_connection_params": {
     "host": "localhost",
     "port": "5432",
     "db_name":  "source_db",
     "user_login": "postgres",
     "user_password":  "postgres"
  },
  "sens_dict_contents": [{
    "name": "sens dict for email anonymization",
    "content": "{\"dictionary\": [{\"schema\": \"public\", \"table\": \"users\", \"fields\": {\"email\": \"md5(email)\"}}]}"
  }],
  "limit": 20,
  "offset": 0,
  "table_filter": "user",
  "view_only_sensitive_tables": false
}'

Responses

Status Code Description Component
200 Successful Response PreviewSchemaTablesResponse
400 Bad Request ErrorResponse
500 Internal Server Error ErrorResponse
422 Validation Error HTTPValidationError
I.3.12.1.7. Run dump operation #

POST /api/stateless/dump

Description

Runs pg_anon in dump mode in the background.

Operation lifecycle:

  1. The client calls this endpoint.

  2. The API returns one of the following status codes:

    • 201 — the dump operation has been successfully started.

    • 400 or 422 — the request is invalid; the operation is not started.

  3. The service sends a webhook request with status in_progress to the webhook_status_url. The payload format is described in the dump webhook request schema.

  4. The operation executes in the background.

  5. If an error occurs during processing, the service sends a webhook request with status error.

  6. If the operation completes successfully, the service sends a webhook request with status success.

Dump request body schema

Field Type Required Description
operation_id string Yes External operation ID. Accepts any string. Returned unchanged in webhook requests.
db_connection_params db connection params Yes Source database connection credentials.
webhook_status_url string Yes Callback URL that receives POST requests in the format described in the dump webhook request schema.
webhook_metadata Any No Arbitrary metadata payload. Sent unchanged in webhook requests.
webhook_extra_headers JSON No Additional HTTP headers added to webhook requests. Useful for integration, e.g., to include an Authorization header.
webhook_verify_ssl boolean No Enables or disables SSL certificate verification for webhook requests.
save_dicts boolean No Saves all input and output dictionaries into the runs directory. Useful for debugging or integration. Default: false.
type string No Defines the dump type. Options: dump, sync-struct-dump, sync-data-dump. Default: dump.
sens_dict_contents array of dictionary content Yes Contents of the sensitive dictionary, defining rules for data anonymization during the dump.
partial_tables_dict_contents array of dictionary content No Contents of the tables dictionary specifying tables to include in a partial dump.
partial_tables_exclude_dict_contents array of dictionary content No Contents of the tables dictionary specifying tables to exclude from a partial dump.
output_path string No Path where the dump will be created under /path_to_pg_anon/output. For example, "my_dump" will be located at /path_to_pg_anon/output/my_dump.
pg_dump_path string No Path to the pg_dump Postgres tool. Default: /usr/bin/pg_dump.
pg_dump_options string No Additional options passed directly to pg_dump utility. Example: "--no-comments --encoding=LATIN1".
ignore_privileges boolean No Ignore privileges from source db.
proc_count integer No Number of processes used for multiprocessing. Default: 4.
proc_conn_count integer No Number of database connections allocated per process for I/O operations. Default: 4.

Example

curl -X POST http://127.0.0.1:8000/api/stateless/dump \
-H "Content-Type: application/json" \
-d '{
  "operation_id": "my-uniq-scan-id-0001",
  "db_connection_params": {
     "host": "localhost",
     "port": "5432",
     "db_name":  "source_db",
     "user_login": "postgres",
     "user_password":  "postgres"
  },
  "webhook_status_url": "https://my-service/pg-anon-result-processor",
  "webhook_metadata": {"extra_field_1": "extra_data", "extra_field_2": {"fld1": [1,2,3], "fld2": 123}},
  "webhook_extra_headers": {"Authorization": "Api-key my_super_secret_api_key", "X-my-service-extra-header": "header value"},
  "webhook_verify_ssl": true,
  "save_dicts": true,
  "type": "dump",
  "sens_dict_contents": [{
    "name": "sens dict for email anonymization",
    "content": "{\"dictionary\": [{\"schema\": \"public\", \"table\": \"users\", \"fields\": {\"email\": \"md5(email)\"}}, {\"schema\": \"public\", \"table\": \"clients\", \"fields\": {\"email\": \"md5(email)\"}}]}"
  }],
  "partial_tables_dict_contents": [{
    "name": "sens dict for email anonymization",
    "content": "{\"dictionary\": [{\"schema\": \"public\", \"table\": \"users\", \"fields\": {\"email\": \"md5(email)\"}}, {\"schema\": \"public\", \"table\": \"clients\", \"fields\": {\"email\": \"md5(email)\"}}]}"
  }],
  "partial_tables_exclude_dict_contents": [{
    "name": "sens dict for email anonymization",
    "content": "{\"dictionary\": [{\"schema\": \"public\", \"table\": \"users\", \"fields\": {\"email\": \"md5(email)\"}}, {\"schema\": \"public\", \"table\": \"clients\", \"fields\": {\"email\": \"md5(email)\"}}]}"
  }],
  "output_path": "my_dump",
  "pg_dump_path": "/usr/lib/postgresql/17/bin/pg_dump",
  "ignore_privileges": false,
  "proc_count": 4,
  "proc_conn_count": 4
}'

Dump webhook request schema

Field Type Required Description
operation_id string Yes External operation ID. Copied from the operation_id of the original dump request.
internal_operation_id string No Internal pg_anon operation ID generated automatically. Present only when status is success or error. Can be used to correlate the webhook event with operation data and logs stored in the /runs directory. Also used in the Operation endpoints.
status_id integer Yes Numeric status code. Possible values: 2 — success, 3 — error, 4 — in_progress.
status string Yes Human-readable status. Possible values: success, error, in_progress.
webhook_metadata Any No Metadata payload passed “as is”. Copied from the webhook_metadata field of the dump request.
started string No Operation start timestamp in ISO8601 format (UTC+0). Present only for statuses success or error.
ended string No Operation end timestamp in ISO8601 format (UTC+0). Present only for statuses success or error.
error string No Error message. Present only when status = error.
error_code string No Machine-readable error code for i18n mapping. Present only when status = error.
run_options JSON No Snapshot of the operation’s runtime options. Useful for analysis, debugging, and rerunning the operation. Present only when status = success or error.
size integer No Size of the dump in bytes.

Responses

Status Code Description Component
201 Operation successfully started -
400 Bad Request ErrorResponse
500 Internal Server Error ErrorResponse
422 Validation Error HTTPValidationError
I.3.12.1.8. Run restore operation #

POST /api/stateless/restore

Description

Runs pg_anon in restore mode in the background.

Operation lifecycle:

  1. The client calls this endpoint.

  2. The API returns one of the following status codes:

    • 201 — the restore operation has been successfully started.

    • 400 or 422 — the request is invalid; the operation is not started.

  3. The service sends a webhook request with status in_progress to the webhook_status_url. The payload format is described in the restore webhook request schema.

  4. The operation executes in the background.

  5. If an error occurs during processing, the service sends a webhook request with status error.

  6. If the operation completes successfully, the service sends a webhook request with status success.

Restore request body schema

Field Type Required Description
operation_id string Yes External operation ID. Accepts any string. Returned unchanged in webhook requests.
db_connection_params db connection params Yes Target database connection credentials.
webhook_status_url string Yes Callback URL that receives POST requests in the format described in the restore webhook request schema.
webhook_metadata Any No Arbitrary metadata payload. Sent unchanged in webhook requests.
webhook_extra_headers JSON No Additional HTTP headers added to webhook requests. Useful for integration, e.g., to include an Authorization header.
webhook_verify_ssl boolean No Enables or disables SSL certificate verification for webhook requests.
save_dicts boolean No Saves all input and output dictionaries into the runs directory. Useful for debugging or integration. Default: false.
type string No Defines the restore type. Options: restore, sync-struct-restore, sync-data-restore. Default: restore.
input_path string Yes Path to the dump to restore, relative to /path_to_pg_anon/output. Example: "my_dump" will restore from /path_to_pg_anon/output/my_dump.
partial_tables_dict_contents array of dictionary content No Contents of the tables dictionary specifying tables to include in a partial restore
partial_tables_exclude_dict_contents array of dictionary content No Contents of the tables dictionary specifying tables to exclude from a partial restore
pg_restore_path string No Path to the pg_restore Postgres tool. Default: /usr/bin/pg_restore.
pg_restore_options string No Additional options passed directly to pg_restore utility. Example: "--no-comments --no-table-access-method".
drop_custom_check_constr boolean No Drops all CHECK constraints that contain user-defined procedures to avoid performance degradation during data loading. Default: false.
clean_db boolean No Cleans existing database objects before restoring. Mutually exclusive with drop_db.
drop_db boolean No Drops the target database before restoring. Mutually exclusive with clean_db.
ignore_privileges boolean No Ignore privileges from source db.
proc_count integer No Number of processes used for multiprocessing. Default: 4.
proc_conn_count integer No Number of database connections allocated per process for I/O operations. Default: 4.

Example

curl -X POST http://127.0.0.1:8000/api/stateless/restore \
-H "Content-Type: application/json" \
-d '{
  "operation_id": "my-uniq-scan-id-0001",
  "db_connection_params": {
     "host": "localhost",
     "port": "5432",
     "db_name":  "source_db",
     "user_login": "postgres",
     "user_password":  "postgres"
  },
  "webhook_status_url": "https://my-service/pg-anon-result-processor",
  "webhook_metadata": {"extra_field_1": "extra_data", "extra_field_2": {"fld1": [1,2,3], "fld2": 123}},
  "webhook_extra_headers": {"Authorization": "Api-key my_super_secret_api_key", "X-my-service-extra-header": "header value"},
  "webhook_verify_ssl": true,
  "save_dicts": true,
  "type": "restore",
  "input_path": "my_dump",
  "partial_tables_dict_contents": [{
    "name": "sens dict for email anonymization",
    "content": "{\"dictionary\": [{\"schema\": \"public\", \"table\": \"users\", \"fields\": {\"email\": \"md5(email)\"}}, {\"schema\": \"public\", \"table\": \"clients\", \"fields\": {\"email\": \"md5(email)\"}}]}"
  }],
  "partial_tables_exclude_dict_contents": [{
    "name": "sens dict for email anonymization",
    "content": "{\"dictionary\": [{\"schema\": \"public\", \"table\": \"users\", \"fields\": {\"email\": \"md5(email)\"}}, {\"schema\": \"public\", \"table\": \"clients\", \"fields\": {\"email\": \"md5(email)\"}}]}"
  }],
  "pg_restore_path": "/usr/lib/postgresql/17/bin/pg_restore",
  "drop_custom_check_constr": false,
  "clean_db": false,
  "drop_db": false,
  "ignore_privileges": false,
  "proc_count": 4,
  "proc_conn_count": 4
}'

Restore webhook request schema

Field Type Required Description
operation_id string Yes External operation ID. Copied from the operation_id of the original restore request.
internal_operation_id string No Internal pg_anon operation ID generated automatically. Present only when status is success or error. Can be used to correlate the webhook event with operation data and logs stored in the /runs directory. Also used in the Operation endpoints.
status_id integer Yes Numeric status code. Possible values: 2 — success, 3 — error, 4 — in_progress.
status string Yes Human-readable status. Possible values: success, error, in_progress.
webhook_metadata Any No Metadata payload passed “as is”. Copied from the webhook_metadata field of the restore request.
started string No Operation start timestamp in ISO8601 format (UTC+0). Present only for statuses success or error.
ended string No Operation end timestamp in ISO8601 format (UTC+0). Present only for statuses success or error.
error string No Error message. Present only when status = error.
error_code string No Machine-readable error code for i18n mapping. Present only when status = error.
run_options JSON No Snapshot of the operation’s runtime options. Useful for analysis, debugging, and rerunning the operation. Present only when status = success or error.

Responses

Status Code Description Component
201 Operation successfully started -
400 Bad Request ErrorResponse
500 Internal Server Error ErrorResponse
422 Validation Error HTTPValidationError

I.3.12.2. Integration Endpoints #

I.3.12.2.1. Operations list #

GET /operation

Description

Returns a list of background operation directories (scan, dump, restore). Only operations executed with save_dicts enabled are included. Useful for integration purposes.

Operations list request params

Field Type Required Description
date_before date No Filter: operations before this date. Date format ISO 8601
date_after date No Filter: operations after this date. Date format ISO 8601

Example

curl -X GET "http://127.0.0.1:8000/operation?date_after=2025-01-01&date_before=2025-12-31"

Responses

Status Code Description Component
200 List of operation paths array of strings
422 Validation Error HTTPValidationError
I.3.12.2.2. Operation details #

GET /operation/{internal_operation_id}

Description

Returns detailed information about a background operation (scan, dump, restore). Only operations executed with save_dicts enabled are included. Useful for integration purposes.

Operations details request params

Field Type Required Description
internal_operation_id string Yes Internal pg_anon operation ID.

Example

curl -X GET http://127.0.0.1:8000/operation/c6c98133-856f-46b3-ba9e-3a0092b8d9aa

Responses

Status Code Description Component
200 Operation details OperationDataResponse
404 Operation directory not found HTTPValidationError
422 Validation Error HTTPValidationError
I.3.12.2.3. Delete operation data #

DELETE /operation/{internal_operation_id}

Description

Deletes the operation data directory in /runs. Also removes the dump directory from the output path if the operation type is dump.

Example

curl -X DELETE http://127.0.0.1:8000/operation/c6c98133-856f-46b3-ba9e-3a0092b8d9aa

Delete operation data request params

Field Type Required Description
internal_operation_id string Yes Internal pg_anon operation ID.

Responses

Status Code Description Component
204 Operation data successfully deleted -
400 Bad Request ErrorResponse
422 Validation Error HTTPValidationError
500 Internal Server Error ErrorResponse
I.3.12.2.4. Operation logs #

GET /operation/{internal_operation_id}/logs

Description

Returns log output for a background operation (scan, dump, restore). Only operations executed with save_dicts enabled are included. Useful for integration purposes.

Operations logs request body schema

Name Type Required Description
internal_operation_id string Yes Internal pg_anon operation ID.
tail_lines integer No Number of log lines to read from the end of the file. Default: 1000

Example

curl -X GET http://127.0.0.1:8000/operation/c6c98133-856f-46b3-ba9e-3a0092b8d9aa/logs

Responses

Status Code Description Component
200 Successful Response array of strings
422 Validation Error HTTPValidationError

I.3.13. General schemas #

I.3.13.1. DbConnectionParams #

Field Type Required Description
host string Yes Database host.
port integer Yes Database port.
db_name string Yes Database name.
user_login string Yes Database user.
user_password string Yes Database user password.

I.3.13.2. DictionaryContent #

Field Type Required Description
name string Yes Dictionary name. For example, it can be used as the dictionary filename.
content string Yes Dictionary content used for operation processing.
additional_info Any No Extra data for integration purposes. Sent in webhook requests “as is”.

I.3.13.3. DictionaryMetadata #

Field Type Required Description
name string Yes Dictionary name. For example, it can be used as the dictionary filename.
additional_info Any No Extra data for integration purposes. Sent in webhook requests “as is”.

I.3.13.4. OperationDataResponse #

Field Type Required Description
run_status RunStatus Yes Operation status.
run_options JSON Yes Snapshot of the operation’s runtime options. Useful for analysis, debugging, and rerunning the operation.
dictionaries DictionariesData Yes Input and resulting dictionary contents, grouped by type.
extra_data JSON No For dump operations, contains dump size info. In other cases it is empty.

I.3.13.5. RunStatus #

Field Type Required Description
status_id integer Yes Numeric status code. Possible values: 2 — success, 3 — error.
status string Yes Human-readable status. Possible values: success, error.
started string No Operation start timestamp in ISO8601 format (UTC+0).
ended string No Operation end timestamp in ISO8601 format (UTC+0).

I.3.13.6. DictionariesData #

Field Type Required Description
meta_dict_files array of dictionary content No Contents of the input meta dictionary.
prepared_sens_dict_files array of dictionary content No Contents of the input sensitive dictionary.
prepared_no_sens_dict_files array of dictionary content No Contents of the input non-sensitive dictionary.
partial_tables_dict_files array of dictionary content No Contents of the input tables dictionary.
partial_tables_exclude_dict_files array of dictionary content No Contents of the input tables dictionary.
output_sens_dict_file dictionary content No Contents of the output sensitive dictionary.
output_no_sens_dict_file dictionary content No Contents of the output non-sensitive dictionary.

I.3.13.7. ViewDataResponse #

Field Type Required Description
status_id integer Yes Integer code of operation status. Can be: 2 — success, 3 — error
status string Yes Human readable operation status. Can be: success, error
content ViewDataContent No Operation result data

I.3.13.8. ViewDataContent #

Field Type Required Description
schema_name string Yes Schema name
table_name string Yes Table name
field_names array Yes Table field names. Needed for rendering the table header
total_rows_count integer Yes Total number of rows in the table. Useful for pagination
rows_before array Yes Source rows “as is”, without anonymization
rows_after array Yes Anonymized rows, showing how anonymization will be applied to the source data

I.3.13.9. ViewFieldsResponse #

Field Type Required Description
status_id integer Yes Integer code of operation status. Can be: 2 — success, 3 — error
status string Yes Human readable operation status. Can be: success, error
content ViewFieldsContent No Operation result data

I.3.13.10. ViewFieldsContent #

Field Type Required Description
schema_name string Yes Schema name
table_name string Yes Table name
field_name string Yes Field name
type string Yes Field data type
dict_data DictionaryMetadata No Matched dictionary metadata containing anonymization rule
rule string No Matched anonymization rule if field is sensitive

I.3.13.11. PreviewSchemasResponse #

Field Type Required Description
status_id integer Yes Integer code of operation status. Can be: 2 - success, 3 - error
status string Yes Human readable operation status. Can be: success, error
content array of string No List of schema names

I.3.13.12. PreviewSchemaTablesResponse #

Field Type Required Description
status_id integer Yes Integer code of operation status. Can be: 2 - success, 3 - error
status string Yes Human readable operation status. Can be: success, error
content array of PreviewTableContent No List of tables with sensitivity info

I.3.13.13. PreviewTableContent #

Field Type Required Description
table_name string Yes Table name
is_sensitive boolean Yes Whether the table matches the sensitive dictionary
is_excluded boolean Yes Whether the table is excluded by dictionary_exclude
fields array of PreviewFieldContent No List of fields with their types and sensitivity info

I.3.13.14. PreviewFieldContent #

Field Type Required Description
field_name string Yes Field name
type string Yes Field data type
is_sensitive boolean Yes Whether the field matches an anonymization rule
rule string No Matched anonymization rule if field is sensitive

I.3.13.15. ErrorResponse #

Field Type Required Description
error_code string Yes Machine-readable error code for i18n mapping.
message string Yes Human-readable error message.

I.3.13.16. HTTPValidationError #

Field Type Required Description
detail array Yes Error details.

I.3.13.17. ValidationError #

Field Type Required Description
loc array Yes Wrong parameter.
msg string Yes Error message.
type string Yes Error type.