I.3. pg_anon #
pg_anon — anonymization tool for Tantor SE
I.3.1. Overview #
Version: 1.9.1
pg_anon helps you safely clone your production database for testing or development.
During the process, all sensitive fields are replaced with realistic but fake values —
keeping your data structure intact and your privacy protected.
I.3.2. Requirements #
Python: 3.11+
Tantor SE client utilities (must match the server’s major version):
pg_dump – used to export the database schema

pg_restore – used to restore the database schema into the target database
For details, see: Installation & Configuration
I.3.3. Terminology #
| Term | Description |
|---|---|
| Personal (sensitive) data | Data that must not be shared with third parties. Includes personal or confidential business information. |
| Source database | The original database that contains sensitive data. |
| Target database | An empty database where anonymized data will be restored. |
| Meta-dictionary | A Python file describing rules for detecting sensitive data. Created manually and used as the basis for generating the sensitive dictionary during scanning. See more |
| Prepared sensitive dictionary | A Python file that defines which tables and fields contain sensitive data and how to anonymize them. Created automatically or manually. See more |
| Prepared non-sensitive dictionary | A Python file listing schemas, tables, and fields without sensitive data. Used to speed up repeated scans. See more |
| Table dictionary | A Python file listing tables. Used to include or exclude tables from dump & restore operations. See more |
| Create-dict (scan) | The process of scanning the source database to detect sensitive fields and create dictionary files. See more |
| Dump | Exporting data from the source database into files using a dictionary. This is where anonymization occurs. See more |
| Restore | Importing anonymized data from files into the target database. See more |
| Anonymization (masking) | Full process of cloning and sanitizing data (dump → restore), replacing sensitive values with random or hashed ones. |
| Anonymization function | A Tantor SE function (built-in or from the anon_funcs schema) that replaces sensitive values with random or hashed data. New functions can be added to extend anonymization logic. |
I.3.4. Quick Start #
I.3.4.1. Before you start #
In this guide, a privileged user will be created and test databases with data will be set up.
It is recommended to follow this quick start guide in a non-production environment.
I.3.4.1.1. Prerequisites #
A working Tantor SE instance
PostgreSQL client utilities installed
Note
If you are not running as a superuser, ensure that you can install the **pgcrypto** extension in your quick start source database.
I.3.4.2. Preparing pg_anon #
git clone https://github.com/TantorLabs/pg_anon.git pg_anon
cd pg_anon
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
python3 -m pg_anon --version
I.3.4.3. Preparing DB environment #
Create a DB user for the quick start guide
sudo su - postgres -c "psql -p 5432 -U postgres -c \"CREATE USER anon_test_user WITH PASSWORD 'mYy5RexGsZ' SUPERUSER;\""
Prepare the SQL script to initialize the databases
cat > /tmp/db_env.sql << 'EOL'
DROP DATABASE IF EXISTS pg_anon_quick_start_source_db;
CREATE DATABASE pg_anon_quick_start_source_db
WITH OWNER = anon_test_user
ENCODING = 'UTF8'
LC_COLLATE = 'en_US.UTF-8'
LC_CTYPE = 'en_US.UTF-8'
template = template0;
DROP DATABASE IF EXISTS pg_anon_quick_start_target_db;
CREATE DATABASE pg_anon_quick_start_target_db
WITH OWNER = anon_test_user
ENCODING = 'UTF8'
LC_COLLATE = 'en_US.UTF-8'
LC_CTYPE = 'en_US.UTF-8'
template = template0;
EOL
Initialize source and target databases
sudo chown postgres:postgres /tmp/db_env.sql
sudo su - postgres -c "psql -p 5432 -U postgres -f /tmp/db_env.sql"
Load test environment into source DB
cp $(pwd)/tests/sql/init_env.sql /tmp/init_env.sql
sudo chown postgres:postgres /tmp/init_env.sql
sudo su - postgres -c "psql -p 5432 -d pg_anon_quick_start_source_db -U postgres -f /tmp/init_env.sql"
I.3.4.4. Initializing the service schema for pg_anon #
python3 -m pg_anon init \
--db-user=anon_test_user \
--db-user-password=mYy5RexGsZ \
--db-name=pg_anon_quick_start_source_db \
--db-host=127.0.0.1 \
--db-port=5432
I.3.4.5. Scan your source database #
python3 -m pg_anon create-dict \
--db-user=anon_test_user \
--db-user-password=mYy5RexGsZ \
--db-name=pg_anon_quick_start_source_db \
--db-host=127.0.0.1 \
--db-port=5432 \
--meta-dict-file=tests/input_dict/test_meta_dict.py \
--output-sens-dict-file=test_sens_dict_output.py \
--output-no-sens-dict-file=test_no_sens_dict_output.py \
--processes=2
I.3.4.6. Visualizing anonymization rules #
Run pg_anon in view-fields mode to see which
fields will be anonymized and which fields will be dumped as-is.
python3 -m pg_anon view-fields \
--db-host=127.0.0.1 \
--db-user=anon_test_user \
--db-user-password=mYy5RexGsZ \
--db-name=pg_anon_quick_start_source_db \
--db-port=5432 \
--prepared-sens-dict-file=test_sens_dict_output.py \
--fields-count=20
Run pg_anon in view-data mode to preview
anonymized data in a specific table.
python3 -m pg_anon view-data \
--db-host=127.0.0.1 \
--db-user=anon_test_user \
--db-user-password=mYy5RexGsZ \
--db-name=pg_anon_quick_start_source_db \
--db-port=5432 \
--prepared-sens-dict-file=test_sens_dict_output.py \
--schema-name=public \
--table-name=contracts \
--limit=10 \
--offset=0
I.3.4.7. Create an anonymized backup #
python3 -m pg_anon dump \
--db-host=127.0.0.1 \
--db-user=anon_test_user \
--db-user-password=mYy5RexGsZ \
--db-name=pg_anon_quick_start_source_db \
--db-port=5432 \
--prepared-sens-dict-file=test_sens_dict_output.py \
--output-dir=/tmp/quick_start_dump \
--clear-output-dir
I.3.4.8. Restore the anonymized backup into the target DB #
python3 -m pg_anon restore \
--db-host=127.0.0.1 \
--db-port=5432 \
--db-user=anon_test_user \
--db-user-password=mYy5RexGsZ \
--db-name=pg_anon_quick_start_target_db \
--input-dir=/tmp/quick_start_dump \
--drop-custom-check-constr \
--verbose=debug
I.3.5. Installation & Configuration #
You can use several options for installing pg_anon:
I.3.5.1. Before you install #
pg_anon can be run in two ways: via the CLI and via the REST API.
The REST API service is optional to install. This service is
designed to integrate pg_anon functionality
into any system or pipelines via HTTP requests. It works just as
a thin wrapper around the CLI version of
pg_anon. REST API calls prepare CLI
parameters and run the CLI version of pg_anon in the background.
It doesn’t keep state or store data in a database, so it can be scaled easily without extra setup.
However, this means that the system that integrates pg_anon must implement its own storage for dictionaries, dump tasks, and restore tasks.
Note
Not suitable for fully autonomous operation.
Logs and information for all operation runs are stored in the
directory /path_to_pg_anon/runs. All dumps
are stored in the directory
/path_to_pg_anon/output. If the REST API
service is scaled, you must create a symlink from this directory
to a shared disk. This is required because restore operations
also read dumps from
/path_to_pg_anon/output.
I.3.5.2. Using db_extension_installer.sh locally from the downloaded package #
Download the pg_anon package suitable for your OS from nexus-public.

Download the installation script db_extension_installer.sh and make it executable:

wget https://public.tantorlabs.ru/db_extension_installer.sh && \
chmod +x db_extension_installer.sh

Run the pg_anon installation using db_extension_installer.sh from the downloaded package:

./db_extension_installer.sh \
--from-file=<path to pg_anon package>
I.3.5.3. Using db_extension_installer.sh automatically from the package repository #
Download the installation script db_extension_installer.sh and make it executable:

wget https://public.tantorlabs.ru/db_extension_installer.sh && \
chmod +x db_extension_installer.sh

Set the value of the environment variable NEXUS_URL:

export NEXUS_URL="nexus-public.tantorlabs.ru"

Run the pg_anon installation using db_extension_installer.sh with the following parameters:

./db_extension_installer.sh \
--database-type=tantor \
--edition=all \
--extension=pg_anon
I.3.5.4. From the GitHub repository (Python) #
Install Python 3 if it is not installed (for Ubuntu):

sudo apt-get install python3.11

Clone the repository:

git clone https://github.com/TantorLabs/pg_anon.git

Go to the project directory:

cd pg_anon

Create a virtual environment:

python3 -m venv venv

Activate the virtual environment:

source venv/bin/activate

Install the dependencies:

pip install -r requirements.txt

(Optional) To use the REST API service, install its dependencies:

pip install -r rest_api/requirements.txt
I.3.5.5. Configuring pg_anon #
To specify custom pg_dump and
pg_restore utilities, use the
--pg-dump and --pg-restore
parameters.
Advanced configuration is also available:
CLI - pass the config file using the --config run parameter.

REST API - the config file must be placed at /path_to_pg_anon/config.yml.
pg-utils-versions:
<postgres_major_version>:
pg_dump: "/path/to/<postgres_major_version>/pg_dump"
pg_restore: "/path/to/<postgres_major_version>/pg_restore"
<another_postgres_major_version>:
pg_dump: "/path/to/<postgres_major_version>/pg_dump"
pg_restore: "/path/to/<postgres_major_version>/pg_restore"
default:
pg_dump: "/path/to/default_postgres_version/pg_dump"
pg_restore: "/path/to/default_postgres_version/pg_restore"
For example, you can specify a configuration for Tantor SE 15 and 17:
pg-utils-versions:
15:
pg_dump: "/opt/tantor/db/15/bin/pg_dump"
pg_restore: "/opt/tantor/db/15/bin/pg_restore"
17:
pg_dump: "/opt/tantor/db/17/bin/pg_dump"
pg_restore: "/opt/tantor/db/17/bin/pg_restore"
default:
pg_dump: "/opt/tantor/db/17/bin/pg_dump"
pg_restore: "/opt/tantor/db/17/bin/pg_restore"
If the current Tantor SE version does not match any version in
this config, the utilities from the default section will be
used. For example, pg_anon can be run with
this config on Tantor SE 16. In this case,
pg_dump 17 and
pg_restore 17 will be used.
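The fallback rule can be sketched in Python. This is only an illustration of the config semantics described above, not pg_anon source code; the paths reuse the example configuration.

```python
# Illustrative sketch: how a pg-utils-versions mapping with a "default"
# entry resolves utilities for a given server major version.
PG_UTILS_VERSIONS = {
    15: {"pg_dump": "/opt/tantor/db/15/bin/pg_dump",
         "pg_restore": "/opt/tantor/db/15/bin/pg_restore"},
    17: {"pg_dump": "/opt/tantor/db/17/bin/pg_dump",
         "pg_restore": "/opt/tantor/db/17/bin/pg_restore"},
    "default": {"pg_dump": "/opt/tantor/db/17/bin/pg_dump",
                "pg_restore": "/opt/tantor/db/17/bin/pg_restore"},
}

def resolve_utils(major_version: int) -> dict:
    # An exact version match wins; otherwise fall back to "default".
    return PG_UTILS_VERSIONS.get(major_version, PG_UTILS_VERSIONS["default"])

print(resolve_utils(15)["pg_dump"])  # version 15 binaries
print(resolve_utils(16)["pg_dump"])  # no entry for 16, so the default (17) is used
```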
I.3.5.6. Running REST API #
Run service command:
python -m uvicorn rest_api.api:app --host 0.0.0.0 --port 8000 --workers=3
The recommended worker count is 2 * CPU_CORES + 1.

The service OpenAPI documentation will be available at http://0.0.0.0:8000/docs#/.

You can also see the API documentation.
I.3.6. How it works #
I.3.6.1. Anonymization (masking) #
The diagram below illustrates how data is transferred from the source DB to the target DB.
The source database contains sensitive information and is typically located in a production environment with strictly limited access.
Figure I.1. Dump-Resore-Terms.drawio.png

A trusted administrator runs pg_anon with credentials for the source DB. Using the prepared and approved sensitive dictionary, pg_anon creates an anonymized dump in the specified directory. The dictionary must be created in advance and validated by the security team.
The resulting dump directory is then transferred to the host of the target database. Compression during transfer is unnecessary because the dump files are already compressed.
Once the directory is placed on the target host, the restore process is started using target database credentials. The target database must be created beforehand using CREATE DATABASE and must be empty.
After a successful restore, the anonymized database is ready for development or testing. Any number of employees can safely use it without risking exposure of sensitive data.
I.3.6.2. What does pg_anon do internally during dump and restore? A simplified representation #
I.3.6.2.1. For example, we have data that we want to anonymize: #
Create the source table:
create table users (
id bigserial,
email text,
login text
);
-- Checking the contents of the source table
select * from users;
>>
id | email | login
----+---------+-------
Populating the source table:

insert into users (email, login)
select
'user' || generate_series(1001, 1020) || '@example.com',
'user' || generate_series(1001, 1020);

-- Checking the contents of the source table
select * from users;
>>
id | email | login
----+----------------------+----------
1 | user1001@example.com | user1001
2 | user1002@example.com | user1002
...
The ‘email’ field contains
sensitive data. We need to
anonymize it.
I.3.6.2.2. What is the process of creating a dump with masking? #
Data dump from the source table to a CSV file (without masking):
copy (
select *
from users
) to '/tmp/users.csv' with csv;
cat /tmp/users.csv
>>
1,user1001@example.com,user1001
2,user1002@example.com,user1002
...
Masking the contents of the source table:

select
id,
md5(email) || '@abc.com' as email, -- hashing the email (masking rule in prepared sens dict file)
login
from users;
>>
id | email | login
----+------------------------------------------+----------
1 | 385513d80895c4c5e19c91d1df9eacae@abc.com | user1001
2 | 9f4c0c30f85b0353c4d5fe3c9cc633e3@abc.com | user1002
...
Data dump from the source table to a CSV file (with masking):
copy (
select
id,
md5(email) || '@abc.com' as email, -- hashing the email (masking rule in prepared sens dict file)
login
from users
) to '/tmp/users_anonymized.csv' with csv;
cat /tmp/users_anonymized.csv
>>
1,385513d80895c4c5e19c91d1df9eacae@abc.com,user1001
2,9f4c0c30f85b0353c4d5fe3c9cc633e3@abc.com,user1002
...
The prepared sens dict file contains masking rules such as hashing.
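The hashing rule from the SQL above can be mirrored in plain Python, as a sketch of what such a masking rule computes (pg_anon itself applies these rules as SQL anonymization functions):

```python
import hashlib

def mask_email(email: str) -> str:
    # Same rule as the SQL above: MD5 hex digest of the original value
    # plus a fixed fake domain.
    return hashlib.md5(email.encode()).hexdigest() + "@abc.com"

masked = mask_email("user1001@example.com")
print(masked)
# The digest is deterministic: the same input always masks to the same
# value, so relationships between rows survive, but the original
# address is not recoverable from the output.
```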
I.3.6.2.3. What is the process for restoring a masked dump? #
Reproducing the structure. Creating the target table:
create table users_anonymized (
id bigserial,
email text,
login text
);
-- Checking the contents of the target table
select * from users_anonymized;
>>
id | email | login
----+---------+-------
Loading data from the source table data dump (CSV file) into the target table:

copy users_anonymized from '/tmp/users_anonymized.csv' with csv;

-- Checking the contents of the target table
select * from users_anonymized;
>>
id | email | login
----+------------------------------------------+----------
1 | 385513d80895c4c5e19c91d1df9eacae@abc.com | user1001
2 | 9f4c0c30f85b0353c4d5fe3c9cc633e3@abc.com | user1002
...
I.3.6.2.4. Differences between original work of pg_anon and that representation #
pg_anon operates on the entire database (not only one table)

pg_anon uses .bin.gz files to save data (not CSV)

Masking rules are provided to pg_anon via a prepared sens dict file
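To illustrate the second point, here is a minimal Python sketch of writing masked rows to a gzip-compressed file and reading them back. It shows only the general idea of compressed data files; it is not pg_anon's actual .bin.gz layout.

```python
import gzip
import os
import tempfile

# Masked rows as produced by the COPY example above.
rows = [
    "1,385513d80895c4c5e19c91d1df9eacae@abc.com,user1001",
    "2,9f4c0c30f85b0353c4d5fe3c9cc633e3@abc.com,user1002",
]

path = os.path.join(tempfile.gettempdir(), "users_anonymized.bin.gz")

# Write the rows gzip-compressed, then read them back.
with gzip.open(path, "wt") as f:
    f.write("\n".join(rows) + "\n")

with gzip.open(path, "rt") as f:
    restored = f.read().splitlines()

print(restored == rows)  # the round trip is lossless
```

Because the data files are already compressed like this, compressing the dump directory again before transfer gains little.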
I.3.7. Operations #
I.3.7.1. Init #
This mode creates the anon_funcs schema in
the source database and loads the predefined SQL functions from
init.sql. These
functions are required for processing data in the source
database.
I.3.7.1.1. Run example #
python -m pg_anon init \
--db-user=postgres \
--db-user-password=postgres \
--db-name=source_db
I.3.7.1.2. Options #
Common pg_anon options
| Option | Required | Description |
|---|---|---|
--verbose
| No |
Sets
the log verbosity level: info,
debug, error.
(default: info)
|
--debug
| No |
Enables
debug mode (equivalent to
--verbose=debug) and adds extra
debug logs. (default: false)
|
Database configuration options
| Option | Required | Description |
|---|---|---|
--db-host
| Yes | Database host. |
--db-port
| No | Database port. |
--db-name
| Yes | Database name. |
--db-user
| Yes | Database user. |
--db-user-password
| No | Database user password. |
--db-passfile
| No | Path to a file containing the password used for authentication. |
--db-ssl-key-file
| No | Path to the client SSL key file for secure connections. |
--db-ssl-cert-file
| No | Path to the client SSL certificate file. |
--db-ssl-ca-file
| No | Path to the CA certificate used to verify the server’s certificate. |
I.3.7.2. Scan #
The scan operation analyzes your Tantor SE database to detect potentially sensitive data and generate dictionary files. These dictionaries are used for dump operations and for repeated scans.
I.3.7.2.1. Prerequisites #
Manually created meta-dictionary

init mode already run for the source database
I.3.7.2.2. Usage #
To scan the source database and create a dictionary for the dump, run
pg_anon in create-dict mode. You need a
meta-dictionary file with scan rules.
python pg_anon.py create-dict \
--db-user=postgres \
--db-user-password=postgres \
--db-name=test_source_db \
--meta-dict-file=test_meta_dict.py \
--prepared-sens-dict-file=test_sens_dict_output_previous_use.py \
--prepared-no-sens-dict-file=test_no_sens_dict_output_previous_use.py \
--output-sens-dict-file=test_sens_dict_output.py \
--output-no-sens-dict-file=test_no_sens_dict_output.py \
--processes=2
I.3.7.2.3. Options #
Common pg_anon options
| Option | Required | Description |
|---|---|---|
--config
| No |
Path to
the config file that can specify
pg_dump and
pg_restore utilities. (default:
none)
|
--processes
| No | Number of processes used for multiprocessing operations. (default: 4) |
--db-connections-per-process
| No | Number of database connections per process for I/O operations. (default: 4) |
--verbose
| No |
Sets
the log verbosity level: info,
debug, error.
(default: info)
|
--debug
| No |
Enables
debug mode (equivalent to
--verbose=debug) and adds extra
debug logs. (default: false)
|
Database configuration options
| Option | Required | Description |
|---|---|---|
--db-host
| Yes | Database host. |
--db-port
| No | Database port. |
--db-name
| Yes | Database name. |
--db-user
| Yes | Database user. |
--db-user-password
| No | Database user password. |
--db-passfile
| No | Path to a file containing the password used for authentication. |
--db-ssl-key-file
| No | Path to the client SSL key file for secure connections. |
--db-ssl-cert-file
| No | Path to the client SSL certificate file. |
--db-ssl-ca-file
| No | Path to the CA certificate used to verify the server’s certificate. |
Create-dict (scan) mode options
| Option | Required | Description |
|---|---|---|
--meta-dict-file
| Yes | Input file or list of files containing the meta-dictionary, prepared manually. If rules collide, the rules in the last file of the list take priority. |
--prepared-sens-dict-file
| No |
Input file or list of files containing a sensitive dictionary,
obtained from a previous run via the
--output-sens-dict-file option or prepared manually.
If rules collide, the rules in the last file of the list take priority.
|
--prepared-no-sens-dict-file
| No |
Input file or list of files containing a non-sensitive dictionary,
obtained from a previous run via the
--output-no-sens-dict-file option or prepared manually.
If rules collide, the rules in the last file of the list take priority.
|
--output-sens-dict-file
| Yes | Output file path for saving the sensitive dictionary |
--output-no-sens-dict-file
| No | Output file path for saving the non-sensitive dictionary |
--scan-mode
| No | Defines whether to scan all data or only part of it [“full”, “partial”] (default “partial”) |
--scan-partial-rows
| No |
In --scan-mode partial, defines the
number of rows to scan (default: 10000). The actual row
count can be smaller after filtering for unique values
|
--save-dicts
| No |
Duplicates all input and output dictionaries into the
runs directory. This can be useful for debugging
or integration purposes.
|
I.3.7.3. Dump #
This mode creates an anonymized backup using rules from the sensitive dictionary.
Note
This backup can only be restored
using pg_anon and
cannot be restored with
pg_restore.
I.3.7.3.1. Prerequisites #
The anon_funcs schema with anonymization functions must already exist. See init mode.

A sensitive dictionary containing data about database fields and their anonymization rules must be prepared beforehand. See create-dict (scan) mode.
I.3.7.3.2. Full dump (dump) mode #
Creates a backup containing both the database structure and anonymized data.
This backup can be restored using the following modes:
Run example
python pg_anon.py dump \
--db-host=127.0.0.1 \
--db-user=postgres \
--db-user-password=postgres \
--db-name=source_db \
--prepared-sens-dict-file=sens_dict.py
I.3.7.3.3. Structure dump (sync-struct-dump)
mode #
Creates a backup containing only the database structure without anonymized data.
This backup can be restored in this mode:
This mode is useful when used together with the
data dump (sync-data-dump) mode.
Run example
python pg_anon.py sync-struct-dump \
--db-host=127.0.0.1 \
--db-user=postgres \
--db-user-password=postgres \
--db-name=source_db \
--output-dir=test_sync_struct_dump \
--prepared-sens-dict-file=sens_dict.py
I.3.7.3.4. Data dump (sync-data-dump) mode #
Creates a backup containing only anonymized data, without the database structure.
This backup can be restored in this mode:
This mode can be useful for scheduling database
synchronization, for example using cron.
Run example
python pg_anon.py sync-data-dump \
--db-host=127.0.0.1 \
--db-user=postgres \
--db-user-password=postgres \
--db-name=source_db \
--output-dir=test_sync_data_dump \
--prepared-sens-dict-file=sens_dict.py
I.3.7.3.5. Create partial dump #
Partial dumps are used to create a backup excluding certain tables from the source database.
Partial dump can be run in all dump modes:
Partial dumps use a tables dictionary containing a list of tables. This dictionary can act as either a whitelist or a blacklist. See tables dictionary.
Run example
Dump only the needed tables (whitelist)
python pg_anon.py dump \
--db-host=127.0.0.1 \
--db-user=postgres \
--db-user-password=postgres \
--db-name=source_db \
--output-dir=partial_dump_white_list \
--prepared-sens-dict-file=sens_dict.py \
--partial-tables-dict-file=include_tables.py
Dump all tables except the specified ones (blacklist)
python pg_anon.py dump \
--db-host=127.0.0.1 \
--db-user=postgres \
--db-user-password=postgres \
--db-name=source_db \
--output-dir=partial_dump_black_list \
--prepared-sens-dict-file=sens_dict.py \
--partial-tables-exclude-dict-file=exclude_tables.py
Dump only the specified tables, excluding some of them (whitelist + blacklist)
python pg_anon.py dump \
--db-host=127.0.0.1 \
--db-user=postgres \
--db-user-password=postgres \
--db-name=source_db \
--output-dir=partial_dump_white_list_and_black_list \
--prepared-sens-dict-file=sens_dict.py \
--partial-tables-dict-file=include_tables.py \
--partial-tables-exclude-dict-file=exclude_tables.py
I.3.7.3.6. Options #
Common pg_anon options
| Option | Required | Description |
|---|---|---|
--config
| No |
Path to
the config file that can specify
pg_dump and
pg_restore utilities. (default:
none)
|
--db-connections-per-process
| No | Number of database connections per process for I/O operations. (default: 4) |
--verbose
| No |
Sets
the log verbosity level: info,
debug, error.
(default: info)
|
--debug
| No |
Enables
debug mode (equivalent to
--verbose=debug) and adds extra
debug logs. (default: false)
|
Database configuration options
| Option | Required | Description |
|---|---|---|
--db-host
| Yes | Database host. |
--db-port
| No | Database port. |
--db-name
| Yes | Database name. |
--db-user
| Yes | Database user. |
--db-user-password
| No | Database user password. |
--db-passfile
| No | Path to a file containing the password used for authentication. |
--db-ssl-key-file
| No | Path to the client SSL key file for secure connections. |
--db-ssl-cert-file
| No | Path to the client SSL certificate file. |
--db-ssl-ca-file
| No | Path to the CA certificate used to verify the server’s certificate. |
Dump mode options
| Option | Required | Description |
|---|---|---|
--prepared-sens-dict-file
| Yes | Input file or list of files containing the sensitive dictionary, which was generated by the create-dict (scan) mode or created manually. If rules collide, the rules in the last file of the list take priority. |
--partial-tables-dict-file
| No | Input file or list of files containing a tables dictionary used to include specific tables in the dump. All tables not listed in these files will be excluded. These files must be prepared manually (acts as a whitelist). |
--partial-tables-exclude-dict-file
| No | Input file or list of files containing a tables dictionary used to exclude specific tables from the dump. All tables listed in these files will be excluded. These files must be prepared manually (acts as a blacklist). |
--dbg-stage-1-validate-dict
| No | Validate dictionary, show the tables and run SQL queries without data export (default: false) |
--dbg-stage-2-validate-data
| No | Validate data, show the tables and run SQL queries with data export in prepared database (default: false) |
--dbg-stage-3-validate-full
| No | Makes all logic with “limit” in SQL queries (default: false) |
--clear-output-dir
| No | Clears the output directory from previous dumps or other files. (default: false) |
--pg-dump
| No |
Path to
the pg_dump Postgres tool
(default /usr/bin/pg_dump).
|
--pg-dump-options | No | Additional options passed directly to pg_dump utility. Example: "--no-comments --encoding=LATIN1". |
--output-dir
| No | Output directory for dump files. (default "") |
--ignore-privileges
| No | Ignore privileges from source db. |
--save-dicts
| No |
Duplicates all input dictionaries into the
runs directory. This can be useful for
debugging or integration purposes.
|
I.3.7.4. Restore #
This mode restores an anonymized backup created using pg_anon in the dump mode.
Note
Only backups created with pg_anon can be
restored. Backups created with pg_dump
cannot be restored.
I.3.7.4.1. Full restore (restore) mode #
Restores both the database structure and data.
Prerequisites
Dump or partial dump created by pg_anon in full dump (dump) mode.

The target database must be empty, or the --clean-db / --drop-db options can be used.
Run example
python pg_anon.py restore \
--db-host=127.0.0.1 \
--db-user=postgres \
--db-user-password=postgres \
--db-name=target_db \
--input-dir=path/to/my_full_dump \
--verbose=debug
I.3.7.4.2. Structure restore
(sync-struct-restore) mode #
Restores only the database structure.
Prerequisites
Dump or partial dump created in modes:

The target database must be empty, or the --clean-db / --drop-db options can be used.
Run example
python pg_anon.py sync-struct-restore \
--db-host=127.0.0.1 \
--db-user=postgres \
--db-user-password=postgres \
--db-name=target_db \
--input-dir=path/to/my_sync_struct_dump \
--verbose=debug
I.3.7.4.3. Data restore (sync-data-restore)
mode #
Restores data only.
Prerequisites
Dump or partial dump created in modes:
The target database must already contain the required schema for restoring data.
Run example
python pg_anon.py sync-data-restore \
--db-host=127.0.0.1 \
--db-user=postgres \
--db-user-password=postgres \
--db-name=target_db \
--input-dir=path/to/my_sync_data_dump \
--verbose=debug
I.3.7.4.4. Create partial restore #
Partial restores are used to restore only part of a backup, excluding certain tables if needed.
Partial restore can be run in all restore modes:
Partial restores use a tables dictionary containing a list of tables. This dictionary can act as either a whitelist (only listed tables are restored) or a blacklist (listed tables are excluded). See tables dictionary.
Run example
Restore only the needed tables (whitelist)
python pg_anon.py restore \
--db-host=127.0.0.1 \
--db-user=postgres \
--db-user-password=postgres \
--db-name=target_db \
--input-dir=partial_dump_white_list \
--partial-tables-dict-file=include_tables.py
Restore all tables except the specified ones (blacklist)
python pg_anon.py restore \
--db-host=127.0.0.1 \
--db-user=postgres \
--db-user-password=postgres \
--db-name=target_db \
--input-dir=partial_dump_black_list \
--partial-tables-exclude-dict-file=exclude_tables.py
Restore only the specified tables, excluding some of them (whitelist + blacklist)
python pg_anon.py restore \
--db-host=127.0.0.1 \
--db-user=postgres \
--db-user-password=postgres \
--db-name=target_db \
--input-dir=partial_dump_white_list_and_black_list \
--partial-tables-dict-file=include_tables.py \
--partial-tables-exclude-dict-file=exclude_tables.py
I.3.7.4.5. Options #
Common pg_anon options
| Option | Required | Description |
|---|---|---|
--config
| No |
Path to
the config file that can specify
pg_dump and
pg_restore utilities. (default:
none)
|
--processes
| No | Number of processes used for multiprocessing operations. (default: 4) |
--db-connections-per-process
| No | Number of database connections per process for I/O operations. (default: 4) |
--verbose
| No |
Sets
the log verbosity level: info,
debug, error.
(default: info)
|
--debug
| No |
Enables
debug mode (equivalent to
--verbose=debug) and adds extra
debug logs. (default: false)
|
Database configuration options
| Option | Required | Description |
|---|---|---|
--db-host
| Yes | Database host. |
--db-port
| No | Database port. |
--db-name
| Yes | Database name. |
--db-user
| Yes | Database user. |
--db-user-password
| No | Database user password. |
--db-passfile
| No | Path to a file containing the password used for authentication. |
--db-ssl-key-file
| No | Path to the client SSL key file for secure connections. |
--db-ssl-cert-file
| No | Path to the client SSL certificate file. |
--db-ssl-ca-file
| No | Path to the CA certificate used to verify the server’s certificate. |
Restore mode options
| Option | Required | Description |
|---|---|---|
--input-dir
| Yes | Path to the directory containing dump files created in dump mode. |
--partial-tables-dict-file
| No | Input file or list of files containing a tables dictionary used to include specific tables in the restore. All tables not listed in these files will be excluded. These files must be prepared manually (acts as a whitelist). |
--partial-tables-exclude-dict-file
| No | Input file or list of files containing a tables dictionary used to exclude specific tables from the restore. All tables listed in these files will be excluded. These files must be prepared manually (acts as a blacklist). |
--disable-checks
| No | Disables checks of disk space and Tantor SE version. (default: false) |
--seq-init-by-max-value
| No | Initialize sequences based on maximum values. Otherwise, the sequences will be initialized based on the values of the source database. |
--drop-custom-check-constr
| No | Drops all CHECK constraints that contain user-defined procedures to avoid performance degradation during data loading. |
--pg-restore
| No |
Path to
the pg_restore Postgres tool.
|
--pg-restore-options | No | Additional options passed directly to pg_restore utility. Example: "--no-comments --no-table-access-method". |
--clean-db
| No |
Cleans
the database objects before restoring (if they exist
in the dump). Mutually exclusive with
--drop-db.
|
--drop-db
| No |
Drop
target database before restore. Mutually exclusive
with --clean-db.
|
--ignore-privileges
| No | Ignore privileges from source db. |
--save-dicts
| No |
Duplicates all input dictionaries into the
runs directory. This can be useful for debugging
or integration purposes.
|
I.3.7.5. View Fields #
This mode displays how database fields match the anonymization rules.
I.3.7.5.1. Prerequisites #
The anon_funcs schema with anonymization functions must already exist. See init mode.

A sensitive dictionary containing data about database fields and their anonymization rules must be prepared beforehand. See create-dict (scan) mode.
I.3.7.5.2. Run example #
python pg_anon.py view-fields \
--db-host=127.0.0.1 \
--db-user=postgres \
--db-user-password=postgres \
--db-name=source_db \
--prepared-sens-dict-file=sens_dict.py
Note
This mode can process only a limited number of fields when no filters are applied, for performance reasons.
This limit is controlled by the
--fields-count option (default: 5000
fields). To avoid hitting this limit, increase the
--fields-count value or use filter
options: --schema-name,
--schema-mask,
--table-name,
--table-mask.
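Assuming the mask options take ordinary regular expressions matched against object names (the documentation describes them as regular-expression filters; the table names below are hypothetical), the filtering works like this:

```python
import re

# Hypothetical table names used only to illustrate mask matching.
tables = ["users", "users_archive", "contracts", "audit_log"]
table_mask = r"^users"  # an example value for --table-mask

# Keep only the tables whose names match the mask.
matched = [t for t in tables if re.search(table_mask, t)]
print(matched)  # the tables the mask would select
```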
I.3.7.5.3. Options #
Common pg_anon options
| Option | Required | Description |
|---|---|---|
| --config | No | Path to the config file that can specify the pg_dump and pg_restore utilities. (default: none) |
| --processes | No | Number of processes used for multiprocessing operations. (default: 4) |
| --db-connections-per-process | No | Number of database connections per process for I/O operations. (default: 4) |
| --verbose | No | Sets the log verbosity level: info, debug, error. (default: info) |
| --debug | No | Enables debug mode (equivalent to --verbose=debug) and adds extra debug logs. (default: false) |
Database configuration options
| Option | Required | Description |
|---|---|---|
--db-host | Yes | Database host. |
--db-port | No | Database port. |
--db-name | Yes | Database name. |
--db-user | Yes | Database user. |
--db-user-password | No | Database user password. |
--db-passfile | No | Path to a file containing the password used for authentication. |
--db-ssl-key-file | No | Path to the client SSL key file for secure connections. |
--db-ssl-cert-file | No | Path to the client SSL certificate file. |
--db-ssl-ca-file | No | Path to the CA certificate used to verify the server’s certificate. |
View-fields mode options
| Option | Required | Description |
|---|---|---|
| --prepared-sens-dict-file | Yes | Path to a file (or list of files) containing the sensitive dictionary, generated by the create-dict (scan) mode or created manually. When rules collide, rules from the last file in the list take priority. |
| --view-only-sensitive-fields | No | Displays only sensitive fields. (default: all fields) |
| --fields-count | No | Maximum number of fields to process for output. (default: 5000) |
| --schema-name | No | Filter by schema name. |
| --schema-mask | No | Filter by schema name using a regular expression. |
| --table-name | No | Filter by table name. |
| --table-mask | No | Filter by table name using a regular expression. |
| --json | No | Outputs results in JSON format instead of a table. |
I.3.7.6. View Data #
This mode displays anonymized table data without creating a dump.
I.3.7.6.1. Prerequisites #
The anon_funcs schema with anonymization functions must already exist. See init mode.
A sensitive dictionary containing data about database fields and their anonymization rules must be prepared beforehand. See create-dict (scan) mode.
I.3.7.6.2. Run example #
python pg_anon.py view-data \
--db-host=127.0.0.1 \
--db-user=postgres \
--db-user-password=postgres \
--db-name=source_db \
--prepared-sens-dict-file=sens_dict.py \
--schema-name=public \
--table-name=users \
--limit=10 \
--offset=0
I.3.7.6.3. Options #
Common pg_anon options
| Option | Required | Description |
|---|---|---|
| --config | No | Path to the config file that can specify the pg_dump and pg_restore utilities. (default: none) |
| --processes | No | Number of processes used for multiprocessing operations. (default: 4) |
| --db-connections-per-process | No | Number of database connections per process for I/O operations. (default: 4) |
| --verbose | No | Sets the log verbosity level: info, debug, error. (default: info) |
| --debug | No | Enables debug mode (equivalent to --verbose=debug) and adds extra debug logs. (default: false) |
Database configuration options
| Option | Required | Description |
|---|---|---|
| --db-host | Yes | Database host. |
| --db-port | No | Database port. |
| --db-name | Yes | Database name. |
| --db-user | Yes | Database user. |
| --db-user-password | No | Database user password. |
| --db-passfile | No | Path to a file containing the password used for authentication. |
| --db-ssl-key-file | No | Path to the client SSL key file for secure connections. |
| --db-ssl-cert-file | No | Path to the client SSL certificate file. |
| --db-ssl-ca-file | No | Path to the CA certificate used to verify the server’s certificate. |
View-data mode options
| Option | Required | Description |
|---|---|---|
| --prepared-sens-dict-file | Yes | Path to a file (or list of files) containing the sensitive dictionary, generated by the create-dict (scan) mode or created manually. When rules collide, rules from the last file in the list take priority. |
| --schema-name | Yes | Schema name. |
| --table-name | Yes | Table name. |
| --limit | No | Number of rows to display. (default: 100) |
| --offset | No | Row offset for pagination. (default: 0) |
| --json | No | Outputs results in JSON format instead of a table. |
I.3.8. Dictionary Schemas #
I.3.8.1. Meta Dictionary #
The meta-dictionary defines the rules used by pg_anon to detect sensitive fields during the create-dict (scan) mode.
This dictionary is not generated automatically — it must be created manually. It provides powerful configuration options that let you fine-tune scanning behavior.
During scanning, pg_anon analyzes all database schemas, tables, and fields.
The meta-dictionary allows you to:
include or exclude specific objects
detect sensitive fields by:
field names
data patterns (regex)
exact or partial constants
custom Python functions
SQL filtering conditions
select which PostgreSQL types should be scanned
specify anonymization functions per data type
To make it easier to navigate, each section of the meta-dictionary is described separately below, with purpose, schema, and example.
At the end, a complete combined schema is shown.
I.3.8.1.1. Scan priority #
| Priority | Meta-dictionary section | Required | Description |
|---|---|---|---|
| 1 | skip_rules + include_rules | No | Narrow the set of fields to be scanned |
| 2 | Fields from sensitive dictionary + field + fields from non-sensitive Dictionary | No | Match by field names. Does not inspect data inside tables |
| 3 | sens_pg_types | No | Match by field type. Types not included in this section are ignored |
| 4 | data_sql_condition | No | Narrow the dataset to be scanned for specific tables using SQL conditions |
| 5 | data_func | No | Match data by custom data-scanning functions |
| 6 | data_const | No | Match data by constant values |
| 7 | data_regex | No | Match data by regular expressions |
| 8 | funcs | No | Assign anonymization functions according to detected data type |
Figure I.2. scan_workflow.png

I.3.8.1.2. Section: skip_rules #
Purpose
Defines what parts of the database should be excluded from scanning. You can skip entire schemas, specific tables, or individual fields. Useful for large schemas without sensitive data or for reducing processing time.
Schema
{
"skip_rules": [
{
"schema": "<schema_name: string>", # Skip this schema from scanning
"schema_mask": "<schema_regex_mask: string>", # Or skip from scanning schemas matching regex pattern
"table": "<table_name: string>", # Skip from scanning only this table
"table_mask": "<table_regex_mask: string>", # Or skip from scanning tables matching regex pattern
"fields": [ # Skip from scanning fields of tables
"<field_name: string>"
]
}
]
}
Rule Combinations
schema or schema_mask — required. You must specify one of schema or schema_mask; this defines where the rule applies.
table or table_mask — optional. You may (but don’t have to) narrow the rule to specific tables. If table or table_mask is specified, the rule applies only to those tables; if both are omitted, the rule applies to the entire schema.
fields — optional. You may also specify particular fields to skip. If fields is specified, only these fields are skipped; if fields is omitted, the entire table is skipped (and if no table is specified, the entire schema is skipped).
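The combination rules above can be sketched in a few lines of Python. This is an illustrative model only — the function `is_skipped` and its argument shapes are hypothetical and not part of pg_anon's API:

```python
import re

def is_skipped(field, skip_rules):
    """Return True if the (schema, table, field) triple matches any skip rule."""
    for rule in skip_rules:
        # schema / schema_mask — required: defines where the rule applies
        if "schema" in rule:
            if field["schema"] != rule["schema"]:
                continue
        elif "schema_mask" in rule:
            if not re.search(rule["schema_mask"], field["schema"]):
                continue
        else:
            continue  # a rule without schema/schema_mask is invalid

        # table / table_mask — optional narrowing to specific tables
        if "table" in rule and field["table"] != rule["table"]:
            continue
        if "table_mask" in rule and not re.search(rule["table_mask"], field["table"]):
            continue

        # fields — optional narrowing to particular columns
        if "fields" in rule and field["name"] not in rule["fields"]:
            continue
        return True
    return False

rules = [
    {"schema_mask": "^tmp.*"},
    {"schema": "public", "fields": ["currency", "id", "employee_id"]},
    {"schema": "ecommerce", "table": "orders", "fields": ["count"]},
]
print(is_skipped({"schema": "public", "table": "employees", "name": "id"}, rules))        # True
print(is_skipped({"schema": "public", "table": "employees", "name": "full_name"}, rules)) # False
print(is_skipped({"schema": "tmp_some_1", "table": "tbl_1", "name": "content"}, rules))   # True
```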
Using this section
Example meta-dictionary
{
"skip_rules": [
{
"schema_mask": "^tmp.*"
},
{
"schema": "ecommerce",
"table_mask": "^client_.*"
},
{
"schema": "public",
"fields": ["currency", "id", "employee_id"]
},
{
"schema": "ecommerce",
"table": "orders",
"fields": ["count"]
}
]
}
Example Tables Structure with following dictionary matches
| Schema | Table | Field | Is skipped |
|---|---|---|---|
| public | employees | id | Yes |
| public | employees | full_name | No |
| public | employees | email | No |
| public | employees | password | No |
| public | employees | phone | No |
| public | employees | hire_date | No |
| public | salaries | employee_id | No |
| public | salaries | monthly_salary | No |
| public | salaries | currency | Yes |
| ecommerce | orders | product_id | No |
| ecommerce | orders | client_name | Yes |
| ecommerce | orders | client_phone | Yes |
| ecommerce | orders | client_delivery_address | Yes |
| ecommerce | orders | count | Yes |
| ecommerce | orders | created | No |
| ecommerce | orders | status | No |
| tmp_some_1 | tbl_1 | content | Yes |
| tmp_some_2 | tbl_2 | content | Yes |
I.3.8.1.3. Section: include_rules #
Purpose
Restrict scanning to specific schemas/tables/fields. Useful when:
scanning only part of a large database
debugging detection rules
running focused scans
Schema
{
"include_rules": [
{
"schema": "<schema_name: string>", # Include only this schema
"schema_mask": "<schema_regex_mask: string>", # Or include schemas matching regex pattern
"table": "<table_name: string>", # Include only this table
"table_mask": "<table_regex_mask: string>", # Or include tables matching regex pattern
"fields": [ # Include only this fields of tables for scanning
"<field_name: string>"
]
}
]
}
Combination rules
schema or schema_mask — required. You must specify one of schema or schema_mask; this defines where the rule applies.
table or table_mask — optional. You may (but don’t have to) narrow the rule to specific tables. If table or table_mask is specified, the rule applies only to those tables; if both are omitted, the rule applies to the entire schema.
fields — optional. You may also specify particular fields to include. If fields is specified, only these fields are included; if fields is omitted, the entire table is included (and if no table is specified, the entire schema is included).
Using this section
Example meta-dictionary
{
"include_rules": [
{
"schema_mask": "^tmp.*"
},
{
"schema": "ecommerce",
"table_mask": "^client_.*"
},
{
"schema": "public",
"fields": ["currency", "id", "employee_id"]
},
{
"schema": "ecommerce",
"table": "orders",
"fields": ["count"]
}
]
}
Example Tables Structure with following dictionary matches
| Schema | Table | Field | Is skipped |
|---|---|---|---|
| public | employees | id | No |
| public | employees | full_name | Yes |
| public | employees | email | Yes |
| public | employees | password | Yes |
| public | employees | phone | Yes |
| public | employees | hire_date | Yes |
| public | salaries | employee_id | Yes |
| public | salaries | monthly_salary | Yes |
| public | salaries | currency | No |
| ecommerce | orders | product_id | Yes |
| ecommerce | orders | client_name | No |
| ecommerce | orders | client_phone | No |
| ecommerce | orders | client_delivery_address | No |
| ecommerce | orders | count | No |
| ecommerce | orders | created | Yes |
| ecommerce | orders | status | Yes |
| tmp_some_1 | tbl_1 | content | No |
| tmp_some_2 | tbl_2 | content | No |
I.3.8.1.4. Section: field #
Purpose
Detect sensitive fields based solely on field name, without scanning the data inside.
Useful when fields are already known to contain sensitive data (e.g., email, password).
Schema
{
"field": {
"rules": [
"<field_name_regex: string>",
],
"constants": [
"<field_name: string>",
]
}
}
| Key | Meaning |
|---|---|
rules
| List of regex patterns. Field names matching these patterns are always considered sensitive. |
constants
| Exact field names to be treated as sensitive. |
Using this section
Example meta-dictionary
{
"field": {
"rules": [
"^client_",
".*phone.*"
],
"constants": [
"password",
"email"
]
}
}
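Name-based detection can be sketched as follows. This is an illustrative model, not pg_anon internals — `field_is_sensitive` is a hypothetical helper assuming the `rules` entries are unanchored regular expressions and `constants` are exact names:

```python
import re

# The example "field" section from above
FIELD_SECTION = {
    "rules": ["^client_", ".*phone.*"],
    "constants": ["password", "email"],
}

def field_is_sensitive(name, section=FIELD_SECTION):
    # A field name is sensitive if it matches any regex rule
    # or equals any constant exactly.
    return (
        any(re.search(rx, name) for rx in section["rules"])
        or name in section["constants"]
    )

print(field_is_sensitive("client_name"))  # True  (matches ^client_)
print(field_is_sensitive("phone"))        # True  (matches .*phone.*)
print(field_is_sensitive("password"))     # True  (exact constant)
print(field_is_sensitive("full_name"))    # False
```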
Example Tables Structure with following dictionary matches
| Schema | Table | Field | Is sensitive | Rule |
|---|---|---|---|---|
| public | employees | id | No | - |
| public | employees | full_name | No | - |
| public | employees | email | Yes | email |
| public | employees | password | Yes | password |
| public | employees | phone | Yes | .*phone.* |
| public | employees | hire_date | No | - |
| public | salaries | employee_id | No | - |
| public | salaries | monthly_salary | No | - |
| public | salaries | currency | No | - |
| ecommerce | orders | product_id | No | - |
| ecommerce | orders | client_name | Yes | ^client_ |
| ecommerce | orders | client_phone | Yes | .*phone.* |
| ecommerce | orders | client_delivery_address | Yes | ^client_ |
| ecommerce | orders | count | No | - |
| ecommerce | orders | created | No | - |
| ecommerce | orders | status | No | - |
I.3.8.1.5. Section: sens_pg_types #
Purpose
Define which PostgreSQL types are scanned. Field types not included in this list are not scanned.
If omitted or empty → default types are used: text, character, varchar, mvarchar, json, integer, bigint
Schema
{
"sens_pg_types": [
"<field_type: string>",
]
}
Using this section
Example meta-dictionary
{
"sens_pg_types": [
"text",
"integer",
"bigint",
"numeric(10,2)",
"varchar",
"mchar",
"json"
]
}
Field Type Resolution Priorities:
1. Exact type match (varchar(20), numeric(10,2))
2. Base type match (varchar, numeric)
Fields with other types are not scanned.
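The two-step resolution order can be illustrated with a small sketch. `resolve_type` is a hypothetical helper (not a pg_anon function) assuming the base type is the declared type with its parameters stripped:

```python
def resolve_type(declared, sens_pg_types):
    """Resolve a declared column type against the sens_pg_types list."""
    # 1. Exact type match, e.g. numeric(10,2)
    if declared in sens_pg_types:
        return declared
    # 2. Base type match: strip the parameter list, e.g. varchar(20) -> varchar
    base = declared.split("(", 1)[0].strip()
    if base in sens_pg_types:
        return base
    # Other types are not scanned
    return None

types = ["text", "integer", "bigint", "numeric(10,2)", "varchar", "mchar", "json"]
print(resolve_type("numeric(10,2)", types))  # exact match
print(resolve_type("varchar(20)", types))    # base type match -> varchar
print(resolve_type("date", types))           # None -> not scanned
```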
I.3.8.1.6. Section: data_sql_condition #
Purpose
Specify custom SQL WHERE conditions to sample the data instead of scanning the whole table.
Schema
{
"data_sql_condition": [
{
"schema": "<schema_name: string>", # Check only this schema
"schema_mask": "<schema_regex_mask: string>", # Or check schemas matching regex pattern
"table": "<table_name: string>", # Check only this table
"table_mask": "<table_regex_mask: string>", # Or check tables matching regex pattern
"sql_condition": # Condition in raw SQL format for filtering the data to scan
"""
<raw_SQL_WHERE_condition: string>
"""
}
]
}
Rules
schema or schema_mask — required. You must specify one of schema or schema_mask.
table or table_mask — required. You must specify one of table or table_mask.
sql_condition — required. Must specify an SQL condition for the WHERE clause. The WHERE keyword inside sql_condition is optional.
Using this section
Example meta-dictionary
{
"data_sql_condition": [
{
"schema": "public",
"table": "salaries",
"sql_condition": """
WHERE hire_date >= '2024-01-01' and hire_date <= '2024-02-01'
"""
}
]
}
Result
When scanning data of table public.salaries, only data from January 2024 will be used.
I.3.8.1.7. Section: data_func #
Purpose
Use custom SQL functions to detect sensitive fields and apply the appropriate anonymization function.
Schema
{
"data_func": {
"<field_type: string>": [
{
"scan_func_per_field": "<scan_function_for_whole_field: string>",
"anon_func": "<anonymization_rule_template_for_field: string>",
},
{
"scan_func": "<scan_function_for_field_per_row: string>",
"anon_func": "<anonymization_rule_template_for_field: string>",
"n_count": "<how_many_checks_must_be_passed: integer>",
},
],
}
}
| Key | Meaning |
|---|---|
field_type | PostgreSQL type (or custom type). "anyelement" applies to all types. |
scan_func_per_field | SQL function called once for the whole field. Must return boolean. |
scan_func | SQL function called for each field value. Must return boolean. |
anon_func | Template anonymization rule. Must contain %s placeholder for the field name. |
n_count | The field is considered sensitive if the scan function returned True for at least n field values. Used only with scan_func. (default: 1) |
I.3.8.1.8. Rule Combinations #
scan_func or scan_func_per_field — required. You must specify one of scan_func or scan_func_per_field. This defines how the rule applies.
Note
Functions for scan_func_per_field must follow this template:
CREATE OR REPLACE FUNCTION <schema>.<function_name>(
    schema_name TEXT,
    table_name TEXT,
    field_name TEXT,
    field_type TEXT
) RETURNS boolean AS $$
BEGIN
    <function_logic>;
END;
$$ LANGUAGE plpgsql;
Functions for scan_func must follow this template:
CREATE OR REPLACE FUNCTION <schema>.<function_name>(
    value TEXT,
    schema_name TEXT,
    table_name TEXT,
    field_name TEXT
) RETURNS boolean AS $$
BEGIN
    <function_logic>;
END;
$$ LANGUAGE plpgsql;
Using this section
Example meta-dictionary
{
"data_func": {
"anyelement": [
{
"scan_func_per_field": "custom_funcs.fulltext_search_by_organizations",
"anon_func": "anon_funcs.digest(\"%s\", 'salt', 'md5')",
},
{
"scan_func": "custom_funcs.is_employee_email",
"anon_func": "anon_funcs.digest(\"%s\", 'salt', 'md5')",
"n_count": 5
}
]
}
}
Field Type Resolution Priorities:
1. Exact type match (varchar(20), numeric(10,2))
2. Base type match (varchar, numeric)
3. anyelement
Example Tables Structure with following dictionary matches
| Schema | Table | Field | Is sensitive | Rule |
|---|---|---|---|---|
| public | employees | id | No | - |
| public | employees | full_name | No | - |
| public | employees | email | Yes | custom_funcs.is_employee_email |
| public | employees | password | No | - |
| public | employees | phone | No | - |
| public | employees | hire_date | No | - |
| public | salaries | employee_id | No | - |
| public | salaries | monthly_salary | No | - |
| public | salaries | currency | No | - |
| ecommerce | orders | product_id | No | - |
| ecommerce | orders | client_name | No | - |
| ecommerce | orders | client_phone | No | - |
| ecommerce | orders | client_delivery_address | No | - |
| ecommerce | orders | count | No | - |
| ecommerce | orders | created | No | - |
| ecommerce | orders | status | No | - |
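The n_count semantics can be modeled in a few lines. This is an illustrative sketch only — pg_anon runs scan functions inside PostgreSQL, and `is_employee_email` below is a hypothetical stand-in for the `custom_funcs.is_employee_email` function from the example:

```python
def field_sensitive_by_scan(values, scan_func, n_count=1):
    """A field is sensitive once scan_func returns True for n_count values."""
    hits = 0
    for v in values:
        if scan_func(v):
            hits += 1
            if hits >= n_count:
                return True  # enough sampled values passed the check
    return False

# Hypothetical stand-in for custom_funcs.is_employee_email
def is_employee_email(value):
    return isinstance(value, str) and value.endswith("@company.example")

sample = ["a@company.example"] * 4 + ["not-an-email"]
print(field_sensitive_by_scan(sample, is_employee_email, n_count=5))  # False: only 4 hits
sample.append("b@company.example")
print(field_sensitive_by_scan(sample, is_employee_email, n_count=5))  # True: 5 hits
```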
I.3.8.1.9. Section: data_const #
Purpose
Detect sensitive fields by matching full or partial constants.
Schema
{
"data_const": {
"constants": [
"<field_value_full: string>",
],
"partial_constants": [
"<field_value_partial: string>",
]
}
}
| Key | Required | Meaning |
|---|---|---|
| constants | No | Matches the full field value. |
| partial_constants | No | Matches a substring anywhere in the field value. |
Note
One of constants or partial_constants must be specified.
Using this section
Example meta-dictionary
{
"data_const": {
"constants": [
"done",
],
"partial_constants": [
"@example.com",
]
}
}
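The difference between the two keys can be sketched as a simple check. `value_matches` is a hypothetical helper for illustration; pg_anon performs the equivalent comparison server-side:

```python
# The example data_const section from above
DATA_CONST = {"constants": ["done"], "partial_constants": ["@example.com"]}

def value_matches(value, section=DATA_CONST):
    # constants: the whole value must be equal
    if value in section.get("constants", []):
        return True
    # partial_constants: the value must contain the substring anywhere
    return any(part in value for part in section.get("partial_constants", []))

print(value_matches("done"))                  # True  (full constant)
print(value_matches("john.doe@example.com"))  # True  (partial constant)
print(value_matches("pending"))               # False
```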
Example Tables Structure with following dictionary matches
| Schema | Table | Field | Is sensitive | Rule |
|---|---|---|---|---|
| public | employees | id | No | - |
| public | employees | full_name | No | - |
| public | employees | email | Yes | @example.com |
| public | employees | password | No | - |
| public | employees | phone | No | - |
| public | employees | hire_date | No | - |
| public | salaries | employee_id | No | - |
| public | salaries | monthly_salary | No | - |
| public | salaries | currency | No | - |
| ecommerce | orders | product_id | No | - |
| ecommerce | orders | client_name | No | - |
| ecommerce | orders | client_phone | No | - |
| ecommerce | orders | client_delivery_address | No | - |
| ecommerce | orders | count | No | - |
| ecommerce | orders | created | No | - |
| ecommerce | orders | status | Yes | done |
I.3.8.1.10. Section: data_regex #
Purpose
Detect sensitive data by scanning field values using regular expressions.
Schema
{
"data_regex": {
"rules": [
"<regex_rule: string>",
]
}
}
Using this section
Example meta-dictionary
{
"data_regex": {
"rules": [
"""[A-Za-z0-9]+([._-][A-Za-z0-9]+)*@[A-Za-z0-9-]+(\.[A-Za-z]{2,})+""", # email
"^(7?\d{10})$", # phone 7XXXXXXXXXX
]
}
}
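The two example rules can be checked against sample values with Python's `re` module (used here purely for illustration; pg_anon evaluates the rules against table data server-side):

```python
import re

EMAIL_RX = r"[A-Za-z0-9]+([._-][A-Za-z0-9]+)*@[A-Za-z0-9-]+(\.[A-Za-z]{2,})+"
PHONE_RX = r"^(7?\d{10})$"  # phone 7XXXXXXXXXX

print(bool(re.search(EMAIL_RX, "john.doe@example.com")))  # True
print(bool(re.search(PHONE_RX, "79001234567")))           # True: 7 + 10 digits
print(bool(re.search(PHONE_RX, "12345")))                 # False: too short
```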
Example Tables Structure with following dictionary matches
| Schema | Table | Field | Is sensitive | Rule |
|---|---|---|---|---|
| public | employees | id | No | - |
| public | employees | full_name | No | - |
| public | employees | email | Yes | [A-Za-z0-9]+([._-][A-Za-z0-9]+)*@[A-Za-z0-9-]+(\.[A-Za-z]{2,})+ |
| public | employees | password | No | - |
| public | employees | phone | Yes | ^(7?\d{10})$ |
| public | employees | hire_date | No | - |
| public | salaries | employee_id | No | - |
| public | salaries | monthly_salary | No | - |
| public | salaries | currency | No | - |
| ecommerce | orders | product_id | No | - |
| ecommerce | orders | client_name | No | - |
| ecommerce | orders | client_phone | Yes | ^(7?\d{10})$ |
| ecommerce | orders | client_delivery_address | No | - |
| ecommerce | orders | count | No | - |
| ecommerce | orders | created | No | - |
| ecommerce | orders | status | No | - |
I.3.8.1.11. Section: funcs #
Purpose
Configure anonymization functions per PostgreSQL type.
Schema
{
"funcs": {
"<field_type: string>": "<anonymization_function_for_field_type: string>",
"default": "<universal_anonymization_function_for_all_field_types: string>"
}
}
Using this section
Example meta-dictionary
{
"funcs": {
"text": "anon_funcs.digest(\"%s\", 'salt_word', 'md5')",
"numeric(10,2)": "anon_funcs.noise(\"%s\", 10)",
"numeric": "anon_funcs.noise(\"%s\", 30)",
"varchar(10)": "anon_funcs.random_string(10)",
"varchar(20)": "anon_funcs.random_string(20)",
"varchar": "anon_funcs.digest(\"%s\", 'varchar_salt_word', 'sha256')",
"timestamp": "anon_funcs.dnoise(\"%s\", interval '6 month')",
"default": "anon_funcs.digest(\"%s\", 'MySecretSaltWord', 'sha256')"
}
}
Field Type Resolution Priorities:
1. Exact type match (varchar(20), numeric(10,2))
2. Base type match (varchar, numeric)
3. Function from default
4. Built-in fallback function: anon_funcs.digest("%s", 'salt_word', 'md5')
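The four-step lookup can be sketched as follows. `pick_anon_func` is a hypothetical helper modeling the priorities above, assuming the base type is the declared type with parameters stripped:

```python
# Built-in fallback described above (step 4)
FALLBACK = """anon_funcs.digest("%s", 'salt_word', 'md5')"""

def pick_anon_func(field_type, funcs):
    # 1. Exact type match, e.g. varchar(20)
    if field_type in funcs:
        return funcs[field_type]
    # 2. Base type match, e.g. varchar(99) -> varchar
    base = field_type.split("(", 1)[0]
    if base in funcs:
        return funcs[base]
    # 3. "default" entry, else 4. built-in fallback
    return funcs.get("default", FALLBACK)

funcs = {
    "varchar(20)": "anon_funcs.random_string(20)",
    "varchar": """anon_funcs.digest("%s", 'varchar_salt_word', 'sha256')""",
    "default": """anon_funcs.digest("%s", 'MySecretSaltWord', 'sha256')""",
}
print(pick_anon_func("varchar(20)", funcs))  # exact match
print(pick_anon_func("varchar(99)", funcs))  # base type match
print(pick_anon_func("timestamp", funcs))    # "default" entry
print(pick_anon_func("timestamp", {}))       # built-in fallback
```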
I.3.8.1.12. General meta-dict schema #
{
"skip_rules": [
{
"schema": "<schema_name: string>", # Skip this schema from scanning
"schema_mask": "<schema_regex_mask: string>", # Or skip from scanning schemas matching regex pattern
"table": "<table_name: string>", # Skip from scanning only this table
"table_mask": "<table_regex_mask: string>", # Or skip from scanning tables matching regex pattern
"fields": [ # Skip from scanning fields of tables
"<field_name: string>"
]
}
],
"include_rules": [
{
"schema": "<schema_name: string>", # Include only this schema
"schema_mask": "<schema_regex_mask: string>", # Or include schemas matching regex pattern
"table": "<table_name: string>", # Include only this table
"table_mask": "<table_regex_mask: string>", # Or include tables matching regex pattern
"fields": [ # Include only this fields of tables for scanning
"<field_name: string>"
]
}
],
"field": {
"rules": [
"<field_name_regex: string>",
],
"constants": [
"<field_name: string>",
]
},
"sens_pg_types": [
"<field_type: string>",
],
"data_sql_condition": [
{
"schema": "<schema_name: string>", # Check only this schema
"schema_mask": "<schema_regex_mask: string>", # Or check schemas matching regex pattern
"table": "<table_name: string>", # Check only this table
"table_mask": "<table_regex_mask: string>", # Or check tables matching regex pattern
"sql_condition": # Condition in raw SQL format for filtering the data to scan
"""
<raw_SQL_WHERE_condition: string>
"""
}
],
"data_func": {
"<field_type: string>": [
{
"scan_func": "<scan_function_for_field: string>",
"anon_func": "<anonymization_rule_template_for_field: string>",
"n_count": "<how_many_checks_must_be_passed: integer>",
},
],
},
"data_const": {
"constants": [
"<field_value_full: string>",
],
"partial_constants": [
"<field_value_partial: string>",
]
},
"data_regex": {
"rules": [
"<regex_rule: string>",
]
},
"funcs": {
"<field_type: string>": "<anonymization_function_for_field_type: string>",
"default": "<universal_anonymization_function_for_all_field_types: string>"
}
}
I.3.8.2. Sensitive Dictionary #
The sensitive dictionary defines explicit anonymization rules for fields. It is used in four operation modes, and its behavior differs slightly across them:
dump — fields listed in the dictionary are anonymized using the defined rules; all other fields are dumped as-is.
create-dict (scan) — fields listed in the sensitive dictionary are treated as known sensitive fields, which skips sensitivity detection for them and speeds up scanning.
view-fields — shows which anonymization rules would be applied to fields.
view-data — shows how the rules would affect sample data, without performing a dump.
This dictionary can be created manually or generated automatically using create-dict (scan) mode.
Note
If a field appears both in the sensitive dictionary and the non-sensitive dictionary, the sensitive dictionary takes priority.
I.3.8.2.1. Schema #
{
"dictionary": [
{
"schema": "<schema_name: string>",
"table": "<table_name: string>",
"fields": {
"<field_name: string>": "<anonymization_rule_for_field: string>",
},
"sql_condition": # Optional. Condition in raw SQL format for filtering the data to dump. (This section is ignored in create-dict (scan) mode.)
"""
<raw_SQL_WHERE_condition: string>
"""
}
],
# Optional section. It is used to exclude schemas and tables from the data dump.
"dictionary_exclude": [
{
"schema": "<schema_name: string>", # Exclude only this schema
"schema_mask": "<schema_regex_mask: string>", # Or exclude schemas matching regex pattern
"table": "<table_name: string>", # Exclude only this table
"table_mask": "<table_regex_mask: string>", # Or exclude tables matching regex pattern
}
]
}
Note
sql_condition in the dictionary section is optional. It can be used to take only part of the data, for example dumping table data only for the last week.
dictionary_exclude is an optional section. If a table appears in both the dictionary_exclude and dictionary sections, the table will be dumped. It can be used for partial dumps and for debugging the anonymization process.
In dictionary_exclude, you must use either schema or schema_mask → not both.
In dictionary_exclude, you must use either table or table_mask → not both.
I.3.8.2.2. Using the Dictionary #
Example Database Structure
| Schema | Table | Field |
|---|---|---|
| public | employees | id |
| public | employees | full_name |
| public | employees | email |
| public | employees | hire_date |
| public | salaries | employee_id |
| public | salaries | monthly_salary |
| public | salaries | currency |
| ecommerce | orders | product_id |
| ecommerce | orders | count |
| ecommerce | orders | client_name |
| ecommerce | orders | delivery_address |
| ecommerce | orders | created |
| ecommerce | orders | status |
| tenant_a | projects | title |
| tenant_a | projects | description |
| tenant_b | projects | title |
| tenant_b | projects | description |
| tenant_c | projects | title |
| tenant_c | projects | description |
Example Sensitive Dictionary
{
"dictionary": [
{
"schema": "public",
"table": "employees",
"fields": {
"full_name": "anon_funcs.digest(\"full_name\", 'salt_word', 'sha256')", # hashing employees names
"email": "md5(\"email\") || '@abc.com'", # hashing employee emails while preserving email format
},
},
{
"schema": "public",
"table": "salaries",
"fields": {
"monthly_salary": "10000", # just defines one value for the field for all rows
},
},
{
"schema": "ecommerce",
"table": "orders",
"fields": {
"client_name": "anon_funcs.digest(\"client_name\", 'salt_word', 'sha256')",
"delivery_address": "anon_funcs.digest(\"delivery_address\", 'salt_word', 'sha256')",
},
"sql_condition": # Dumping only the orders completed within the last week
"""
WHERE created > NOW() - '7 days'::interval
AND status = 'done'
"""
}
],
# Excluding all tables from schemas `tenant_a`, `tenant_b`, `tenant_c`
"dictionary_exclude": [
{
"schema_mask": "tenant_.*",
"table_mask": "*",
}
]
}
This dictionary matches the following table fields:
| Schema | Table | Field | Used in dump mode | Used in create-dict (scan) mode |
|---|---|---|---|---|
| public | employees | id | Dumped as is | Fields scanned using meta-dictionary rules |
| public | employees | full_name | Dumped with anonymization | Excluded from sensitivity checks as a “sensitive” field |
| public | employees | email | Dumped with anonymization | Excluded from sensitivity checks as a “sensitive” field |
| public | employees | hire_date | Dumped as is | Fields scanned using meta-dictionary rules |
| public | salaries | employee_id | Dumped as is | Fields scanned using meta-dictionary rules |
| public | salaries | monthly_salary | Dumped with anonymization | Excluded from sensitivity checks as a “sensitive” field |
| public | salaries | currency | Dumped as is | Fields scanned using meta-dictionary rules |
| ecommerce | orders | product_id | Dumped as is | Fields scanned using meta-dictionary rules |
| ecommerce | orders | client_name | Dumped with anonymization | Excluded from sensitivity checks as a “sensitive” field |
| ecommerce | orders | delivery_address | Dumped with anonymization | Excluded from sensitivity checks as a “sensitive” field |
| ecommerce | orders | count | Dumped as is | Fields scanned using meta-dictionary rules |
| ecommerce | orders | created | Dumped as is | Fields scanned using meta-dictionary rules |
| ecommerce | orders | status | Dumped as is | Fields scanned using meta-dictionary rules |
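In dump mode, each listed field's rule effectively replaces the plain column reference in the export query, and sql_condition is appended verbatim. The sketch below is illustrative only — `build_select` is a hypothetical helper, not pg_anon's actual query builder:

```python
def build_select(schema, table, columns, rules, sql_condition=""):
    """Build the anonymized SELECT list: rule expression if present, else the column."""
    exprs = [rules.get(c, f'"{c}"') for c in columns]
    query = f'SELECT {", ".join(exprs)} FROM "{schema}"."{table}"'
    if sql_condition:
        query += " " + sql_condition.strip()
    return query

rules = {"full_name": """anon_funcs.digest("full_name", 'salt_word', 'sha256')"""}
print(build_select("public", "employees", ["id", "full_name"], rules))
# id is dumped as-is; full_name goes through the anonymization function
```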
I.3.8.3. Non-Sensitive Dictionary #
The non-sensitive dictionary is used only during the create-dict (scan) mode to speed up processing. It defines which fields should be treated as non-sensitive. Fields listed here are excluded from all sensitivity checks according to meta-dictionary rules.
This dictionary can be created manually or generated automatically using create-dict (scan) mode with the --output-no-sens-dict-file option.
Note
If a field appears both in the sensitive dictionary and the non-sensitive dictionary, the sensitive dictionary takes priority.
I.3.8.3.1. Schema #
{
"no_sens_dictionary": [
{
"schema": "<schema_name: string>",
"table": "<table_name: string>",
"fields": [
"<field_name: string>",
]
},
]
}
I.3.8.3.2. Using the Dictionary #
Example Tables Structure
| Schema | Table | Field |
|---|---|---|
| public | employees | id |
| public | employees | full_name |
| public | employees | email |
| public | employees | hire_date |
| public | salaries | employee_id |
| public | salaries | monthly_salary |
| public | salaries | currency |
Example Non-Sensitive Dictionary
{
"no_sens_dictionary": [
{
"schema": "public",
"table": "employees",
"fields": [
"id",
"hire_date",
]
},
{
"schema": "public",
"table": "salaries",
"fields": [
"employee_id",
"currency",
]
},
]
}
This dictionary matches the following table fields:
| Schema | Table | Field | Used in create-dict (scan) mode |
|---|---|---|---|
| public | employees | id | Excluded from sensitivity checks as a “no sensitive” field |
| public | employees | full_name | Fields scanned using meta-dictionary rules |
| public | employees | email | Fields scanned using meta-dictionary rules |
| public | employees | hire_date | Excluded from sensitivity checks as a “no sensitive” field |
| public | salaries | employee_id | Excluded from sensitivity checks as a “no sensitive” field |
| public | salaries | monthly_salary | Fields scanned using meta-dictionary rules |
| public | salaries | currency | Excluded from sensitivity checks as a “no sensitive” field |
I.3.8.4. Tables dictionary #
The tables dictionary defines which tables participate in the partial dump and partial restore operations. It can act as either a whitelist (include-only) or a blacklist (exclude-only).
Use this dictionary when you need to:
dump or restore only specific tables
exclude unwanted tables from the dump or restore
I.3.8.4.1. Schema #
{
"tables": [
{
"schema": "<schema_name: string>", # Include only this schema
"schema_mask": "<schema_regex_mask: string>", # Or include schemas matching regex pattern
"table": "<table_name: string>", # Include only this table
"table_mask": "<table_regex_mask: string>", # Or include tables matching regex pattern
}
]
}
Note
You must use either schema or schema_mask → not both.
You must use either table or table_mask → not both.
I.3.8.4.2. Using the Dictionary #
You can use the same dictionary in two different roles:
Whitelist — dump/restore only the matched tables
Blacklist — dump/restore all tables except the matched ones
Example Database Structure
| Schema | Table |
|---|---|
| public | employees |
| public | departments |
| public | positions |
| public | salaries |
| public | users |
| ecommerce | products |
| ecommerce | categories |
| ecommerce | orders |
| ecommerce | order_items |
| tenant_a | users |
| tenant_a | projects |
| tenant_a | tasks |
| tenant_a | comments |
| tenant_b | users |
| tenant_b | projects |
| tenant_b | tasks |
| tenant_b | comments |
| tenant_c | users |
| tenant_c | projects |
| tenant_c | tasks |
| tenant_c | comments |
Example Tables Dictionary
{
"tables": [
{
"schema": "public",
"table": "employees"
},
{
"schema": "ecommerce",
"table_mask": "^orders"
},
{
"schema_mask": "_a$",
"table": "projects"
},
{
"schema_mask": "*",
"table_mask": "users"
},
]
}
This dictionary matches the following tables:
| Schema | Table | Matched by rule |
|---|---|---|
| ecommerce | orders | schema="ecommerce", table_mask="^orders" |
| ecommerce | order_items | schema="ecommerce", table_mask="^orders" |
| tenant_a | projects | schema_mask="_a$", table="projects" |
| tenant_a | users | schema_mask="*", table_mask="users" |
| tenant_b | users | schema_mask="*", table_mask="users" |
| tenant_c | users | schema_mask="*", table_mask="users" |
| public | users | schema_mask="*", table_mask="users" |
| public | employees | schema="public", table="employees" |
I.3.9. Debug stages for anonymization process #
The debug stages allow you to test and troubleshoot the anonymization workflow without performing a full dump or restore, saving significant time and resources.
Each stage emulates a specific part of the anonymization pipeline:
Stage 1 — Validate Dict
Validates the sensitive dictionary and checks SQL logic without exporting any data.
Stage 2 — Validate Data
Performs anonymization checks on real data with a limited sample (LIMIT 100) using a prepared database schema.
Stage 3 — Validate Full
Executes the full anonymization logic with data sampling (LIMIT 100), but without requiring a prepared database.
These stages help you quickly debug rules, anonymization functions, SQL conditions, and dictionary configuration before running a full anonymized dump/restore process.
I.3.9.1. Stage 1: Validate dict #
This stage validates the dictionary, shows the tables, and runs the SQL queries without exporting any data to disk or to a database. If the program finishes without errors, the stage is passed.
Figure I.3. dbg-stage-1.png

python pg_anon.py dump \
--db-host=127.0.0.1 \
--db-user=postgres \
--db-user-password=postgres \
--db-name=test_source_db \
--output-dir=test_dbg_stages \
--prepared-sens-dict-file=test_dbg_stages.py \
--clear-output-dir \
--verbose=debug \
--debug \
--dbg-stage-1-validate-dict
I.3.9.2. Stage 2: Validate data #
This stage validates the data, shows the tables, and runs SQL queries that export data with LIMIT 100 into a prepared database. It requires a target database that already contains the full structure (pre-data section only) for the tables described in --prepared-sens-dict-file.
If you want to create a database with the required structure, run:
One-time structure dump:
python pg_anon.py sync-struct-dump \
--db-host=127.0.0.1 \
--db-user=postgres \
--db-user-password=postgres \
--db-name=test_source_db \
--output-dir=test_stage_2 \
--prepared-sens-dict-file=test_dbg_stages.py \
--clear-output-dir \
--verbose=debug \
--debug \
--dbg-stage-3-validate-full
Then restore the structure as many times as you need:
su - postgres -c "psql -U postgres -d postgres -c \"DROP DATABASE IF EXISTS test_target_db_7\""
su - postgres -c "psql -U postgres -d postgres -c \"CREATE DATABASE test_target_db_7\""
python pg_anon.py sync-struct-restore \
--db-host=127.0.0.1 \
--db-user=postgres \
--db-user-password=postgres \
--db-name=test_target_db_7 \
--input-dir=test_stage_2 \
--verbose=debug \
--debug
Validate data stage in dump:
Figure I.4. dbg-stage-2.png

python pg_anon.py dump \
--db-host=127.0.0.1 \
--db-user=postgres \
--db-user-password=postgres \
--db-name=test_source_db \
--output-dir=test_dbg_stages \
--prepared-sens-dict-file=test_dbg_stages.py \
--clear-output-dir \
--verbose=debug \
--debug \
--dbg-stage-2-validate-data
Validate data stage in data-restore:
python pg_anon.py sync-data-restore \
--db-host=127.0.0.1 \
--db-user=postgres \
--db-user-password=postgres \
--db-name=test_target_db_7 \
--input-dir=test_dbg_stages \
--verbose=debug \
--debug
# For example, view the data in one of the tables:
su - postgres -c "psql -U postgres -d test_target_db_7 -c \"SELECT * FROM public.contracts\""
I.3.9.3. Stage 3: Validate full #
Figure I.5. dbg-stage-3.png

Runs the full anonymization logic with LIMIT 100 applied to the SQL queries. This stage does not require a prepared database; just run:
su - postgres -c "psql -U postgres -d postgres -c \"DROP DATABASE IF EXISTS test_target_db_8\""
su - postgres -c "psql -U postgres -d postgres -c \"CREATE DATABASE test_target_db_8\""
Validate full stage in dump:
python pg_anon.py dump \
--db-host=127.0.0.1 \
--db-user=postgres \
--db-user-password=postgres \
--db-name=test_source_db \
--output-dir=test_dbg_stages \
--prepared-sens-dict-file=test_dbg_stages.py \
--clear-output-dir \
--verbose=debug \
--debug \
--dbg-stage-3-validate-full
Validate full stage in restore:
python pg_anon.py restore \
--db-host=127.0.0.1 \
--db-user=postgres \
--db-user-password=postgres \
--db-name=test_target_db_8 \
--input-dir=test_dbg_stages \
--verbose=debug \
--debug
# For example, view the data in one of the tables:
su - postgres -c "psql -U postgres -d test_target_db_8 -c \"SELECT * FROM public.contracts\""
I.3.10. FAQ #
I.3.10.1. Where can I find operation logs and launch parameters? #
All run data is stored in the /path_to_pg_anon/runs directory.
Inside, the structure is: <year>/<month>/<day>/<operation_id>.
Each operation folder contains:
a logs directory with all log files
a run_options.json file with all parameters used to run pg_anon
If the --save-dicts option was used, the folders input and output will also appear.
They contain all input and output dictionaries for that run.
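Given this layout, a run's log directory can be located with a small helper. This is an illustrative sketch, not a pg_anon utility; the date and operation ID below are hypothetical.

```python
from pathlib import Path

def run_logs_dir(runs_root: str, year: str, month: str, day: str,
                 operation_id: str) -> Path:
    # Mirrors the runs/<year>/<month>/<day>/<operation_id>/logs layout
    return Path(runs_root) / year / month / day / operation_id / "logs"

print(run_logs_dir("/path_to_pg_anon/runs", "2024", "06", "15", "op-0001"))
# /path_to_pg_anon/runs/2024/06/15/op-0001/logs
```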
I.3.10.2. Can I restore a pg_anon dump using pg_dump? #
No. The pg_anon dump format is not compatible with pg_dump due to the specifics of anonymization.
For the same reason, a regular backup created with pg_dump cannot be restored using pg_anon.
I.3.10.3. Does pg_anon modify the structure or data of the source database during scan, dump, view-data, or view-fields? #
pg_anon does not modify either the structure or the data of the source database.
The only thing pg_anon adds is the anon_funcs
schema, which is required for its internal operations.
I.3.10.4. Can I use custom functions for scanning? #
Yes. The meta-dictionary has a data_func section. In this section, you can use any custom SQL function for sensitivity validation.
This allows you to implement checks using full-text search or any other SQL capabilities.
Such functions must follow this template:
CREATE OR REPLACE FUNCTION <schema>.<function_name>(
    value TEXT,
    schema_name TEXT,
    table_name TEXT,
    field_name TEXT
)
RETURNS boolean AS $$
BEGIN
    <function_logic>;
END;
$$ LANGUAGE plpgsql;
I.3.10.5. Can I use custom functions for anonymization? #
Yes. You can use any functions and values available in the source database.
You must ensure that anonymized values match the field format.
For example, if the field type is varchar(15), you must manually ensure the generated value does not exceed 15 characters.
If the format is violated, the dump may be created successfully, but restoring it may fail.
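For example, if you build replacement values outside the database, a guard like the one below keeps them within the column width. This is an illustrative sketch, not a pg_anon helper; the sample value is hypothetical.

```python
def fit_to_column(value: str, max_len: int) -> str:
    # Truncate the anonymized value so it still fits the column type,
    # e.g. varchar(15)
    return value[:max_len]

print(fit_to_column("anonymized-user-0001", 15))  # anonymized-user
```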
For such cases, you can also use the data_func section with a scan_func that checks the field length and an anon_func that produces values of the matching length.
For example, the scan function below matches only fields defined as exactly 20 characters long that contain email addresses:
CREATE OR REPLACE FUNCTION my_scan_funcs.is_email_field_with_len_20_chars(
value TEXT,
schema_name TEXT,
table_name TEXT,
field_name TEXT
)
RETURNS boolean AS $$
DECLARE
max_len integer;
is_email boolean;
BEGIN
SELECT c.character_maximum_length
INTO max_len
FROM information_schema.columns c
WHERE c.table_schema = $2
AND c.table_name = $3
AND c.column_name = $4;
-- field length must be 20 characters
if max_len != 20 then
return false;
end if;
-- value must be not null for comparison
if $1 is null then
return false;
end if;
-- check email format by regexp
return $1 ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$';
END;
$$ LANGUAGE plpgsql;
The meta-dict rule below can be used to detect email fields with a length of 20 characters and anonymize them while preserving both format and length.
{
"data_func": {
"varchar": [
{
"scan_func": "my_scan_funcs.is_email_field_with_len_20_chars",
"anon_func": "lower(anon_funcs.random_string(9)) || '@secret.com'",
"n_count": 10
}
]
}
}
I.3.10.6. Is the scanning stage required? #
No. You can create all required dictionaries manually or reuse previously generated dictionaries.
I.3.10.7. Why load sensitive and non-sensitive dictionaries during scanning? #
They are used only to speed up scanning.
These dictionaries act as a cache, allowing pg_anon to immediately know which fields are sensitive and which are not.
This way, repeated scans of the same database will run very quickly.
If new fields appear that are not present in the dictionaries, pg_anon will evaluate them using the rules from the meta-dictionary.
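Conceptually, the lookup behaves like this (a sketch of the caching idea, not pg_anon's actual implementation):

```python
def needs_scan(field, sens_dict, no_sens_dict):
    # A field already classified in either dictionary is skipped;
    # only unknown fields are evaluated against the meta-dictionary.
    return field not in sens_dict and field not in no_sens_dict

sens = {("public", "users", "email")}
no_sens = {("public", "users", "id")}
print(needs_scan(("public", "users", "phone"), sens, no_sens))  # True
print(needs_scan(("public", "users", "email"), sens, no_sens))  # False
```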
I.3.10.8. When should I use --config with a configuration file? #
If you plan to use pg_anon with different Tantor SE major versions, you should define a config file.
It is much easier to configure this once rather than repeatedly passing paths to pg_dump and pg_restore.
If you always use a single Tantor SE version, the system pg_dump and pg_restore will be used, and a config file is unnecessary.
I.3.10.9. Can I split one large dictionary into multiple smaller ones? #
Yes. All dictionary-related parameters accept lists of files.
At startup, pg_anon merges them into a single dictionary internally.
This makes it easy to separate different groups of rules into different files and combine them as needed. This is especially helpful for the meta-dictionary, which contains many optional sections.
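Conceptually, the merge simply concatenates the corresponding sections of each file. The sketch below assumes list-valued sections like "dictionary"; pg_anon's internal merge logic may differ in detail.

```python
def merge_dictionaries(dicts):
    # Combine the "dictionary" sections of several prepared dictionaries
    merged = {"dictionary": []}
    for d in dicts:
        merged["dictionary"].extend(d.get("dictionary", []))
    return merged

a = {"dictionary": [{"schema": "public", "table": "users",
                     "fields": {"email": "md5(email)"}}]}
b = {"dictionary": [{"schema": "public", "table": "clients",
                     "fields": {"email": "md5(email)"}}]}
print(len(merge_dictionaries([a, b])["dictionary"]))  # 2
```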
I.3.10.10. Restore error: “Database is not empty” #
Restore mode checks that the target database is empty.
This is done to prevent accidental data loss in the target database.
If needed, use the --drop-db or
--clean-db options during restore.
I.3.10.11. Restore error: “Database is being accessed by other users” #
When using the --drop-db option, the target
database will be recreated using
DROP DATABASE and
CREATE DATABASE.
If there are active connections, the
DROP DATABASE command cannot be executed.
You must terminate all active sessions and run the restore operation again.
I.3.10.12. Difference between options --drop-db and --clean-db for restore mode #
--drop-db: recreates the target database using DROP DATABASE and CREATE DATABASE, then runs the restore into the empty database.
--clean-db: performs a restore similar to pg_restore --clean --if-exists. It creates missing tables from the backup in the target database and preserves extra tables that exist in the target database but are not contained in the backup. This option does not require an empty target database.
I.3.10.13. Determining Optimal Process and Connection Counts #
To configure optimal values, first identify these system parameters:
max_connections — maximum connections allowed by your Tantor SE database
CPU core count
Reserved connections (typically 3-10 for maintenance/admin connections)
Important Considerations:
Exceeding max_connections may cause pg_anon failures and affect other database applications
Ensure sufficient connection headroom for other services
I.3.10.13.1. Recommended Configuration #
Process Count
--processes = CPU cores
Database Connections per Process
--db-connections-per-process ≤ (max_connections - reserved_connections) / --processes
I.3.10.13.2. Example Calculation #
CPU cores: 4
max_connections: 100
reserved_connections: 5
--processes: 4
--db-connections-per-process: (100 - 5) / 4 ≈ 23.75 → 23
Verification: 4 processes × 23 connections = 92 total connections (within 100 limit)
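The same calculation as a small helper (a hypothetical function mirroring the formula above, not part of pg_anon):

```python
def connection_settings(cpu_cores: int, max_connections: int, reserved: int):
    # --processes = CPU cores;
    # --db-connections-per-process = (max_connections - reserved) / processes,
    # rounded down so the total stays within the connection limit
    processes = cpu_cores
    per_process = (max_connections - reserved) // processes
    return processes, per_process

print(connection_settings(4, 100, 5))  # (4, 23)
```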
I.3.11. SQL Functions Library #
All functions are contained in the init.sql file. After running pg_anon in init mode, they reside in the anon_funcs schema of the source database. If you want to write a new function, simply create it in the anon_funcs schema of your source database.
List of some functions available for use in dictionaries:
I.3.11.1. Functions list #
I.3.11.1.2. dnoise #
Add noise to a date or timestamp:
SELECT anon_funcs.dnoise('2020-02-02 10:10:10'::timestamp, interval '1 month');
>> 2020-03-02 10:10:10
I.3.11.1.3. digest #
Hash a string value with a specified hash function:
SELECT anon_funcs.digest('text', 'salt', 'sha256');
>> '3353e....'
I.3.11.1.4. partial #
Keep the first few characters (2nd argument) and the last few characters (4th argument) of the specified string, adding a constant (3rd argument) in between:
SELECT anon_funcs.partial('123456789', 1, '***', 3);
>> 1***789
I.3.11.1.5. partial_email #
Mask an email address:
SELECT anon_funcs.partial_email('example@gmail.com');
>> ex*****@gm*****.com
I.3.11.1.6. random_string #
Generate a random string of specified length:
SELECT anon_funcs.random_string(7);
>> H3ZVL5P
I.3.11.1.8. random_date_between #
Generate a random date and time within a specified range:
SELECT anon_funcs.random_date_between(
    '2020-02-02 10:10:10'::timestamp,
    '2022-02-05 10:10:10'::timestamp
);
>> 2021-11-08 06:47:48.057
I.3.11.1.9. random_date #
Generate a random date and time:
SELECT anon_funcs.random_date();
>> 1911-04-18 21:54:13.139
I.3.11.1.10. random_int_between #
Generate a random integer within a specified range:
SELECT anon_funcs.random_int_between(100, 200);
>> 159
I.3.11.1.11. random_bigint_between #
Generate a random bigint within a specified range:
SELECT anon_funcs.random_bigint_between(6000000000, 7000000000);
>> 6268278565
I.3.11.1.12. random_phone #
Generate a random phone number:
SELECT anon_funcs.random_phone('+7');
>> +7297479867
I.3.11.1.13. random_hash #
Generate a random hash using the specified function:
SELECT anon_funcs.random_hash('seed', 'sha512');
>> b972f895ebea9cf2f65e19abc151b8031926c4a332471dc5c40fab608950870d6dbddcd18c7e467563f9b527e63d4d13870e4961c0ff2a62f021827654ae51fd
I.3.11.1.14. random_in #
Select a random element from an array:
SELECT anon_funcs.random_in(array['a', 'b', 'c']);
>> a
I.3.11.1.15. hex_to_int #
Convert a hexadecimal value to decimal:
SELECT anon_funcs.hex_to_int('8AB');
>> 2219
I.3.11.2. pgcrypto #
In addition to the existing functions in the anon_funcs schema, functions from the pgcrypto extension can also be used.
CREATE EXTENSION IF NOT EXISTS pgcrypto;
Example of using encryption with base64 encoding to store the encrypted value in a text field:
SELECT encode((SELECT encrypt('data', 'password', 'bf')), 'base64');
>> cSMq9gb1vOw=
SELECT decrypt(
(
SELECT decode('cSMq9gb1vOw=', 'base64')
), 'password', 'bf');
>> data
I.3.11.3. How to add your own functions #
You can also add new anonymization functions by adding them to the init.sql file and then running pg_anon in init mode.
I.3.12. API #
| Integration Endpoints |
|---|
| Operations list |
| Operation details |
| Delete operation data |
| Operation logs |
I.3.12.1. Operation Endpoints #
I.3.12.1.1. Check DB connection #
POST /api/stateless/check_db_connection
Description
Checks whether pg_anon can connect to the specified database.
If the connection is successful, the endpoint returns status
code 200.
Check DB connection request body schema
| Field | Type | Required | Description |
|---|---|---|---|
| host | string | Yes | Database host. |
| port | integer | Yes | Database port. |
| db_name | string | Yes | Database name. |
| user_login | string | Yes | Database username. |
| user_password | string | Yes | Database user password. |
Example
curl -X POST http://127.0.0.1:8000/api/stateless/check_db_connection \
-H "Content-Type: application/json" \
-d '{
"host": "localhost",
"port": "5432",
"db_name": "source_db",
"user_login": "postgres",
"user_password": "postgres"
}'
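The same request can be issued from Python using only the standard library. This is a sketch; host, port, and credentials are the example values from the curl call above.

```python
import json
import urllib.request

def build_check_db_connection(api_base: str, params: dict) -> urllib.request.Request:
    # Prepares the POST request for /api/stateless/check_db_connection
    return urllib.request.Request(
        f"{api_base}/api/stateless/check_db_connection",
        data=json.dumps(params).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_check_db_connection(
    "http://127.0.0.1:8000",
    {"host": "localhost", "port": "5432", "db_name": "source_db",
     "user_login": "postgres", "user_password": "postgres"},
)
# Send with urllib.request.urlopen(req); a 200 response means the DB is reachable.
print(req.full_url)
```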
Responses
| Status Code | Description | Component |
|---|---|---|
| 200 | Database is reachable | - |
| 400 | Bad Request | ErrorResponse |
| 500 | Internal Server Error | ErrorResponse |
| 422 | Validation Error | HTTPValidationError |
I.3.12.1.2. Run create-dict (scan) operation #
POST /api/stateless/scan
Description
Runs pg_anon in create-dict (scan) mode in the background.
Operation lifecycle:
The client calls this endpoint.
The API returns one of the following status codes:
200: the scan operation has been successfully started.
400 or 422: the request is invalid; the operation is not started.
The service sends a webhook request with status in_progress to the webhook_status_url. The payload format is described in the scan webhook request schema.
The operation executes in the background.
If an error occurs during processing, the service sends a webhook request with status error.
If the operation completes successfully, the service sends a webhook request with status success.
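On the receiving side, a minimal dispatcher for these callbacks might look like this. It is a sketch only; the field names follow the scan webhook request schema, and the sample payload is hypothetical.

```python
import json

def handle_webhook(raw_body: bytes):
    # Dispatch on the webhook status: in_progress, success, or error
    payload = json.loads(raw_body)
    status = payload["status"]
    if status == "success":
        # The generated sensitive dictionary, ready for dump operations
        return payload.get("sens_dict_content")
    if status == "error":
        raise RuntimeError(payload.get("error", "unknown error"))
    return None  # in_progress: nothing to do yet

body = json.dumps({"operation_id": "op-1", "status": "in_progress"}).encode()
print(handle_webhook(body))  # None
```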
Scan request body schema
| Field | Type | Required | Description |
|---|---|---|---|
| operation_id | string | Yes | External operation ID. Accepts any string. Returned unchanged in webhook requests. |
| db_connection_params | db connection params | Yes | Source database connection credentials. |
| webhook_status_url | string | Yes | Callback URL that receives POST requests in the format described in the scan webhook request schema. |
| webhook_metadata | Any | No | Arbitrary metadata payload. Sent unchanged in webhook requests. |
| webhook_extra_headers | JSON | No | Additional HTTP headers added to webhook requests. Useful for integration, e.g., to include an Authorization header. |
| webhook_verify_ssl | boolean | No | Enables or disables SSL certificate verification for webhook requests. |
| save_dicts | boolean | No | Saves all input and output dictionaries into the runs directory. Useful for debugging or integration. Default: false. |
| type | string | No | Defines the scan mode: full or partial. Default: partial. |
| depth | integer | No | Maximum number of table rows used for partial scan. Applies only when type = partial. Default: 10000. |
| meta_dict_contents | array of dictionary content | Yes | Contents of the meta dictionary, defining rules for scanning fields. |
| sens_dict_contents | array of dictionary content | No | Contents of the sensitive dictionary. Used to improve scan performance. |
| no_sens_dict_contents | array of dictionary content | No | Contents of the non-sensitive dictionary. Used to improve scan performance. |
| need_no_sens_dict | boolean | No | If true, generates a non-sensitive dictionary and returns it in the no_sens_dict_contents field of the webhook payload. |
| proc_count | integer | No | Number of processes used for multiprocessing. Default: 4. |
| proc_conn_count | integer | No | Number of database connections allocated per process for I/O operations. Default: 4. |
Example
curl -X POST http://127.0.0.1:8000/api/stateless/scan \
-H "Content-Type: application/json" \
-d '{
"operation_id": "my-uniq-scan-id-0001",
"db_connection_params": {
"host": "localhost",
"port": "5432",
"db_name": "source_db",
"user_login": "postgres",
"user_password": "postgres"
},
"webhook_status_url": "https://my-service/pg-anon-result-processor",
"webhook_metadata": {"extra_field_1": "extra_data", "extra_field_2": {"fld1": [1,2,3], "fld2": 123}},
"webhook_extra_headers": {"Authorization": "Api-key my_super_secret_api_key", "X-my-service-extra-header": "header value"},
"webhook_verify_ssl": true,
"save_dicts": true,
"type": "partial",
"depth": 10000,
"meta_dict_contents": [{
"name": "my simple meta dict with email scan rule",
"content": "{\"data_regex\": {\"rules\": [\".*@.*\"]}, \"funcs\": {\"text\": \"md5(%s)\"}}"
}],
"sens_dict_contents": [{
"name": "sens dict for email anonymization",
"content": "{\"dictionary\": [{\"schema\": \"public\", \"table\": \"users\", \"fields\": {\"email\": \"md5(email)\"}}, {\"schema\": \"public\", \"table\": \"clients\", \"fields\": {\"email\": \"md5(email)\"}}]}"
}],
"no_sens_dict_contents": [{
"name": "non-sens dict example",
"content": "{\"no_sens_dictionary\": [{\"schema\": \"public\", \"table\": \"users\", \"fields\": [\"id\", \"created\"]}, {\"schema\": \"public\", \"table\": \"clients\", \"fields\": [\"id\", \"registered\"]}]}"
}],
"need_no_sens_dict": true,
"proc_count": 4,
"proc_conn_count": 4
}'
Scan webhook request schema
| Field | Type | Required | Description |
|---|---|---|---|
| operation_id | string | Yes | External operation ID. Copied from the operation_id of the original scan request. |
| internal_operation_id | string | No | Internal pg_anon operation ID generated automatically. Present only when status is success or error. Can be used to correlate the webhook event with operation data and logs stored in the /runs directory. Also used in the Operation endpoints. |
| status_id | integer | Yes | Numeric status code. Possible values: 2 (success), 3 (error), 4 (in_progress). |
| status | string | Yes | Human-readable status. Possible values: success, error, in_progress. |
| webhook_metadata | Any | No | Metadata payload passed "as is". Copied from the webhook_metadata field of the scan request. |
| started | string | No | Operation start timestamp in ISO 8601 format (UTC+0). Present only for statuses success or error. |
| ended | string | No | Operation end timestamp in ISO 8601 format (UTC+0). Present only for statuses success or error. |
| error | string | No | Error message. Present only when status = error. |
| error_code | string | No | Machine-readable error code for i18n mapping. Present only when status = error. |
| run_options | JSON | No | Snapshot of the operation's runtime options. Useful for analysis, debugging, and rerunning the operation. Present only when status = success or error. |
| sens_dict_content | dictionary content | No | Resulting sensitive dictionary. Returned only when the scan completes successfully. Used for dump operations or repeated scans. |
| no_sens_dict_content | dictionary content | No | Resulting non-sensitive dictionary. Present only when the scan completes successfully and the original request specified need_no_sens_dict = true. |
Responses
| Status Code | Description | Component |
|---|---|---|
| 201 | Operation successfully started | - |
| 400 | Bad Request | ErrorResponse |
| 500 | Internal Server Error | ErrorResponse |
| 422 | Validation Error | HTTPValidationError |
I.3.12.1.3. Display database fields with anonymization rules #
POST /api/stateless/view-fields
Description
Runs pg_anon in view-fields mode and returns the result in the response.
View-fields request body schema
| Field | Type | Required | Description |
|---|---|---|---|
| db_connection_params | db connection params | Yes | Source database credentials. |
| sens_dict_contents | array of dictionary content | No | Sensitive dictionary content that defines rules for sensitive fields. |
| schema_name | string | No | Filter by schema name. |
| schema_mask | string | No | Filter by schema name using a regular expression. |
| table_name | string | No | Filter by table name. |
| table_mask | string | No | Filter by table name using a regular expression. |
| view_only_sensitive_fields | boolean | No | Displays only sensitive fields (default: all fields). |
| fields_limit_count | integer | No | Maximum number of fields to include in the output (default: 5000). |
Example
curl -X POST http://127.0.0.1:8000/api/stateless/view-fields \
-H "Content-Type: application/json" \
-d '{
"db_connection_params": {
"host": "localhost",
"port": "5432",
"db_name": "source_db",
"user_login": "postgres",
"user_password": "postgres"
},
"sens_dict_contents": [{
"name": "sens dict for email anonymization",
"content": "{\"dictionary\": [{\"schema\": \"public\", \"table\": \"users\", \"fields\": {\"email\": \"md5(email)\"}}, {\"schema\": \"public\", \"table\": \"clients\", \"fields\": {\"email\": \"md5(email)\"}}]}"
}],
"schema_name": "public",
"table_mask": "^client",
"view_only_sensitive_fields": true,
"fields_limit_count": 1000
}'
Responses
| Status Code | Description | Component |
|---|---|---|
| 200 | Successful Response | ViewFieldsResponse |
| 400 | Bad Request | ErrorResponse |
| 500 | Internal Server Error | ErrorResponse |
| 422 | Validation Error | HTTPValidationError |
I.3.12.1.4. Display table with anonymization data #
POST /api/stateless/view-data
Description
Displays table data in original and anonymized variants for comparison. Runs pg_anon in view-data mode and returns the result in the response.
View-data request body schema
| Field | Type | Required | Description |
|---|---|---|---|
| db_connection_params | db connection params | Yes | Source database credentials. |
| sens_dict_contents | array of dictionary content | No | Sensitive dictionary content defining rules for sensitive fields. |
| schema_name | string | Yes | Schema name. |
| table_name | string | Yes | Table name. |
| limit | integer | No | Number of rows to display (default: 100). |
| offset | integer | No | Row offset for pagination (default: 0). |
Example
curl -X POST http://127.0.0.1:8000/api/stateless/view-data \
-H "Content-Type: application/json" \
-d '{
"db_connection_params": {
"host": "localhost",
"port": "5432",
"db_name": "source_db",
"user_login": "postgres",
"user_password": "postgres"
},
"sens_dict_contents": [{
"name": "sens dict for email anonymization",
"content": "{\"dictionary\": [{\"schema\": \"public\", \"table\": \"users\", \"fields\": {\"email\": \"md5(email)\"}}, {\"schema\": \"public\", \"table\": \"clients\", \"fields\": {\"email\": \"md5(email)\"}}]}"
}],
"schema_name": "public",
"table_name": "clients",
"limit": 10,
"offset": 20
}'
Responses
| Status Code | Description | Component |
|---|---|---|
| 200 | Successful Response | ViewDataResponse |
| 400 | Bad Request | ErrorResponse |
| 500 | Internal Server Error | ErrorResponse |
| 422 | Validation Error | HTTPValidationError |
I.3.12.1.5. Display database schemas only for preview #
POST /api/stateless/preview
Description
Returns a list of database schemas. Can be filtered by schema name substring.
Preview schemas request body schema
| Field | Type | Required | Description |
|---|---|---|---|
| db_connection_params | db connection params | Yes | Source database credentials. |
| schema_filter | string | No | Filter schemas by name substring (case-sensitive). |
Example
curl -X POST http://127.0.0.1:8000/api/stateless/preview \
-H "Content-Type: application/json" \
-d '{
"db_connection_params": {
"host": "localhost",
"port": "5432",
"db_name": "source_db",
"user_login": "postgres",
"user_password": "postgres"
},
"schema_filter": "public"
}'
Responses
| Status Code | Description | Component |
|---|---|---|
| 200 | Successful Response | PreviewSchemasResponse |
| 400 | Bad Request | ErrorResponse |
| 500 | Internal Server Error | ErrorResponse |
| 422 | Validation Error | HTTPValidationError |
I.3.12.1.6. Display database tables with fields for preview #
POST /api/stateless/preview/{schema}
Description
Returns a list of tables in the specified schema with their fields and sensitivity/exclusion status based on the provided sensitive dictionary.
Preview schema tables request body schema
| Field | Type | Required | Description |
|---|---|---|---|
| db_connection_params | db connection params | Yes | Source database credentials. |
| sens_dict_contents | array of dictionary content | Yes | Sensitive dictionary content that defines rules for sensitive fields. |
| limit | integer | No | Maximum number of tables to return (default: 20). |
| offset | integer | No | Number of tables to skip for pagination (default: 0). |
| table_filter | string | No | Filter tables by name substring (case-sensitive). |
| view_only_sensitive_tables | boolean | No | If true, returns only tables that match the sensitive dictionary (default: false). |
Path parameters
| Field | Type | Required | Description |
|---|---|---|---|
| schema | string | Yes | Schema name |
Example
curl -X POST http://127.0.0.1:8000/api/stateless/preview/public \
-H "Content-Type: application/json" \
-d '{
"db_connection_params": {
"host": "localhost",
"port": "5432",
"db_name": "source_db",
"user_login": "postgres",
"user_password": "postgres"
},
"sens_dict_contents": [{
"name": "sens dict for email anonymization",
"content": "{\"dictionary\": [{\"schema\": \"public\", \"table\": \"users\", \"fields\": {\"email\": \"md5(email)\"}}]}"
}],
"limit": 20,
"offset": 0,
"table_filter": "user",
"view_only_sensitive_tables": false
}'
Responses
| Status Code | Description | Component |
|---|---|---|
| 200 | Successful Response | PreviewSchemaTablesResponse |
| 400 | Bad Request | ErrorResponse |
| 500 | Internal Server Error | ErrorResponse |
| 422 | Validation Error | HTTPValidationError |
I.3.12.1.7. Run dump operation #
POST /api/stateless/dump
Description
Runs pg_anon in dump mode in the background.
Operation lifecycle:
The client calls this endpoint.
The API returns one of the following status codes:
200: the dump operation has been successfully started.
400 or 422: the request is invalid; the operation is not started.
The service sends a webhook request with status in_progress to the webhook_status_url. The payload format is described in the dump webhook request schema.
The operation executes in the background.
If an error occurs during processing, the service sends a webhook request with status error.
If the operation completes successfully, the service sends a webhook request with status success.
Dump request body schema
| Field | Type | Required | Description |
|---|---|---|---|
| operation_id | string | Yes | External operation ID. Accepts any string. Returned unchanged in webhook requests. |
| db_connection_params | db connection params | Yes | Source database connection credentials. |
| webhook_status_url | string | Yes | Callback URL that receives POST requests in the format described in the dump webhook request schema. |
| webhook_metadata | Any | No | Arbitrary metadata payload. Sent unchanged in webhook requests. |
| webhook_extra_headers | JSON | No | Additional HTTP headers added to webhook requests. Useful for integration, e.g., to include an Authorization header. |
| webhook_verify_ssl | boolean | No | Enables or disables SSL certificate verification for webhook requests. |
| save_dicts | boolean | No | Saves all input and output dictionaries into the runs directory. Useful for debugging or integration. Default: false. |
| type | string | No | Defines the dump type. Options: dump, sync-struct-dump, sync-data-dump. Default: dump. |
| sens_dict_contents | array of dictionary content | Yes | Contents of the sensitive dictionary, defining rules for data anonymization during the dump. |
| partial_tables_dict_contents | array of dictionary content | No | Contents of the tables dictionary specifying tables to include in a partial dump. |
| partial_tables_exclude_dict_contents | array of dictionary content | No | Contents of the tables dictionary specifying tables to exclude from a partial dump. |
| output_path | string | No | Path where the dump will be created under /path_to_pg_anon/output. For example, "my_dump" will be located at /path_to_pg_anon/output/my_dump. |
| pg_dump_path | string | No | Path to the pg_dump Postgres tool. Default: /usr/bin/pg_dump. |
| pg_dump_options | string | No | Additional options passed directly to the pg_dump utility. Example: "--no-comments --encoding=LATIN1". |
| ignore_privileges | boolean | No | Ignores privileges from the source database. |
| proc_count | integer | No | Number of processes used for multiprocessing. Default: 4. |
| proc_conn_count | integer | No | Number of database connections allocated per process for I/O operations. Default: 4. |
Example
curl -X POST http://127.0.0.1:8000/api/stateless/dump \
-H "Content-Type: application/json" \
-d '{
"operation_id": "my-uniq-scan-id-0001",
"db_connection_params": {
"host": "localhost",
"port": "5432",
"db_name": "source_db",
"user_login": "postgres",
"user_password": "postgres"
},
"webhook_status_url": "https://my-service/pg-anon-result-processor",
"webhook_metadata": {"extra_field_1": "extra_data", "extra_field_2": {"fld1": [1,2,3], "fld2": 123}},
"webhook_extra_headers": {"Authorization": "Api-key my_super_secret_api_key", "X-my-service-extra-header": "header value"},
"webhook_verify_ssl": true,
"save_dicts": true,
"type": "dump",
"sens_dict_contents": [{
"name": "sens dict for email anonymization",
"content": "{\"dictionary\": [{\"schema\": \"public\", \"table\": \"users\", \"fields\": {\"email\": \"md5(email)\"}}, {\"schema\": \"public\", \"table\": \"clients\", \"fields\": {\"email\": \"md5(email)\"}}]}"
}],
"partial_tables_dict_contents": [{
"name": "tables dict with tables to include",
"content": "{\"tables\": [{\"schema\": \"public\", \"table\": \"users\"}, {\"schema\": \"public\", \"table\": \"clients\"}]}"
}],
"partial_tables_exclude_dict_contents": [{
"name": "tables dict with tables to exclude",
"content": "{\"tables\": [{\"schema\": \"public\", \"table\": \"clients\"}]}"
}],
"output_path": "my_dump",
"pg_dump_path": "/usr/lib/postgresql/17/bin/pg_dump",
"ignore_privileges": false,
"proc_count": 4,
"proc_conn_count": 4
}'
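The `content` field of each dictionary entry is itself a JSON document serialized into a string, which makes the escaping in the raw curl example easy to get wrong by hand. A minimal sketch (the helper name `make_dict_content` is ours, not part of the API) that builds the string with `json.dumps` instead of manual escaping:

```python
import json

def make_dict_content(name, dictionary_entries):
    """Build a DictionaryContent object for the request body.

    The inner dictionary is serialized with json.dumps, so the
    double-escaping seen in the raw curl example is produced for us.
    """
    return {
        "name": name,
        "content": json.dumps({"dictionary": dictionary_entries}),
    }

sens_dict = make_dict_content(
    "sens dict for email anonymization",
    [
        {"schema": "public", "table": "users", "fields": {"email": "md5(email)"}},
        {"schema": "public", "table": "clients", "fields": {"email": "md5(email)"}},
    ],
)

# The serialized content round-trips back to the original structure.
inner = json.loads(sens_dict["content"])
print(inner["dictionary"][0]["table"])  # → users
```

The resulting object can be placed directly into `sens_dict_contents` before the body is serialized as a whole.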
Dump webhook request schema
| Field | Type | Required | Description |
|---|---|---|---|
| operation_id | string | Yes |
External operation ID. Copied from the
operation_id of the original dump
request.
|
| internal_operation_id | string | No |
Internal pg_anon operation ID generated
automatically. Present only when
status is
success or
error. Can be used to correlate
the webhook event with operation data and logs
stored in the /runs directory.
Also used in the Operation
endpoints.
|
| status_id | integer | Yes |
Numeric status code. Possible values:
2 — success, 3
— error, 4 — in_progress.
|
| status | string | Yes |
Human-readable status. Possible values:
success,
error,
in_progress.
|
| webhook_metadata | Any | No |
Metadata payload passed “as is”. Copied from the
webhook_metadata field of the
dump request.
|
| started | string | No |
Operation start timestamp in ISO8601 format (UTC+0).
Present only for statuses success
or error.
|
| ended | string | No |
Operation end timestamp in ISO8601 format (UTC+0).
Present only for statuses success
or error.
|
| error | string | No |
Error message. Present only when
status = error.
|
| error_code | string | No | Machine-readable error code for i18n mapping. Present only when status = error. |
| run_options | JSON | No |
Snapshot of the operation’s runtime options. Useful
for analysis, debugging, and rerunning the
operation. Present only when
status = success or
error.
|
| size | integer | No | Size of the dump in bytes. |
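A webhook receiver only needs to branch on the `status` field documented above. A minimal dispatch sketch, assuming the payload shape from the table (the handler name and returned messages are illustrative, not part of pg_anon):

```python
def handle_dump_webhook(payload: dict) -> str:
    """Dispatch a dump webhook payload by status.

    Per the schema above: status_id 2 = success, 3 = error,
    4 = in_progress; size/error/run_options appear only on
    final events.
    """
    status = payload["status"]
    op_id = payload["operation_id"]
    if status == "in_progress":
        return f"dump {op_id} started"
    if status == "success":
        # Final event: internal_operation_id, run_options and size are present.
        return f"dump {op_id} finished, size={payload.get('size')}"
    if status == "error":
        return f"dump {op_id} failed: {payload.get('error')}"
    raise ValueError(f"unexpected status: {status}")
```

In practice such a handler would sit behind the endpoint you pass as `webhook_status_url`.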
Responses
| Status Code | Description | Component |
|---|---|---|
201
| Operation successfully started | - |
400
| Bad Request | ErrorResponse |
500
| Internal Server Error | ErrorResponse |
422
| Validation Error | HTTPValidationError |
I.3.12.1.8. Run restore operation #
POST /api/stateless/restore
Description
Runs pg_anon in restore mode in the background.
Operation lifecycle:
The client calls this endpoint.
The API returns one of the following status codes:
201 — the restore operation has been successfully started.
400 or 422 — the request is invalid; the operation is not started.
The service sends a webhook request with status in_progress to the webhook_status_url. The payload format is described in the restore webhook request schema.
The operation executes in the background.
If an error occurs during processing, the service sends a webhook request with status error.
If the operation completes successfully, the service sends a webhook request with status success.
Restore request body schema
| Field | Type | Required | Description |
|---|---|---|---|
| operation_id | string | Yes | External operation ID. Accepts any string. Returned unchanged in webhook requests. |
| db_connection_params | db connection params | Yes | Target database connection credentials. |
| webhook_status_url | string | Yes | Callback URL that receives POST requests in the format described in the restore webhook request schema. |
| webhook_metadata | Any | No | Arbitrary metadata payload. Sent unchanged in webhook requests. |
| webhook_extra_headers | JSON | No |
Additional HTTP headers added to webhook requests.
Useful for integration, e.g., to include an
Authorization header.
|
| webhook_verify_ssl | boolean | No | Enables or disables SSL certificate verification for webhook requests. |
| save_dicts | boolean | No |
Saves all input and output dictionaries into the
runs directory. Useful for
debugging or integration. Default:
false.
|
| type | string | No |
Defines the restore type. Options:
restore,
sync-struct-restore,
sync-data-restore.
Default: restore.
|
| input_path | string | Yes |
Path to the dump to restore, relative to
/path_to_pg_anon/output. Example:
"my_dump" will restore
from
/path_to_pg_anon/output/my_dump.
|
| partial_tables_dict_contents | array of dictionary content | No | Contents of the tables dictionary specifying tables to include in a partial restore. |
| partial_tables_exclude_dict_contents | array of dictionary content | No | Contents of the tables dictionary specifying tables to exclude from a partial restore. |
| pg_restore_path | string | No |
Path to the pg_restore Postgres
tool. Default:
/usr/bin/pg_restore.
|
| pg_restore_options | string | No | Additional options passed directly to the pg_restore utility. Example: "--no-comments --no-table-access-method". |
| drop_custom_check_constr | boolean | No |
Drops all CHECK constraints that contain
user-defined procedures to avoid performance
degradation during data loading. Default:
false.
|
| clean_db | boolean | No |
Cleans existing database objects before restoring.
Mutually exclusive with drop_db.
|
| drop_db | boolean | No |
Drops the target database before restoring. Mutually
exclusive with clean_db.
|
| ignore_privileges | boolean | No | Ignore privileges from the source database. |
| proc_count | integer | No |
Number of processes used for multiprocessing.
Default: 4.
|
| proc_conn_count | integer | No |
Number of database connections allocated per process
for I/O operations. Default: 4.
|
Example
curl -X POST http://127.0.0.1:8000/api/stateless/restore \
-H "Content-Type: application/json" \
-d '{
"operation_id": "my-uniq-restore-id-0001",
"db_connection_params": {
"host": "localhost",
"port": "5432",
"db_name": "source_db",
"user_login": "postgres",
"user_password": "postgres"
},
"webhook_status_url": "https://my-service/pg-anon-result-processor",
"webhook_metadata": {"extra_field_1": "extra_data", "extra_field_2": {"fld1": [1,2,3], "fld2": 123}},
"webhook_extra_headers": {"Authorization": "Api-key my_super_secret_api_key", "X-my-service-extra-header": "header value"},
"webhook_verify_ssl": true,
"save_dicts": true,
"type": "restore",
"input_path": "my_dump",
"partial_tables_dict_contents": [{
"name": "tables dict for partial restore",
"content": "{\"dictionary\": [{\"schema\": \"public\", \"table\": \"users\"}, {\"schema\": \"public\", \"table\": \"clients\"}]}"
}],
"partial_tables_exclude_dict_contents": [{
"name": "tables dict for excluded tables",
"content": "{\"dictionary\": [{\"schema\": \"public\", \"table\": \"clients\"}]}"
}],
"pg_restore_path": "/usr/lib/postgresql/17/bin/pg_restore",
"drop_custom_check_constr": false,
"clean_db": false,
"drop_db": false,
"ignore_privileges": false,
"proc_count": 4,
"proc_conn_count": 4
}'
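Since `clean_db` and `drop_db` are mutually exclusive and several fields are required, it can be worth validating the body client-side before sending it, rather than waiting for a 400/422. A sketch of such a pre-flight check (the function name is ours; the rules mirror the request body schema above):

```python
def validate_restore_body(body: dict) -> None:
    """Reject restore request bodies the API would refuse.

    Mirrors the documented constraints: required fields,
    allowed restore types, and the clean_db/drop_db exclusion.
    """
    if body.get("clean_db") and body.get("drop_db"):
        raise ValueError("clean_db and drop_db are mutually exclusive")
    restore_types = {"restore", "sync-struct-restore", "sync-data-restore"}
    if body.get("type", "restore") not in restore_types:
        raise ValueError(f"type must be one of {sorted(restore_types)}")
    required = ("operation_id", "db_connection_params",
                "webhook_status_url", "input_path")
    for field in required:
        if field not in body:
            raise ValueError(f"missing required field: {field}")
```

This catches the most common request mistakes locally; the server remains the authority on full validation.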
Restore webhook request schema
| Field | Type | Required | Description |
|---|---|---|---|
| operation_id | string | Yes |
External operation ID. Copied from the
operation_id of the original restore
request.
|
| internal_operation_id | string | No |
Internal pg_anon operation ID generated
automatically. Present only when
status is
success or
error. Can be used to correlate
the webhook event with operation data and logs
stored in the /runs directory.
Also used in the Operation
endpoints.
|
| status_id | integer | Yes |
Numeric status code. Possible values:
2 — success, 3
— error, 4 — in_progress.
|
| status | string | Yes |
Human-readable status. Possible values:
success,
error,
in_progress.
|
| webhook_metadata | Any | No |
Metadata payload passed “as is”. Copied from the
webhook_metadata field of the
restore request.
|
| started | string | No |
Operation start timestamp in ISO8601 format (UTC+0).
Present only for statuses success
or error.
|
| ended | string | No |
Operation end timestamp in ISO8601 format (UTC+0).
Present only for statuses success
or error.
|
| error | string | No |
Error message. Present only when
status = error.
|
| error_code | string | No | Machine-readable error code for i18n mapping. Present only when status = error. |
| run_options | JSON | No |
Snapshot of the operation’s runtime options. Useful
for analysis, debugging, and rerunning the
operation. Present only when
status = success or
error.
|
Responses
| Status Code | Description | Component |
|---|---|---|
201
| Operation successfully started | - |
400
| Bad Request | ErrorResponse |
500
| Internal Server Error | ErrorResponse |
422
| Validation Error | HTTPValidationError |
I.3.12.2. Integration Endpoints #
I.3.12.2.1. Operations list #
GET /operation
Description
Returns a list of background operation directories (scan,
dump, restore). Only operations executed with
save_dicts enabled are included. Useful
for integration purposes.
Operations list request params
| Field | Type | Required | Description |
|---|---|---|---|
| date_before | date | No | Filter: only operations before this date. Date format: ISO 8601. |
| date_after | date | No | Filter: only operations after this date. Date format: ISO 8601. |
Example
curl -X GET "http://127.0.0.1:8000/operation?date_after=2025-01-01&date_before=2025-12-31"
Responses
| Status Code | Description | Component |
|---|---|---|
200
| List of operation paths | array of strings |
422
| Validation Error | HTTPValidationError |
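Because the query string contains `&`, the URL must be quoted when used in a shell (as in the curl example above). Building it programmatically with `urllib.parse` sidesteps quoting entirely; a sketch (function name and default base URL are illustrative):

```python
from urllib.parse import urlencode

def operations_list_url(base="http://127.0.0.1:8000",
                        date_after=None, date_before=None):
    """Build the GET /operation URL with optional ISO 8601 date filters."""
    params = {}
    if date_after:
        params["date_after"] = date_after
    if date_before:
        params["date_before"] = date_before
    query = f"?{urlencode(params)}" if params else ""
    return f"{base}/operation{query}"

print(operations_list_url(date_after="2025-01-01", date_before="2025-12-31"))
# → http://127.0.0.1:8000/operation?date_after=2025-01-01&date_before=2025-12-31
```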
I.3.12.2.2. Operation details #
GET /operation/{internal_operation_id}
Description
Returns detailed information about a background operation
(scan, dump, restore). Only operations executed with
save_dicts enabled are included. Useful for
integration purposes.
Operation details request params
| Field | Type | Required | Description |
|---|---|---|---|
| internal_operation_id | string | Yes | Internal pg_anon operation ID. |
Example
curl -X GET http://127.0.0.1:8000/operation/c6c98133-856f-46b3-ba9e-3a0092b8d9aa
Responses
| Status Code | Description | Component |
|---|---|---|
200
| Operation details | OperationDataResponse |
404
| Operation directory not found | ErrorResponse |
422
| Validation Error | HTTPValidationError |
I.3.12.2.3. Delete operation data #
DELETE /operation/{internal_operation_id}
Description
Deletes the operation data directory in /runs.
Also removes the dump directory from the output
path if the operation type is dump.
Example
curl -X DELETE http://127.0.0.1:8000/operation/c6c98133-856f-46b3-ba9e-3a0092b8d9aa
Delete operation data request params
| Field | Type | Required | Description |
|---|---|---|---|
| internal_operation_id | string | Yes | Internal pg_anon operation ID. |
Responses
| Status Code | Description | Component |
|---|---|---|
204
| Operation data successfully deleted | - |
400
| Bad Request | ErrorResponse |
422
| Validation Error | HTTPValidationError |
500
| Internal Server Error | ErrorResponse |
I.3.12.2.4. Operation logs #
GET /operation/{internal_operation_id}/logs
Description
Returns log output for a background operation (scan, dump,
restore). Only operations executed with
save_dicts enabled are included. Useful
for integration purposes.
Operation logs request params
| Name | Type | Required | Description |
|---|---|---|---|
| internal_operation_id | string | Yes | Internal pg_anon operation ID. |
| tail_lines | integer | No |
Number of log lines to read from the end of the
file. Default: 1000
|
Example
curl -X GET http://127.0.0.1:8000/operation/c6c98133-856f-46b3-ba9e-3a0092b8d9aa/logs
Responses
| Status Code | Description | Component |
|---|---|---|
200
| Successful Response | array of strings |
422
| Validation Error | HTTPValidationError |
I.3.13. General schemas #
I.3.13.1. DbConnectionParams #
| Field | Type | Required | Description |
|---|---|---|---|
| host | string | Yes | Database host. |
| port | integer | Yes | Database port. |
| db_name | string | Yes | Database name. |
| user_login | string | Yes | Database user. |
| user_password | string | Yes | Database user password. |
I.3.13.2. DictionaryContent #
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Dictionary name. Can be used, for example, as the dictionary filename. |
| content | string | Yes | Dictionary content used during operation processing. |
| additional_info | Any | No | Extra data for integration purposes. Sent in webhook requests “as is”. |
I.3.13.3. DictionaryMetadata #
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Dictionary name. Can be used, for example, as the dictionary filename. |
| additional_info | Any | No | Extra data for integration purposes. Sent in webhook requests “as is”. |
I.3.13.4. OperationDataResponse #
| Field | Type | Required | Description |
|---|---|---|---|
| run_status | RunStatus | Yes | Operation status. |
| run_options | JSON | Yes | Snapshot of the operation’s runtime options. Useful for analysis, debugging, and rerunning the operation. |
| dictionaries | DictionariesData | Yes | Input and output dictionary contents, grouped by type. |
| extra_data | JSON | No | For dump operations, contains dump size info; empty otherwise. |
I.3.13.5. RunStatus #
| Field | Type | Required | Description |
|---|---|---|---|
| status_id | integer | Yes |
Numeric status code. Possible values:
2 — success, 3 —
error.
|
| status | string | Yes |
Human-readable status. Possible values:
success, error.
|
| started | string | No | Operation start timestamp in ISO8601 format (UTC+0). |
| ended | string | No | Operation end timestamp in ISO8601 format (UTC+0). |
I.3.13.6. DictionariesData #
| Field | Type | Required | Description |
|---|---|---|---|
| meta_dict_files | array of dictionary content | No | Contents of the input meta dictionary. |
| prepared_sens_dict_files | array of dictionary content | No | Contents of the input sensitive dictionary. |
| prepared_no_sens_dict_files | array of dictionary content | No | Contents of the input non-sensitive dictionary. |
| partial_tables_dict_files | array of dictionary content | No | Contents of the input tables dictionary (tables to include). |
| partial_tables_exclude_dict_files | array of dictionary content | No | Contents of the input tables dictionary (tables to exclude). |
| output_sens_dict_file | dictionary content | No | Contents of the output sensitive dictionary. |
| output_no_sens_dict_file | dictionary content | No | Contents of the output non-sensitive dictionary. |
I.3.13.7. ViewDataResponse #
| Field | Type | Required | Description |
|---|---|---|---|
| status_id | integer | Yes |
Integer code of operation status. Can be:
2 — success, 3 —
error
|
| status | string | Yes |
Human readable operation status. Can be:
success, error
|
| content | ViewDataContent | No | Operation result data |
I.3.13.8. ViewDataContent #
| Field | Type | Required | Description |
|---|---|---|---|
| schema_name | string | Yes | Schema name |
| table_name | string | Yes | Table name |
| field_names | array | Yes | Table field names. Needed for rendering the table header. |
| total_rows_count | integer | Yes | Total row count in the table. Useful for pagination. |
| rows_before | array | Yes | Source row data “as is”, without anonymization. |
| rows_after | array | Yes | Anonymized rows, showing how anonymization will be applied to the source data. |
I.3.13.9. ViewFieldsResponse #
| Field | Type | Required | Description |
|---|---|---|---|
| status_id | integer | Yes |
Integer code of operation status. Can be:
2 — success, 3 —
error
|
| status | string | Yes |
Human readable operation status. Can be:
success, error
|
| content | ViewFieldsContent | No | Operation result data |
I.3.13.10. ViewFieldsContent #
| Field | Type | Required | Description |
|---|---|---|---|
| schema_name | string | Yes | Schema name |
| table_name | string | Yes | Table name |
| field_name | string | Yes | Field name |
| type | string | Yes | Field data type |
| dict_data | DictionaryMetadata | No | Matched dictionary metadata containing anonymization rule |
| rule | string | No | Matched anonymization rule if the field is sensitive. |
I.3.13.11. PreviewSchemasResponse #
| Field | Type | Required | Description |
|---|---|---|---|
| status_id | integer | Yes | Integer code of operation status. Can be: 2 - success, 3 - error |
| status | string | Yes | Human readable operation status. Can be: success, error |
| content | array of string | No | List of schema names |
I.3.13.12. PreviewSchemaTablesResponse #
| Field | Type | Required | Description |
|---|---|---|---|
| status_id | integer | Yes | Integer code of operation status. Can be: 2 - success, 3 - error |
| status | string | Yes | Human readable operation status. Can be: success, error |
| content | array of PreviewTableContent | No | List of tables with sensitivity info |
I.3.13.13. PreviewTableContent #
| Field | Type | Required | Description |
|---|---|---|---|
| table_name | string | Yes | Table name |
| is_sensitive | boolean | Yes | Whether the table matches the sensitive dictionary |
| is_excluded | boolean | Yes | Whether the table is excluded by dictionary_exclude |
| fields | array of PreviewFieldContent | No | List of fields with their types and sensitivity info |
I.3.13.14. PreviewFieldContent #
| Field | Type | Required | Description |
|---|---|---|---|
| field_name | string | Yes | Field name |
| type | string | Yes | Field data type |
| is_sensitive | boolean | Yes | Whether the field matches an anonymization rule |
| rule | string | No | Matched anonymization rule if the field is sensitive. |
I.3.13.15. ErrorResponse #
| Field | Type | Required | Description |
|---|---|---|---|
| error_code | string | Yes | Machine-readable error code for i18n mapping. |
| message | string | Yes | Human-readable error message. |
I.3.13.16. HTTPValidationError #
| Field | Type | Required | Description |
|---|---|---|---|
| detail | array | Yes | Error details. |
I.3.13.17. ValidationError #
| Field | Type | Required | Description |
|---|---|---|---|
| loc | array | Yes | Wrong parameter. |
| msg | string | Yes | Error message. |
| type | string | Yes | Error type. |