H.2. data_generator#

H.2.1. Overview #

data_generator is a Python script that generates fake data.

usage: populate.py [-h] --table {address,city,country,first_name,last_name,company,email,iban,lorem_ipsum,postcode}
                   [{address,city,country,first_name,last_name,company,email,iban,lorem_ipsum,postcode} ...] [--locales LOCALES] --output_dir OUTPUT_DIR
                   [--lines LINES] [--seed SEED]

Internally, it uses the Faker library.

For example, to generate 5000 lines of fake country and email data for the Russian and English locales, call the script like this:

populate.py --table country email --locales ru_RU,en --lines 5000 --output_dir out

This will output the fake data in CSV format.

Use populate.py --help for more details about the script parameters.
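
The usage text has the shape produced by Python's argparse module, so the way the arguments are parsed can be illustrated with a small stand-in parser. The sketch below is a reconstruction from the usage text only, not the actual source of populate.py: the table names are copied from the usage text, while the integer types for --lines and --seed and the comma-splitting of --locales are assumptions made for illustration.

```python
import argparse

# Table names copied from the usage text; everything else is an assumption.
TABLES = [
    "address", "city", "country", "first_name", "last_name",
    "company", "email", "iban", "lorem_ipsum", "postcode",
]


def build_parser() -> argparse.ArgumentParser:
    """Build a stand-in parser that mirrors the documented usage text."""
    parser = argparse.ArgumentParser(prog="populate.py")
    # One or more table names, each restricted to the listed choices.
    parser.add_argument("--table", nargs="+", choices=TABLES, required=True)
    # Assumption: locales arrive as a single comma-separated string.
    parser.add_argument("--locales")
    parser.add_argument("--output_dir", required=True)
    # Assumption: --lines and --seed are integers.
    parser.add_argument("--lines", type=int)
    parser.add_argument("--seed", type=int)
    return parser


# Parse the example command line shown earlier in this section.
args = build_parser().parse_args(
    "--table country email --locales ru_RU,en --lines 5000 --output_dir out".split()
)
# Split the comma-separated locales into a list (hypothetical post-processing).
locales = args.locales.split(",")
print(args.table, locales, args.lines, args.output_dir)
```

In this sketch, --table collects every name that follows it into a list, which is why country and email can be passed together, and --seed stays unset unless it is given explicitly.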

You can load the fake data directly into the extension like this:

TRUNCATE transp_anon.email;

COPY transp_anon.email
FROM
PROGRAM 'populate.py --table country email --locales ru_RU,en --lines 5000 --output_dir out';

SELECT setval('transp_anon.email_oid_seq', max(oid))
FROM transp_anon.email;

CLUSTER transp_anon.email;

H.2.2. Faker #

Faker is a Python package that generates fake data for you. Whether you need to bootstrap your database, create good-looking XML documents, fill in your persistence to stress test it, or anonymize data taken from a production service, Faker is for you. For more information, see the Faker documentation.