pg_integrity_verifier

pg_integrity_verifier — check integrity of Tantor's databases

pg_integrity_verifier is a specialized tool for monitoring the integrity of PostgreSQL databases. It is intended to help database administrators and security specialists monitor the integrity of databases. The application saves database state records in the specified directory and, when it detects changes in a monitored database, creates a snapshot in the sn_{database}_{datetime}.diff format. pg_integrity_verifier then executes the command specified in run_command in the main configuration file. run_command names a command, script, or other application intended to analyze the changes and decide how to react. For example, if a change in the structure of a table is detected, a notification can be sent to the specialists in charge; how and where the notification is sent is determined solely by the external application. In addition to notifications, the external application can perform actions that modify the structure of the DBMS or its configuration.

pg_integrity_verifier can be run by any Linux OS user with permission to execute the file. Note that the only permissible connection point for integrity control is the current host: pg_integrity_verifier must run on the same host as the monitored database, because the application needs access to the local PostgreSQL configuration files postgresql.conf and pg_hba.conf to control their integrity. Integrity verification can be applied to different database entities: users, configurations, tables, functions, triggers, and so on.

Startup parameters

List of startup parameters for pg_integrity_verifier:

Parameter       Description
-c, --config    Specifies the main configuration file.
-l, --log       Specifies the configuration file for logging.
-v, --version   Displays the application version.
-h, --help      Displays a list of available options and usage information.

Launch example:

./pg_integrity_verifier -l log4rs.yaml -c pg_integrity_verifier.yaml

Main configuration file

The configuration of pg_integrity_verifier is set via the -c parameter, which points to the main configuration file in YAML format. This file sets the parameters that define how the application interacts with the observed databases.

File structure:

mode: "<operation mode: one_time or daemon>"
daemon_period: "<interval in seconds for daemon mode>"
databases:
  - db_url: "postgres://<username>:<password>@<DB address>:<port>/<DB name>"
    snapshot_directory: "<directory for DB snapshots>"
    observed_objs: [<list of DB objects to monitor>]
    run_command: "<command or script to execute upon detecting a change> -f DIFF_FILE"
  - ...

Where:

  • mode — application operating mode:

    • one_time — one-time launch. To repeat checks in this mode, schedule the run externally (see the cron sketch after the example of the main configuration file below).

    • daemon — periodic launch at specified time intervals.

  • daemon_period — the time interval in seconds between periodic launches.

  • databases — a list of monitored databases and their parameters:

    • db_url — database connection string in the format postgres://<username>:<password>@<database address>:<port>/<database name>.

      If the password contains special characters (@, %, # and so on), they must be percent-encoded. For example, the password pass#word must be specified as pass%23word (see the encoding sketch after this list).

      Passwords can be excluded from connection strings and passed instead in environment variables of the form DB<database ordinal number>_PASSWORD=<password>. For example:

      export DB1_PASSWORD=12345
      export DB2_PASSWORD=54321
      

      Or when launching:

      DB1_PASSWORD=12345 DB2_PASSWORD=54321 ./pg_integrity_verifier \
          -l log4rs.yaml -c pg_integrity_verifier.yaml
      
    • snapshot_directory — a directory for saving snapshots of the current state of the database. The directory must be specified and must differ from the directories used for the other observed databases.

    • observed_objs — a list of database objects to monitor, in single quotes and separated by commas: 'config_file', 'pg_hba', 'pg_settings', 'pg_user', 'pg_proc', 'pg_trigger', 'pg_roles'. If the connected user does not have the required privileges, the application will not check the postgresql.conf, pg_hba.conf, and pg_settings objects. If observed_objs is not specified, all objects are checked; if observed_objs: [] is specified, no database objects are checked.

    • run_command — a command or script that the application executes when a database integrity violation is detected. The called command or script must support the -f DIFF_FILE argument; the DIFF_FILE placeholder is replaced with the path to the snapshot file containing the detected differences.
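
If a password must be percent-encoded, as noted for db_url above, the encoding can be produced from the shell. A minimal sketch that calls python3's standard library (the password value is purely illustrative):

python3 -c 'import urllib.parse, sys; print(urllib.parse.quote(sys.argv[1], safe=""))' 'pass#word'
# prints: pass%23word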

Example of the main configuration file:

mode: "daemon"
daemon_period: "5"
databases:
  - db_url: "postgres://postgres:12345@localhost:5432/db1"
    snapshot_directory: "json1"
    observed_objs: ['config_file', 'pg_hba', 'pg_settings', 'pg_user', 'pg_proc', 'pg_trigger', 'pg_roles']
    run_command: "tests/run_command.sh -f DIFF_FILE"
  - db_url: "postgres://postgres:54321@localhost:5432/db2"
    snapshot_directory: "json2"
    observed_objs: ['config_file', 'pg_hba', 'pg_settings', 'pg_user', 'pg_proc', 'pg_trigger', 'pg_roles']
    run_command: "tests/run_command.sh -f DIFF_FILE"

Configuration file for logging

Event logging and application messages are configured via the -l parameter, which also points to a YAML file.

File structure:

refresh_rate: "<configuration update interval in seconds> seconds"
appenders:
  <component name>:
    kind: <component type: console, file, rolling_file, udp, or tcp>
    encoder:
      pattern: "<output or log entry pattern>"
    path: "<log file path>" # Only for file and rolling_file component types
    policy: # Only for rolling_file component type
      trigger:
        kind: <condition type: size, time, or compound>
        limit: <file size> # Only for size condition type
        schedule: <cron format schedule with seconds extension> # Only for time condition type
      roller:
        kind: <rotation mechanism type: fixed_window, delete, or compound>
        base: <starting index> # Only for fixed_window rotation mechanism type
        count: <number of files> # Only for fixed_window rotation mechanism type
        pattern: "<rotation name pattern>"
    endpoint: "<address>:<port>" # Only for udp and tcp component types
    reconnect: # Only for tcp component type
      strategy: <reconnection strategy: immediate or fixed>
      delay_ms: <delay before reconnection attempt in milliseconds> # Only for fixed strategy
root:
  level: <minimum logging level: trace, debug, info, warn, error>
  appenders:
    - <component name 1>
    - <component name 2>
    ...

Where:

  • refresh_rate — the time interval in seconds at which this logging configuration file is re-checked for changes.

  • appenders — components that determine where logs will be output, recorded, or sent:

    • <component name> — arbitrary component name. You can list several components with their parameters:

      • kind — type of component:

        • console — output to console.

        • file — writing to a file without rotation.

        • rolling_file — writing to a file with rotation.

        • udp — sending via UDP.

        • tcp — sending via TCP.

      • encoder — logging format:

        • pattern — log formatting pattern. Supported variables:

          • {m} — log message.

          • {l} — logging level.

          • {M} — the name of the module.

          • {T} — the name of the current thread.

          • {f} — the name of the file in which the log was called.

          • {L} — the line number in the source code where the log was called.

          • {t} — the target of the log message (by default, the module path).

          • {I} — the ID of the current thread.

          • {d(...)} — the date and time in the format specified in parentheses using strftime-style specifiers. For example, {d(%Y-%m-%d %H:%M:%S)} sets the date and time in the format YYYY-MM-DD HH:MM:SS.

          • {h(...)} — formats the content in parentheses with ANSI color codes and text properties specified using the following parameters:

            • fg=<color> — text color.

            • bg=<color> — background color.

            • bold=true — bold font.

            • dim=true — dim font.

            • italic=true — italic.

            Color parameters can accept the following values: black, red, green, yellow, blue, magenta, cyan, white, and default. For example, {h({d(%Y-%m-%d %H:%M:%S)} - {l} - {m}{n}, fg=red, bg=black, bold=true)} will output the log specified in the template in red bold font on a black background.

          • {n} — line break.

      • path — path to the log file. Only for file and rolling_file component types. With the rolling_file component type, logs are written to this file before the rotation trigger is activated.

      • policy — a set of rules for log file rotation for the rolling_file component type:

        • trigger — the condition under which rotation begins:

          • kind — type of condition:

            • size — by file size.

            • time — by time.

            • compound — a combination of several types of conditions. For example, a compound condition combining a 100 MB size limit with a daily midnight schedule looks as follows:

              ...
                      kind: compound
                      triggers:
                        - kind: size
                          limit: 104857600 # 100 MB
                        - kind: time
                          schedule: "0 0 0 * * *" # Every day at midnight
              ...
              

              The file will be rotated when at least one of the specified conditions is met.

          • limit — size for the size condition type. The value is specified in bytes, but KB, MB, and GB may also be used by writing the unit explicitly after the value, for example: 1048576kb, 1024mb, or 1gb, respectively. When the file exceeds the specified size, it will be rotated.

          • schedule — schedule in cron format, with an extension for seconds, for the time condition type. For example, the expression */20 * * * * * triggers rotation every 20 seconds.

        • roller — rotation mechanism. It determines how files will be processed after rotation.

          • kind — type of rotation mechanism:

            • fixed_window — a fixed number of rotation files. Old files will be renamed when creating a new file, and if the specified number is exceeded, they will be deleted.

            • delete — deletion of old files after rotation. This type does not provide for the retention of any number of old files, unlike fixed_window. As soon as the rotation occurs, the old file is deleted, and a new one is created in its place.

            • compound — a combination of several rotation mechanisms, each of which is applied sequentially. For example, keeping a fixed window of 10 rotation files, numbered from 1 and named using the pattern logs/my_app_{}.log, combined with deleting old files after rotation, looks like this:

              ...
                      kind: compound
                      rollers:
                        - kind: fixed_window
                          base: 1
                          count: 10
                          pattern: "logs/my_app_{}.log"
                        - kind: delete
              ...
              

          • base — the starting index for numbering rotation files for fixed_window rotation mechanism type.

          • count — the number of rotation files for fixed_window rotation mechanism type. When this number is exceeded, the oldest file will be deleted.

          • pattern — file rotation name pattern. {} characters are used for file indexing. For example, logs/pg_integrity_{}.log indicates that files will be saved in logs under names pg_integrity_1.log, pg_integrity_2.log, and so on.

      • endpoint — the address and port, in <address>:<port> format, to which logs will be sent. Only for udp and tcp component types.

      • reconnect — reconnection parameters used after a lost connection. Only for the tcp component type:

        • strategy — reconnection strategy:

          • immediate — immediately after the break.

          • fixed — with fixed delay.

        • delay_ms — delay in milliseconds before attempting to reconnect for the fixed reconnection strategy.

  • root — logging configuration:

    • level — the minimum severity level of messages that will be logged: trace, debug, info, warn, or error.

    • appenders — a list of the components described above that need to be involved in logging.

Example configuration file for logging:

refresh_rate: 5 seconds
appenders:
  stdout:
    kind: console
    encoder:
      pattern: "{d(%Y-%m-%d %H:%M:%S)} - {m}{n}"
  my_file_logger:
    kind: rolling_file
    path: "logs/pg_integrity.log"
    encoder:
      pattern: "{d(%Y-%m-%d %H:%M:%S)} - {m}{n}"
    policy:
      trigger:
        kind: size
        limit: 104857600
      roller:
        kind: fixed_window
        base: 1
        count: 10
        pattern: "logs/pg_integrity_{}.log"
root:
  level: info
  appenders:
    - stdout
    - my_file_logger
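
The example above uses only the console and rolling_file component types. Assembled from the structure described earlier, a tcp appender with reconnection might look like the following sketch (the collector address and port are assumptions):

appenders:
  my_tcp_logger:
    kind: tcp
    endpoint: "127.0.0.1:5140" # assumed address of a log collector
    encoder:
      pattern: "{d(%Y-%m-%d %H:%M:%S)} - {l} - {m}{n}"
    reconnect:
      strategy: fixed
      delay_ms: 5000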

Example of use

  1. Create a simple bash script that pg_integrity_verifier will run automatically when the integrity of the specified DBMS object types is violated. To do this, create an examples directory, create the alert_example.sh script in it, and grant it execution rights:

    mkdir -p examples && \
    cat > examples/alert_example.sh << 'EOL'
    #!/bin/bash
    json_file=""
    # Function to print usage
    print_usage() {
      echo "Usage: $0 -f json_file"
    }
    while getopts 'f:' flag; do
      case "${flag}" in
      f) json_file="${OPTARG}" ;;
      *) print_usage
        exit 1 ;;
      esac
    done
    if [ -z "$json_file" ]; then
      echo "You must provide a JSON file."
      print_usage
      exit 1
    fi
    echo "\n===================================="
    cat $json_file
    echo "\n===================================="
    EOL
    chmod +x examples/alert_example.sh
    

    The alert_example.sh script includes the following steps:

    1. The shebang line specifies that the script is executed by the bash interpreter.

    2. A variable json_file is declared, which will contain the path to the JSON file passed to the script via parameters.

    3. A function print_usage() is defined, which outputs a message on how to use the script correctly. $0 is the name of the script, and -f json_file indicates that the -f flag must be passed, followed by the path to the JSON file.

    4. Command line options are processed in a while loop using getopts, which allows for parsing flags and their arguments.

      If the -f flag is detected, the file path is saved in the json_file variable.

      If the flag is unknown or missing, the print_usage() function is called, and the script terminates with error code 1.

    5. If the json_file variable is empty (the parameter was not passed or was specified incorrectly), a message stating that a JSON file must be passed is displayed, and the script terminates with error code 1.

    6. If the JSON file is specified correctly, its contents are displayed on the screen between dividing lines for better readability.

    When the script is called, the passed file is written to stdout. For practical use, add logic for analyzing the changes and performing specific actions, such as sending alerts.

  2. Create the main configuration file pg_integrity_verifier.yaml and specify examples/alert_example.sh as the run_command. For example:

    mode: "daemon"
    daemon_period: "5"
    databases:
      - db_url: "postgres://postgres:12345@localhost:5432/db1"
        snapshot_directory: "json1"
        observed_objs: ['config_file', 'pg_hba', 'pg_settings', 'pg_user', 'pg_proc', 'pg_trigger', 'pg_roles']
        run_command: "examples/alert_example.sh -f DIFF_FILE"
      - db_url: "postgres://postgres:54321@localhost:5432/db2"
        snapshot_directory: "json2"
        observed_objs: ['config_file', 'pg_hba', 'pg_settings', 'pg_user', 'pg_proc', 'pg_trigger', 'pg_roles']
        run_command: "examples/alert_example.sh -f DIFF_FILE"
    
  3. Create a configuration file for logging log4rs.yaml:

    refresh_rate: 5 seconds
    appenders:
      stdout:
        kind: console
        encoder:
          pattern: "{d(%Y-%m-%d %H:%M:%S)} - {m}{n}"
      my_file_logger:
        kind: rolling_file
        path: "logs/pg_integrity.log"
        encoder:
          pattern: "{d(%Y-%m-%d %H:%M:%S)} - {m}{n}"
        policy:
          trigger:
            kind: size
            limit: 104857600
          roller:
            kind: fixed_window
            base: 1
            count: 10
            pattern: "logs/pg_integrity_{}.log"
    root:
      level: info
      appenders:
        - stdout
        - my_file_logger
    
  4. Run pg_integrity_verifier with the created configuration files:

    ./pg_integrity_verifier \
        -l log4rs.yaml \
        -c pg_integrity_verifier.yaml
    

    Result:

    2024-05-04 00:14:12 - db1 - The verifying started
    2024-05-04 00:14:12 - db1 - No changes found!
    2024-05-04 00:14:12 - db1 - The verifying finished
    2024-05-04 00:14:12 - db2 - The verifying started
    2024-05-04 00:14:12 - db2 - No changes found!
    2024-05-04 00:14:12 - db2 - The verifying finished
    2024-05-04 00:14:17 - db1 - The verifying started
    2024-05-04 00:14:17 - db1 - No changes found!
    2024-05-04 00:14:17 - db1 - The verifying finished
    2024-05-04 00:14:17 - db2 - The verifying started
    2024-05-04 00:14:17 - db2 - No changes found!
    2024-05-04 00:14:17 - db2 - The verifying finished
    ...
    
  5. Execute any SQL query that changes the structure of the database. For example, a query that creates a user:

    su - postgres -c "psql -p 5432 -d db1 -U postgres -c \"CREATE USER test_user_tmp;\""
    
  6. Make sure that the output of pg_integrity_verifier includes a record of the new user's creation:

    2024-05-04 00:15:02 - db1 - The verifying started
    2024-05-04 00:15:02 - db1 - Added new element with 40975 usesysid: pg_user (usesysid: 40975,
    usename: test_user_tmp, usecreatedb: false, usesuper: false, userepl: false, usebypassrls: false,
    valuntil: None, useconfig: [None])
    2024-05-04 00:15:02 - db1 - Changes have been written to file: json1/sn_db1_05-04-2024_00:15:02.diff
    2024-05-04 00:15:02 - db1 - `examples/alert_example.sh -f /opt/tantor/db/15/tools/
    pg_integrity_verifier/json1/sn_db1_05-04-2024_00:15:02.diff` executed successfully with exit status 0
    2024-05-04 00:15:02 - db1 -
    ====================================
    {
      "database": "db1",
      "datetime": "05-04-2024_00:15:02",
      "functions": null,
      "hba_conf": null,
      "pg_class": null,
      "pg_settings": null,
      "postgresql_conf": null,
      "triggers": null,
      "users": [
        {
          "new": {
            "usebypassrls": false,
            "useconfig": null,
            "usecreatedb": false,
            "usename": "test_user_tmp",
            "userepl": false,
            "usesuper": false,
            "usesysid": 40975,
            "valuntil": null
          },
          "usesysid": 40975
        }
      ]
    }
    ====================================
    2024-05-04 00:15:02 - db1 - The verifying finished
    2024-05-04 00:15:02 - db2 - The verifying started
    2024-05-04 00:15:02 - db2 - No changes found!
    2024-05-04 00:15:02 - db2 - The verifying finished
    

To alert DBMS administrators, you can use scripts that the pg_integrity_verifier application calls when changes are recorded in an observed database. In the example reviewed above, alert_example.sh accepts a file containing the list of changes through the -f argument and displays its contents. For real use, the contents of the incoming file must be interpreted and, based on the changes, an external system must be called to perform the notification, such as sending an e-mail or messenger alert.
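
As a sketch of such interpretation, the cat call in alert_example.sh could be replaced with a fragment that extracts the changed users from the diff file and mails them to an administrator. This assumes the jq utility and a configured mail command are available; the recipient address is an assumption:

# extract the database name and any added or changed user names from the diff file
db=$(jq -r '.database' "$json_file")
users=$(jq -r '.users // [] | .[].new.usename' "$json_file")
if [ -n "$users" ]; then
  # assumed recipient address; replace with a real administrator mailbox
  echo "Users changed in database $db: $users" | mail -s "pg_integrity_verifier alert" dba@example.com
fi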