F.63. pg_throttle #

pg_throttle is a Tantor SE extension that adds a query throttling feature.

F.63.1. Functionality #

The extension allows you to set an upper limit on the number of tuples sent from Scan nodes to the nodes above them. For example, a SeqScan node can be limited to returning no more than 100 tuples per second.

Throttling is specified as the number of tuples a scan node may return per second. The supported Scan nodes are:

  • SeqScan

  • IndexScan/IndexOnlyScan

  • BitmapHeapScan/BitmapIndexScan

  • CustomScan

  • ForeignScan

  • ResultScan/ValuesScan

  • FunctionScan

DML statements that modify a table (INSERT, UPDATE, DELETE) are also supported, but only if a RETURNING clause is specified.
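
For example, assuming a table tbl with an integer column value, the following statement is subject to throttling only because of its RETURNING clause (an illustrative sketch):

```sql
-- Without RETURNING, this UPDATE would not be throttled as a DML statement
UPDATE tbl SET value = value + 1 RETURNING value;
```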

There are two throttling modes:

  • per-node - throttling is set up for each node independently

    • greater speed (compared to query mode)

    • finer control granularity

  • query - throttling state is shared between all nodes

    • stricter throttling of overall query execution

Throttle speed is measured as the number of tuples returned by the corresponding node within one second. The extension only tracks the tuple count; if the limit is reached before the second has elapsed, it simply sleeps for the rest of that second. For example, with a limit of 100 (100 tuples per second):

  • 0.1s - 35 tuples returned

  • 0.2s - 35 tuples returned

  • 0.3s - 30 tuples returned

  • 0.4-1s - sleep

F.63.2. Setting up #

The extension must be loaded using session_preload_libraries. During loading it looks up its configuration table (see below) and sets up hooks if necessary. Add the extension to postgresql.conf:

# postgresql.conf
session_preload_libraries = 'pg_throttle'

Finally, start the database and create the extension:

CREATE EXTENSION pg_throttle;

Note

A custom schema for the extension can be specified.
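
For example, to place the extension's objects into a dedicated schema (standard CREATE EXTENSION syntax; this assumes the extension is relocatable or its control file allows choosing a schema):

```sql
CREATE SCHEMA throttle;
CREATE EXTENSION pg_throttle WITH SCHEMA throttle;
```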

F.63.3. Getting started #

Install the extension as discussed in the setting up section.

After that, you may want to throttle yourself to test the extension. Run this:

CREATE TABLE tbl(
    value int
);
INSERT INTO tbl SELECT * FROM generate_series(1, 200);
SELECT throttle_user_per_node_all(current_user, 100);

Now this user (the one you use to log in) will be throttled. Exit, log in again, and run this query:

SELECT * FROM tbl;

The query should take about 2 seconds: 200 tuples / 100 tuples per second = 2 seconds of delay.
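
You can check the timing yourself with psql's built-in \timing switch; the exact time reported will vary slightly:

```sql
\timing on
SELECT * FROM tbl;
-- Time: roughly 2000 ms with the limit of 100 tuples per second set above
```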

F.63.4. Cgroup v2 support #

The extension allows you to assign backend processes to a pre-created Cgroup v2 (control group version 2). Using the pg_throttle_setup_cgroup.sh script, you can create and configure a control group with the specified limits. After installing the extension, the script is located at share/postgresql/extension/pg_throttle_setup_cgroup.sh.

The script allows you to configure the following limits:

  • --cpu-max-percent N - CPU usage limit in percent (0-100, or -1 for unlimited)

  • --memory-quota SIZE - Memory limit (e.g., 512M, 2G)

  • --cpuset CPUS - Allowed CPUs (e.g., 0-3,6)

  • --io-limit LIMIT - IO limit: (/path|device MAJ:MIN):rbps=1M:wbps=1M:riops=100:wiops=50

You can learn more about all of the script's parameters via --help.

To create a control group named ttcg without limits, run the command:

sudo ./pg_throttle_setup_cgroup.sh --group-name "ttcg"

Creating new cgroup 'ttcg'...
Group 'ttcg' configured with parameters:
 cpu.max      = max 100000
 cpu.weight   = 100
 memory.max   = max
 pids.max     = max
 cpuset.cpus  = 
 io.max       = 

To set the CPU consumption limit to 50% for this control group, run the command:

sudo ./pg_throttle_setup_cgroup.sh --group-name "ttcg" --update --cpu-max-percent 50

Group 'ttcg' exists. Updating specified parameters...
Group 'ttcg' configured with parameters:
 cpu.max      = 800000 100000
 cpu.weight   = 100
 memory.max   = max
 pids.max     = max
 cpuset.cpus  = 
 io.max       = 

You can similarly set or update other limits for the control group.

The I/O limit format provides two ways to specify the target device:

  • /path/to/pgdata_or_tablespace_dir:rbps=1M:wbps=1M:riops=100:wiops=50

  • 259:0:rbps=1M:wbps=1M:riops=100:wiops=50

In the first case, you specify the path to the PGDATA directory or to a tablespace directory, followed by the I/O limit values (the parameters are separated by the : character).

In the second case, you specify the major and minor numbers of the block device on which the PGDATA directory or tablespace directory is located, followed by the I/O limit values (again separated by the : character).

When you specify a path to the PGDATA directory or a tablespace directory, the script attempts to automatically determine the major and minor numbers of the block device on which that directory is located. The Cgroup API requires the block device's major and minor numbers to set I/O limits; the script simplifies this by letting you specify only the path to the desired directory.

To get the Cgroup settings in the script parameters format, run the command:

sudo ./pg_throttle_setup_cgroup.sh --group-name "ttcg" --show

Cgroup settings for group 'ttcg':
--cpu-max-percent -1
...

To reset all Cgroup settings, run the command:

sudo ./pg_throttle_setup_cgroup.sh --group-name "ttcg" --reset

Resetting all limits in group 'ttcg'...

To delete the Cgroup, run the command:

sudo ./pg_throttle_setup_cgroup.sh --group-name "ttcg" --delete

Group 'ttcg' deleted

To have the backend processes executing a specific user's queries automatically assigned to the Cgroup, call the throttle_user_cgroup function with the user name and the Cgroup name:

SELECT throttle_user_cgroup('analytics', 'ttcg');

Note that the Cgroup will not survive an operating system restart. To recreate it automatically, you can create a systemd unit that sets up the required Cgroup at system startup.

An example systemd unit template can be found in pg-throttle-cgroups@.service.

First, create a link to the unit template:

sudo systemctl link ./pg-throttle-cgroups@.service

The systemd unit template accepts a parameter that specifies the name of the Cgroup. To create a systemd service that recreates the ttcg Cgroup when the operating system starts, run:

sudo systemctl enable pg-throttle-cgroups@ttcg.service
sudo systemctl start pg-throttle-cgroups@ttcg.service

Check the result of starting the systemd service:

sudo systemctl status pg-throttle-cgroups@ttcg.service

If you need to delete this systemd service and the link to the unit template, run:

sudo systemctl disable pg-throttle-cgroups@ttcg.service
sudo systemctl daemon-reload

Stopping or deleting the systemd service does not delete the created Cgroup. To delete the Cgroup, run the pg_throttle_setup_cgroup.sh script with the --delete parameter.

If the cgroup2fs file system is not mounted at /sys/fs/cgroup, the extension provides the pg_throttle.cgroup_mount_point GUC to specify the mount point path:

ALTER SYSTEM SET pg_throttle.cgroup_mount_point TO '/custom/mount/point/to/cgroup2fs';
SELECT pg_reload_conf();

The pg_throttle_setup_cgroup.sh script also has a --cgroup-root parameter for explicitly specifying the cgroup2fs mount point.

F.63.5. Configuration #

pg_throttle.cgroup_mount_point - the mount point of the cgroup2fs file system (default: /sys/fs/cgroup).

F.63.6. API & examples #

FUNCTION throttle_user_per_node(username text,
                                seq_scan_max int,
                                idx_scan_max int,
                                idx_only_scan_max int,
                                bitmap_heap_scan_max int,
                                bitmap_index_scan_max int,
                                custom_scan_max int,
                                foreign_scan_max int,
                                tid_range_scan_max int,
                                modify_scan_max int,
                                values_scan_max int,
                                result_scan_max int,
                                func_scan_max int);

This is the main function for setting user throttling, but usually you will use the shorthand functions described below.

username - name of the user to throttle. If the specified user does not exist, an exception is raised.

XXX_max - the remaining arguments specify throttling for the corresponding node:

  • seq_scan_max - SeqScan

  • idx_scan_max - Index Scan

  • idx_only_scan_max - Index Only Scan

  • bitmap_heap_scan_max - Bitmap Heap Scan

  • bitmap_index_scan_max - Bitmap Index Scan

  • custom_scan_max - Custom Scan

  • foreign_scan_max - Foreign Scan

  • tid_range_scan_max - TID Range Scan

  • modify_scan_max - DML modification statements (INSERT, UPDATE, etc…), only with RETURNING

  • values_scan_max - Values Scan

  • result_scan_max - Result Scan

  • func_scan_max - Function Scan

Allowed values for the XXX_max arguments:

  • NULL - the node will not be throttled; for query_max, NULL enables per-node mode

  • XXX_max > 0 - limits the corresponding node to XXX_max tuples per second

Note

Zero and negative values are not allowed. To disable throttling, set the value to NULL.

Example - throttle the ‘analytics’ user:

SELECT throttle_user_per_node('analytics',
                              100,          -- SeqScan
                              1000, 1000,   -- Index/IndexOnly Scan
                              1000, 1000,   -- Bitmap Heap/Index Scan
                              NULL,         -- Custom Scan (disable throttling)
                              10,           -- Foreign Scan
                              100,          -- TID Range Scan
                              NULL,         -- DML (disable throttling)
                              NULL, NULL,   -- Values/Result Scan (disable throttling)
                              1000);        -- Function Scan

FUNCTION throttle_user_per_node_all(username text, max int);

This is a shorthand for throttle_user_per_node that sets every XXX_max argument to max.

Example:

SELECT throttle_user_per_node_all('analytics', 100);

FUNCTION throttle_user_query(username text, max int);

This function throttles all Scan nodes using shared state: all nodes share a single throttling counter. The counter applies to all supported nodes; you cannot target specific node types.

Example:

SELECT throttle_user_query('analytics', 100);

FUNCTION unthrottle_user(username text);

This function removes throttling from the specified user.

Example:

SELECT unthrottle_user('analytics');

FUNCTION throttle_user_cgroup(username text, cgroup_name text);

This function assigns, for a specific user, the name of the Cgroup in which the backend processes serving that user's queries will be launched.

Example:

SELECT throttle_user_cgroup('analytics', 'ttcg');

TABLE throttlings(
    username text,
    query_throttle int,
    XXX_throttle int
);

This is the main configuration table of the extension; it stores the throttling configuration for users:

  • username - name of throttled user

  • query_throttle - throttle for query mode (NULL if per-node mode)

  • XXX_throttle - throttle value for the corresponding node

The extension reads this table at backend startup, during extension initialization, so users must have read access to it; that access creates security risks. See the security section below for how to address this.
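
For example, a superuser can inspect the current configuration directly (a sketch; seq_scan_throttle is an assumed column name following the XXX_throttle pattern above):

```sql
SELECT username, query_throttle, seq_scan_throttle
FROM throttlings
WHERE username = 'analytics';
```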

F.63.7. Implementation details #

The extension is implemented using decorators.

When ExecutorStart_hook fires, we search the PlanState tree for supported nodes. When such a node is found, we:

  1. Create our Throttling node

  2. Copy contents of old node

  3. Save the old node's ExecProcNode (and other state)

  4. Replace ExecProcNode in PlanState (it will be invoked by executor)

Then during execution:

  1. Executor calls PlanState->ExecProcNode

  2. Our throttling function is invoked

  3. Call saved ExecProcNode - get tuple

  4. Handle throttling logic

  5. Return tuple

This applies to every supported node except CustomScan and ForeignScan. We cannot do the same for them, because extensions create their own structures on top of these nodes (also decorators), so doing it again would overwrite their data. To overcome this, we instead decorate their interfaces - CustomScanMethods and FdwRoutine.

F.63.8. Caveats #

There are several caveats you must be aware of.

F.63.8.1. Decorators #

The extension uses decorators to implement this functionality. If any other extension does the same, the two may overwrite each other's data, which leads to unpredictable behaviour.

If you run into this: 1) please tell us (so we can warn other users), and 2) use only one of the conflicting extensions.

F.63.8.2. Parallel workers #

Parallel workers are handled in the same way: they also use Scan nodes, and throttling applies to them. However, they have separate address spaces, so query mode does not work across workers.
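
If strict query-mode throttling matters more than parallelism, one possible workaround is to disable parallel workers for throttled sessions via a standard PostgreSQL setting (shown per session; it can also be set per role with ALTER ROLE):

```sql
-- With no parallel workers, all tuples flow through a single process,
-- so the shared query-mode counter sees every tuple
SET max_parallel_workers_per_gather = 0;
```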

F.63.8.3. Sawtooth wave #

The extension does not distribute the load evenly; it simply sleeps once the threshold is reached. Because of that, you may see sawtooth-shaped load graphs.

F.63.8.4. Applying changes #

The extension caches the throttling configuration in static variables: it reads the configuration during extension initialization and uses the cached values afterwards.

Therefore, any change to a throttling parameter takes effect only in new sessions; affected users may need to reconnect.

F.63.8.5. User to whom it is applied #

Throttling is applied to the SESSION_USER. If you change the current user, e.g. with SET ROLE, throttling still applies to the SESSION_USER with its startup configuration.
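
For example, assuming the user analytics is throttled and the role batch is not:

```sql
-- Logged in as analytics:
SET ROLE batch;
SELECT session_user, current_user;  -- analytics | batch
SELECT * FROM tbl;  -- still throttled: limits follow SESSION_USER (analytics)
```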

F.63.8.6. Security #

The throttlings configuration table is read during session startup (extension initialization) to get the session user's throttling parameters. Granting SELECT to all users would be a security leak, so the table has built-in row-level security (RLS) enabled. There is a single policy, CURRENT_USER = username, so users can read only their own throttling rows.

By default, however, users have no access to the table at all, and their queries will fail with a permission denied error. To fix this, grant SELECT to public (or to any role you prefer):

REVOKE ALL ON TABLE throttlings FROM public;
GRANT SELECT ON TABLE throttlings TO public;

Now only superusers can modify the throttlings configuration table.

Note

throttlings already has an RLS policy so that a session user can work only with its own records.

F.63.8.7. Dead tuples #

The extension counts only the tuples returned by a node; it does not track the total number of page/buffer accesses. This can lead to situations where the number of touched pages/buffers significantly exceeds what the configured throttling suggests.

Cases where this can occur:

  • Scan key (filter) - if the node has a filter, only tuples satisfying the filter are returned; hundreds of tuples may be processed before the node returns a single one.

  • Dead tuples - dead tuples are also read from pages but are never returned from the node; hundreds of dead tuples in a table may affect performance.

F.63.8.8. DML throttling #

The extension throttles the number of returned tuples, but INSERT, UPDATE, DELETE, and other table modification statements return only one tuple, so throttling is not applied to them directly.

There are, however, cases where it does apply:

  1. The statement has a RETURNING clause - in this case the logic applies correctly, because the node returns multiple tuples

  2. The input for these statements comes from other supported Scan nodes - examples include:

    • DELETE FROM tbl WHERE value > 1 - input for DELETE is SeqScan (throttled) from same table

    • INSERT INTO tbl SELECT * FROM tbl2 - input for INSERT is SeqScan (throttled) from another table

    • INSERT INTO tbl SELECT * FROM generate_series(1, 100) - input for INSERT is FunctionScan (throttled)

    • UPDATE tbl SET value = value + 1 - input for UPDATE is SeqScan (throttled) from same table