F.33. pg_cron#
F.33. pg_cron #
F.33.2. What is pg_cron? #
pg_cron is a simple cron-based job scheduler for Tantor BE that runs inside the database as an extension. It uses the
same syntax as regular cron, but it allows you to schedule
Tantor BE commands directly from the database. You can also use
[1-59] seconds
to schedule a job based on an interval.
pg_cron also allows you using $
to indicate last day of the
month.
-- Delete old data on Saturday at 3:30am (GMT) SELECT cron.schedule('30 3 * * 6', $$DELETE FROM events WHERE event_time < now() - interval '1 week'$$); schedule ---------- 42 -- Vacuum every day at 10:00am (GMT) SELECT cron.schedule('nightly-vacuum', '0 10 * * *', 'VACUUM'); schedule ---------- 43 -- Change to vacuum at 3:00am (GMT) SELECT cron.schedule('nightly-vacuum', '0 3 * * *', 'VACUUM'); schedule ---------- 43 -- Stop scheduling jobs SELECT cron.unschedule('nightly-vacuum' ); unschedule ------------ t SELECT cron.unschedule(42); unschedule ------------ t -- Vacuum every Sunday at 4:00am (GMT) in a database other than the one pg_cron is installed in SELECT cron.schedule_in_database('weekly-vacuum', '0 4 * * 0', 'VACUUM', 'some_other_database'); schedule ---------- 44 -- Call a stored procedure every 5 seconds SELECT cron.schedule('process-updates', '5 seconds', 'CALL process_updates()'); -- Process payroll at 12:00 of the last day of each month SELECT cron.schedule('process-payroll', '0 12 $ * *', 'CALL process_payroll()');
pg_cron can run multiple jobs in parallel, but it runs at most one instance of a job at a time. If a second run is supposed to start before the first one finishes, then the second run is queued and started as soon as the first run completes.
The schedule uses the standard cron syntax, in which *
means “run
every time period”, and a specific number means “but only at this
time”:
┌───────────── min (0 - 59) │ ┌────────────── hour (0 - 23) │ │ ┌─────────────── day of month (1 - 31) or last day of the month ($) │ │ │ ┌──────────────── month (1 - 12) │ │ │ │ ┌───────────────── day of week (0 - 6) (0 to 6 are Sunday to │ │ │ │ │ Saturday, or use names; 7 is also Sunday) │ │ │ │ │ │ │ │ │ │ * * * * *
An easy way to create a cron schedule is: crontab.guru.
The code in pg_cron that handles parsing and scheduling comes directly from the cron source code by Paul Vixie, hence the same options are supported.
F.33.3. Setting up pg_cron #
To start the pg_cron background worker when Tantor BE starts, you
need to add pg_cron to shared_preload_libraries
in postgresql.conf. Note that pg_cron does not run any jobs as a
long a server is in
hot standby mode, but it automatically starts when the server
is promoted.
# add to postgresql.conf # required to load pg_cron background worker on start-up shared_preload_libraries = 'pg_cron'
By default, the pg_cron background worker expects its metadata
tables to be created in the “postgres” database. However, you can
configure this by setting the
cron.database_name
configuration parameter in
postgresql.conf.
# add to postgresql.conf # optionally, specify the database in which the pg_cron background worker should run (defaults to postgres) cron.database_name = 'postgres'
pg_cron
may only be installed to one database
in a cluster. If you need to run jobs in multiple databases, use
cron.schedule_in_database()
.
Previously pg_cron could only use GMT time, but now you can adapt
your time by setting cron.timezone
in
postgresql.conf.
# add to postgresql.conf # optionally, specify the timezone in which the pg_cron background worker should run (defaults to GMT). E.g: cron.timezone = 'PRC'
After restarting Tantor BE, you can create the pg_cron functions
and metadata tables using
CREATE EXTENSION pg_cron
.
-- run as superuser: CREATE EXTENSION pg_cron; -- optionally, grant usage to regular users: GRANT USAGE ON SCHEMA cron TO marco;
F.33.3.1. Ensuring pg_cron can start jobs #
Caution
By default,
pg_cron uses libpq to open a new connection to the local
database, which needs to be allowed by
pg_hba.conf.
It may be necessary to enable trust
authentication for connections coming from localhost in for the
user running the cron job, or you can add the password to a
.pgpass file, which libpq will use when opening a connection.
You can also use a unix domain socket directory as the hostname
and enable trust
authentication for local
connections in
pg_hba.conf,
which is normally safe:
# Connect via a unix domain socket: cron.host = '/tmp' # Can also be an empty string to look for the default directory: cron.host = ''
Alternatively, pg_cron can be configured to use background
workers. In that case, the number of concurrent jobs is limited
by the max_worker_processes
setting, so you
may need to raise that.
# Schedule jobs via background workers instead of localhost connections cron.use_background_workers = on # Increase the number of available background workers from the default of 8 max_worker_processes = 20
For security, jobs are executed in the database in which the
cron.schedule
function is called with the
same permissions as the current user. In addition, users are
only able to see their own jobs in the
cron.job
table.
F.33.4. Viewing job run details #
You can view the status of running and recently completed job runs
in the cron.job_run_details
:
select * from cron.job_run_details order by start_time desc limit 5; ┌───────┬───────┬─────────┬──────────┬──────────┬───────────────────┬───────────┬──────────────────┬───────────────────────────────┬───────────────────────────────┐ │ jobid │ runid │ job_pid │ database │ username │ command │ status │ return_message │ start_time │ end_time │ ├───────┼───────┼─────────┼──────────┼──────────┼───────────────────┼───────────┼──────────────────┼───────────────────────────────┼───────────────────────────────┤ │ 10 │ 4328 │ 2610 │ postgres │ marco │ select process() │ succeeded │ SELECT 1 │ 2023-02-07 09:30:00.098164+01 │ 2023-02-07 09:30:00.130729+01 │ │ 10 │ 4327 │ 2609 │ postgres │ marco │ select process() │ succeeded │ SELECT 1 │ 2023-02-07 09:29:00.015168+01 │ 2023-02-07 09:29:00.832308+01 │ │ 10 │ 4321 │ 2603 │ postgres │ marco │ select process() │ succeeded │ SELECT 1 │ 2023-02-07 09:28:00.011965+01 │ 2023-02-07 09:28:01.420901+01 │ │ 10 │ 4320 │ 2602 │ postgres │ marco │ select process() │ failed │ server restarted │ 2023-02-07 09:27:00.011833+01 │ 2023-02-07 09:27:00.72121+01 │ │ 9 │ 4320 │ 2602 │ postgres │ marco │ select do_stuff() │ failed │ job canceled │ 2023-02-07 09:26:00.011833+01 │ 2023-02-07 09:26:00.22121+01 │ └───────┴───────┴─────────┴──────────┴──────────┴───────────────────┴───────────┴──────────────────┴───────────────────────────────┴───────────────────────────────┘ (10 rows)
The records in cron.job_run_details
are not
cleaned automatically, but every user that can schedule cron jobs
also has permission to delete their own
cron.job_run_details
records.
Especially when you have jobs that run every few seconds, it can be a good idea to clean up regularly, which can easily be done using pg_cron itself:
-- Delete old cron.job_run_details records of the current user every day at noon SELECT cron.schedule('delete-job-run-details', '0 12 * * *', $$DELETE FROM cron.job_run_details WHERE end_time < now() - interval '7 days'$$);
If you do not want to use cron.job_run_details
at all, then you can add cron.log_run = off
to
postgresql.conf
.