9.23. System Catalog Declarations and Initial Contents

PostgreSQL uses many different system catalogs to keep track of the existence and properties of database objects, such as tables and functions. Physically there is no difference between a system catalog and a plain user table, but the backend C code knows the structure and properties of each catalog, and can manipulate it directly at a low level. Thus, for example, it is inadvisable to attempt to alter the structure of a catalog on-the-fly; that would break assumptions built into the C code about how rows of the catalog are laid out. But the structure of the catalogs can change between major versions.

The structures of the catalogs are declared in specially formatted C header files in the src/include/catalog/ directory of the source tree. For each catalog there is a header file named after the catalog (e.g., pg_class.h for pg_class), which defines the set of columns the catalog has, as well as some other basic properties such as its OID.

Many of the catalogs have initial data that must be loaded into them during the bootstrap phase of initdb, to bring the system up to a point where it is capable of executing SQL commands. (For example, pg_class.h must contain an entry for itself, as well as one for each other system catalog and index.) This initial data is kept in editable form in data files that are also stored in the src/include/catalog/ directory. For example, pg_proc.dat describes all the initial rows that must be inserted into the pg_proc catalog.

To create the catalog files and load this initial data into them, a backend running in bootstrap mode reads a BKI (Backend Interface) file containing commands and initial data. The postgres.bki file used in this mode is prepared from the aforementioned header and data files, while building a PostgreSQL distribution, by a Perl script named genbki.pl. Although it’s specific to a particular PostgreSQL release, postgres.bki is platform-independent and is installed in the share subdirectory of the installation tree.

genbki.pl also produces a derived header file for each catalog, for example pg_class_d.h for the pg_class catalog. This file contains automatically-generated macro definitions, and may contain other macros, enum declarations, and so on that can be useful for client C code that reads a particular catalog.

Most PostgreSQL developers don’t need to be directly concerned with the BKI file, but almost any nontrivial feature addition in the backend will require modifying the catalog header files and/or initial data files. The rest of this section gives some information about that, and for completeness describes the BKI file format.

9.23.1. System Catalog Declaration Rules

The key part of a catalog header file is a C structure definition describing the layout of each row of the catalog. This begins with a CATALOG macro, which so far as the C compiler is concerned is just shorthand for typedef struct FormData_catalogname. Each field in the struct gives rise to a catalog column. Fields can be annotated using the BKI property macros described in genbki.h, for example to define a default value for a field or mark it as nullable or not nullable. The CATALOG line can also be annotated, with some other BKI property macros described in genbki.h, to define other properties of the catalog as a whole, such as whether it is a shared relation.
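
As an illustration, the core of a catalog header looks roughly like this. This is a condensed sketch loosely modeled on pg_language.h; several columns and comments are omitted, so consult the real header for the authoritative definition:

CATALOG(pg_language,2612,LanguageRelationId)
{
	Oid			oid;			/* oid */

	/* Language name */
	NameData	lanname;

	/* Language's owner */
	Oid			lanowner BKI_DEFAULT(POSTGRES) BKI_LOOKUP(pg_authid);

	/* Is a procedural language */
	bool		lanispl BKI_DEFAULT(f);

	/* Optional validation function, or 0 */
	Oid			lanvalidator BKI_DEFAULT(0) BKI_LOOKUP_OPT(pg_proc);
} FormData_pg_language;

Here the CATALOG line supplies the catalog name, its OID, and the C symbol for that OID, while the per-field annotations supply BKI properties such as default values and OID lookup rules.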

The system catalog cache code (and most catalog-munging code in general) assumes that the fixed-length portions of all system catalog tuples are in fact present, because it maps this C struct declaration onto them. Thus, all variable-length fields and nullable fields must be placed at the end, and they cannot be accessed as struct fields. For example, if you tried to set pg_type.typrelid to be NULL, it would fail when some piece of code tried to reference typetup->typrelid (or worse, typetup->typelem, because that follows typrelid). This would result in random errors or even segmentation violations.

As a partial guard against this type of error, variable-length or nullable fields should not be made directly visible to the C compiler. This is accomplished by wrapping them in #ifdef CATALOG_VARLEN ... #endif (where CATALOG_VARLEN is a symbol that is never defined). This prevents C code from carelessly trying to access fields that might not be there or might be at some other offset. As an independent guard against creating incorrect rows, we require all columns that should be non-nullable to be marked so in pg_attribute. The bootstrap code will automatically mark catalog columns as NOT NULL if they are fixed-width and are not preceded by any nullable or variable-width column. Where this rule is inadequate, you can force correct marking by using BKI_FORCE_NOT_NULL and BKI_FORCE_NULL annotations as needed.
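
For example, the variable-length tail of a catalog struct typically looks something like this (a sketch loosely based on pg_proc.h; only a few fields are shown):

#ifdef CATALOG_VARLEN			/* variable-length fields start here */
	/* procedure source text; must never be null */
	text		prosrc BKI_FORCE_NOT_NULL;

	/* secondary procedure info (can be NULL) */
	text		probin BKI_DEFAULT(_null_);

	/* access permissions */
	aclitem		proacl[1] BKI_DEFAULT(_null_);
#endif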

Frontend code should not include any pg_xxx.h catalog header file, as these files may contain C code that won’t compile outside the backend. (Typically, that happens because these files also contain declarations for functions in src/backend/catalog/ files.) Instead, frontend code may include the corresponding generated pg_xxx_d.h header, which will contain OID #defines and any other data that might be of use on the client side. If you want macros or other code in a catalog header to be visible to frontend code, write #ifdef EXPOSE_TO_CLIENT_CODE ... #endif around that section to instruct genbki.pl to copy that section to the pg_xxx_d.h header.
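
For instance, pg_proc.h exposes its prokind category codes to client code with a block along these lines (a sketch; see the real header for the authoritative values):

#ifdef EXPOSE_TO_CLIENT_CODE

#define PROKIND_FUNCTION 'f'
#define PROKIND_AGGREGATE 'a'
#define PROKIND_WINDOW 'w'
#define PROKIND_PROCEDURE 'p'

#endif							/* EXPOSE_TO_CLIENT_CODE */

genbki.pl copies such a section into pg_proc_d.h, so frontend code can test prokind values without including pg_proc.h.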

A few of the catalogs are so fundamental that they can’t even be created by the BKI create command that’s used for most catalogs, because that command needs to write information into these catalogs to describe the new catalog. These are called bootstrap catalogs, and defining one takes a lot of extra work: you have to manually prepare appropriate entries for them in the pre-loaded contents of pg_class and pg_type, and those entries will need to be updated for subsequent changes to the catalog’s structure. (Bootstrap catalogs also need pre-loaded entries in pg_attribute, but fortunately genbki.pl handles that chore nowadays.) Avoid making new catalogs be bootstrap catalogs if at all possible.

9.23.2. System Catalog Initial Data

Each catalog that has any manually-created initial data (some do not) has a corresponding .dat file that contains its initial data in an editable format.

9.23.2.1. Data File Format

Each .dat file contains Perl data structure literals that are simply eval’d to produce an in-memory data structure consisting of an array of hash references, one per catalog row. A slightly modified excerpt from pg_database.dat will demonstrate the key features:

[

# A comment could appear here.
{ oid => '1', oid_symbol => 'Template1DbOid',
  descr => 'database\'s default template',
  datname => 'template1', encoding => 'ENCODING',
  datlocprovider => 'LOCALE_PROVIDER', datistemplate => 't',
  datallowconn => 't', datconnlimit => '-1', datfrozenxid => '0',
  datminmxid => '1', dattablespace => 'pg_default', datcollate => 'LC_COLLATE',
  datctype => 'LC_CTYPE', daticulocale => 'ICU_LOCALE', datacl => '_null_' },

]

Points to note:

  1. The overall file layout is: open square bracket, one or more sets of curly braces each of which represents a catalog row, close square bracket. Write a comma after each closing curly brace.

  2. Within each catalog row, write comma-separated key => value pairs. The allowed keys are the names of the catalog’s columns, plus the metadata keys oid, oid_symbol, array_type_oid, and descr. (The use of oid and oid_symbol is described in Section 9.23.2.2 (OID Assignment) below, while array_type_oid is described in Section 9.23.2.4 (Automatic Creation of Array Types). descr supplies a description string for the object, which will be inserted into pg_description or pg_shdescription as appropriate.) While the metadata keys are optional, the catalog’s defined columns must all be provided, except when the catalog’s .h file specifies a default value for the column. (In the example above, the datdba field has been omitted because pg_database.h supplies a suitable default value for it; see the sketch after this list.)

  3. All values must be single-quoted. Escape single quotes used within a value with a backslash. Backslashes meant as data can, but need not, be doubled; this follows Perl’s rules for simple quoted literals. Note that backslashes appearing as data will be treated as escapes by the bootstrap scanner, according to the same rules as for escape string constants; for example \t converts to a tab character. If you actually want a backslash in the final value, you will need to write four of them: Perl strips two, leaving \\ for the bootstrap scanner to see.

  4. Null values are represented by _null_. (Note that there is no way to create a value that is just that string.)

  5. Comments are preceded by #, and must be on their own lines.

  6. Field values that are OIDs of other catalog entries should be represented by symbolic names rather than actual numeric OIDs. (In the example above, dattablespace contains such a reference.) This is described in Section 9.23.2.3 (OID Reference Lookup) below.

  7. Since hashes are unordered data structures, field order and line layout aren’t semantically significant. However, to maintain a consistent appearance, we set a few rules that are applied by the formatting script reformat_dat_file.pl:

    1. Within each pair of curly braces, the metadata fields oid, oid_symbol, array_type_oid, and descr (if present) come first, in that order, then the catalog’s own fields appear in their defined order.

    2. Newlines are inserted between fields as needed to limit line length to 80 characters, if possible. A newline is also inserted between the metadata fields and the regular fields.

    3. If the catalog’s .h file specifies a default value for a column, and a data entry has that same value, reformat_dat_file.pl will omit it from the data file. This keeps the data representation compact.

    4. reformat_dat_file.pl preserves blank lines and comment lines as-is.

    It’s recommended to run reformat_dat_file.pl before submitting catalog data patches. For convenience, you can simply change to src/include/catalog/ and run make reformat-dat-files.

  8. If you want to add a new method of making the data representation smaller, you must implement it in reformat_dat_file.pl and also teach Catalog::ParseData() how to expand the data back into the full representation.
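
As mentioned in point 2, a column can be omitted from a data entry when its header declares a default. As a sketch (loosely based on pg_database.h, with surrounding fields and comments omitted), the declaration that lets the pg_database.dat entry above omit datdba looks like:

	/* owner of database */
	Oid			datdba BKI_DEFAULT(POSTGRES) BKI_LOOKUP(pg_authid);

BKI_DEFAULT supplies the value used when a .dat entry leaves the field out, and BKI_LOOKUP makes that value a symbolic reference to the bootstrap superuser's pg_authid entry rather than a raw numeric OID.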

9.23.2.2. OID Assignment

A catalog row appearing in the initial data can be given a manually-assigned OID by writing an oid => nnnn metadata field. Furthermore, if an OID is assigned, a C macro for that OID can be created by writing an oid_symbol => name metadata field.
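
For example, the pg_database.dat entry shown earlier writes oid => '1' and oid_symbol => 'Template1DbOid'; the generated pg_database_d.h header will then contain a definition along these lines, so that C code can refer to the template1 database's OID symbolically:

#define Template1DbOid 1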

Pre-loaded catalog rows must have preassigned OIDs if there are OID references to them in other pre-loaded rows. A preassigned OID is also needed if the row’s OID must be referenced from C code. If neither case applies, the oid metadata field can be omitted, in which case the bootstrap code assigns an OID automatically. In practice we usually preassign OIDs for all or none of the pre-loaded rows in a given catalog, even if only some of them are actually cross-referenced.

Writing the actual numeric value of any OID in C code is considered very bad form; always use a macro, instead. Direct references to pg_proc OIDs are common enough that there’s a special mechanism to create the necessary macros automatically; see src/backend/utils/Gen_fmgrtab.pl. Similarly — but, for historical reasons, not done the same way — there’s an automatic method for creating macros for pg_type OIDs. oid_symbol entries are therefore not necessary in those two catalogs. Likewise, macros for the pg_class OIDs of system catalogs and indexes are set up automatically. For all other system catalogs, you have to manually specify any macros you need via oid_symbol entries.
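
Whenever C code needs one of these OIDs, it should spell the macro. A minimal, hypothetical fragment (handle_text_column and atttypid are placeholders invented for this example; TEXTOID comes from the generated catalog/pg_type_d.h header):

#include "catalog/pg_type_d.h"

/* use the generated macro, never the literal OID 25 */
if (atttypid == TEXTOID)
    handle_text_column();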

To find an available OID for a new pre-loaded row, run the script src/include/catalog/unused_oids. It prints inclusive ranges of unused OIDs (e.g., the output line 45-900 means OIDs 45 through 900 have not been allocated yet). Currently, OIDs 1–9999 are reserved for manual assignment; the unused_oids script simply looks through the catalog headers and .dat files to see which ones do not appear. You can also use the duplicate_oids script to check for mistakes. (genbki.pl will assign OIDs for any rows that didn’t get one hand-assigned to them, and it will also detect duplicate OIDs at compile time.)

When choosing OIDs for a patch that is not expected to be committed immediately, best practice is to use a group of more-or-less consecutive OIDs starting with some random choice in the range 8000—9999. This minimizes the risk of OID collisions with other patches being developed concurrently. To keep the 8000—9999 range free for development purposes, after a patch has been committed to the master git repository its OIDs should be renumbered into available space below that range. Typically, this will be done near the end of each development cycle, moving all OIDs consumed by patches committed in that cycle at the same time. The script renumber_oids.pl can be used for this purpose. If an uncommitted patch is found to have OID conflicts with some recently-committed patch, renumber_oids.pl may also be useful for recovering from that situation.

Because of this convention of possibly renumbering OIDs assigned by patches, the OIDs assigned by a patch should not be considered stable until the patch has been included in an official release. We do not change manually-assigned object OIDs once released, however, as that would create assorted compatibility problems.

If genbki.pl needs to assign an OID to a catalog entry that does not have a manually-assigned OID, it will use a value in the range 10000—11999. The server’s OID counter is set to 10000 at the start of a bootstrap run, so that any objects created on-the-fly during bootstrap processing also receive OIDs in this range. (The usual OID assignment mechanism takes care of preventing any conflicts.)

Objects with OIDs below FirstUnpinnedObjectId (12000) are considered pinned, preventing them from being deleted. (There are a small number of exceptions, which are hard-wired into IsPinnedObject().) initdb forces the OID counter up to FirstUnpinnedObjectId as soon as it’s ready to create unpinned objects. Thus objects created during the later phases of initdb, such as objects created while running the information_schema.sql script, will not be pinned, while all objects known to genbki.pl will be.

OIDs assigned during normal database operation are constrained to be 16384 or higher. This ensures that the range 10000—16383 is free for OIDs assigned automatically by genbki.pl or during initdb. These automatically-assigned OIDs are not considered stable, and may change from one installation to another.

9.23.2.3. OID Reference Lookup

In principle, cross-references from one initial catalog row to another could be written just by writing the preassigned OID of the referenced row in the referencing field. However, that is against project policy, because it is error-prone, hard to read, and subject to breakage if a newly-assigned OID is renumbered. Therefore genbki.pl provides mechanisms to write symbolic references instead. The rules are as follows:

  1. Use of symbolic references is enabled in a particular catalog column by attaching BKI_LOOKUP(lookuprule) to the column’s definition, where lookuprule is the name of the referenced catalog, e.g., pg_proc; see the sketch after this list. BKI_LOOKUP can be attached to columns of type Oid, regproc, oidvector, or Oid[]; in the latter two cases it implies performing a lookup on each element of the array.

  2. It’s also permissible to attach BKI_LOOKUP(encoding) to integer columns to reference character set encodings, which are not currently represented as catalog OIDs, but have a set of values known to genbki.pl.

  3. In some catalog columns, it’s allowed for entries to be zero instead of a valid reference. If this is allowed, write BKI_LOOKUP_OPT instead of BKI_LOOKUP. Then you can write 0 for an entry. (If the column is declared regproc, you can optionally write - instead of 0.) Except for this special case, all entries in a BKI_LOOKUP column must be symbolic references. genbki.pl will warn about unrecognized names.

  4. Most kinds of catalog objects are simply referenced by their names. Note that type names must exactly match the referenced pg_type entry’s typname; you do not get to use any aliases such as integer for int4.

  5. A function can be represented by its proname, if that is unique among the pg_proc.dat entries (this works like regproc input). Otherwise, write it as proname(argtypename,argtypename,…), like regprocedure. The argument type names must be spelled exactly as they are in the pg_proc.dat entry’s proargtypes field. Do not insert any spaces.

  6. Operators are represented by oprname(lefttype,righttype), writing the type names exactly as they appear in the pg_operator.dat entry’s oprleft and oprright fields. (Write 0 for the omitted operand of a unary operator.)

  7. The names of opclasses and opfamilies are only unique within an access method, so they are represented by access_method_name/object_name.

  8. In none of these cases is there any provision for schema-qualification; all objects created during bootstrap are expected to be in the pg_catalog schema.
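
As a sketch of how these annotations appear in a catalog header (loosely based on pg_proc.h; comments abbreviated):

	/* OID of namespace containing this proc */
	Oid			pronamespace BKI_DEFAULT(pg_catalog) BKI_LOOKUP(pg_namespace);

	/* element type of variadic array, or 0 */
	Oid			provariadic BKI_DEFAULT(0) BKI_LOOKUP_OPT(pg_type);

	/* OID of result type */
	Oid			prorettype BKI_LOOKUP(pg_type);

With these annotations, a pg_proc.dat entry can write prorettype => 'bool' or provariadic => '0' and let genbki.pl substitute the numeric OIDs when it emits the BKI file.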

genbki.pl resolves all symbolic references while it runs, and puts simple numeric OIDs into the emitted BKI file. There is therefore no need for the bootstrap backend to deal with symbolic references.

It’s desirable to mark OID reference columns with BKI_LOOKUP or BKI_LOOKUP_OPT even if the catalog has no initial data that requires lookup. This allows genbki.pl to record the foreign key relationships that exist in the system catalogs. That information is used in the regression tests to check for incorrect entries. See also the macros DECLARE_FOREIGN_KEY, DECLARE_FOREIGN_KEY_OPT, DECLARE_ARRAY_FOREIGN_KEY, and DECLARE_ARRAY_FOREIGN_KEY_OPT, which are used to declare foreign key relationships that are too complex for BKI_LOOKUP (typically, multi-column foreign keys).

9.23.2.4. Automatic Creation of Array Types

Most scalar data types should have a corresponding array type (that is, a standard varlena array type whose element type is the scalar type, and which is referenced by the typarray field of the scalar type’s pg_type entry). genbki.pl is able to generate the pg_type entry for the array type automatically in most cases.

To use this facility, just write an array_type_oid => nnnn metadata field in the scalar type’s pg_type entry, specifying the OID to use for the array type. You may then omit the typarray field, since it will be filled automatically with that OID.

The generated array type’s name is the scalar type’s name with an underscore prepended. The array entry’s other fields are filled from BKI_ARRAY_DEFAULT(value) annotations in pg_type.h, or if there isn’t one, copied from the scalar type. (There’s also a special case for typalign.) Then the typelem and typarray fields of the two entries are set to cross-reference each other.
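
For instance, pg_type.h contains annotations along these lines (a sketch; neighboring fields are omitted):

	/* arbitrary type classification */
	char		typcategory BKI_DEFAULT(U) BKI_ARRAY_DEFAULT(A);

	/* text format (required) */
	regproc		typinput BKI_LOOKUP(pg_proc) BKI_ARRAY_DEFAULT(array_in);

so that an automatically created array type is assigned category A and array_in as its input function, regardless of what the scalar type's entry specifies.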

9.23.2.5. Recipes for Editing Data Files

Here are some suggestions about the easiest ways to perform common tasks when updating catalog data files.

9.23.2.5.1. Add a new column with a default to a catalog:

Add the column to the header file with a BKI_DEFAULT(value) annotation. The data file need only be adjusted by adding the field in existing rows where a non-default value is needed.

9.23.2.5.2. Add a default value to an existing column that doesn’t have one:

Add a BKI_DEFAULT annotation to the header file, then run make reformat-dat-files to remove now-redundant field entries.

9.23.2.5.3. Remove a column, whether it has a default or not:

Remove the column from the header, then run make reformat-dat-files to remove now-useless field entries.

9.23.2.5.4. Change or remove an existing default value:

You cannot simply change the header file, since that will cause the current data to be interpreted incorrectly. First run make expand-dat-files to rewrite the data files with all default values inserted explicitly, then change or remove the BKI_DEFAULT annotation, then run make reformat-dat-files to remove superfluous fields again.

9.23.2.5.5. Ad-hoc bulk editing:

reformat_dat_file.pl can be adapted to perform many kinds of bulk changes. Look for its block comments showing where one-off code can be inserted. In the following example, we are going to consolidate two Boolean fields in pg_proc into a char field:

  1. Add the new column, with a default, to pg_proc.h:

    +    /* see PROKIND_ categories below */
    +    char        prokind BKI_DEFAULT(f);
    
  2. Create a new script based on reformat_dat_file.pl to insert appropriate values on-the-fly:

    -           # At this point we have the full row in memory as a hash
    -           # and can do any operations we want. As written, it only
    -           # removes default values, but this script can be adapted to
    -           # do one-off bulk-editing.
    +           # One-off change to migrate to prokind
    +           # Default has already been filled in by now, so change to other
    +           # values as appropriate
    +           if ($values{proisagg} eq 't')
    +           {
    +               $values{prokind} = 'a';
    +           }
    +           elsif ($values{proiswindow} eq 't')
    +           {
    +               $values{prokind} = 'w';
    +           }
    
  3. Run the new script:

    $ cd src/include/catalog
    $ perl rewrite_dat_with_prokind.pl pg_proc.dat

    At this point pg_proc.dat has all three columns, prokind, proisagg, and proiswindow, though they will appear only in rows where they have non-default values.

  4. Remove the old columns from pg_proc.h:

    -    /* is it an aggregate? */
    -    bool        proisagg BKI_DEFAULT(f);
    -
    -    /* is it a window function? */
    -    bool        proiswindow BKI_DEFAULT(f);
    
  5. Finally, run make reformat-dat-files to remove the useless old entries from pg_proc.dat.

For further examples of scripts used for bulk editing, see convert_oid2name.pl and remove_pg_type_oid_symbols.pl attached to this message: https://www.postgresql.org/message-id/CAJVSVGVX8gXnPm+Xa=DxR7kFYprcQ1tNcCT5D0O3ShfnM6jehA@mail.gmail.com

9.23.3. BKI File Format

This section describes how the PostgreSQL backend interprets BKI files. This description will be easier to understand if the postgres.bki file is at hand as an example.

BKI input consists of a sequence of commands. Commands are made up of a number of tokens, depending on the syntax of the command. Tokens are usually separated by whitespace, but need not be if there is no ambiguity. There is no special command separator; the next token that syntactically cannot belong to the preceding command starts a new one. (Usually you would put a new command on a new line, for clarity.) Tokens can be certain key words, special characters (parentheses, commas, etc.), identifiers, numbers, or single-quoted strings. Everything is case sensitive.

Lines starting with # are ignored.

9.23.4. BKI Commands

  1. The create command creates a table named tablename, having the OID tableoid, with the columns given in parentheses.

    The following column types are supported directly by bootstrap.c: bool, bytea, char (1 byte), name, int2, int4, regproc, regclass, regtype, text, oid, tid, xid, cid, int2vector, oidvector, _int4 (array), _text (array), _oid (array), _char (array), _aclitem (array). Although it is possible to create tables containing columns of other types, this cannot be done until after pg_type has been created and filled with appropriate entries. (That effectively means that only these column types can be used in bootstrap catalogs, but non-bootstrap catalogs can contain any built-in type.)

    When bootstrap is specified, the table will only be created on disk; nothing is entered into pg_class, pg_attribute, etc., for it. Thus the table will not be accessible by ordinary SQL operations until such entries are made the hard way (with insert commands). This option is used for creating pg_class etc. themselves.

    The table is created as shared if shared_relation is specified. The table’s row type OID (pg_type OID) can optionally be specified via the rowtype_oid clause; if not specified, an OID is automatically generated for it. (The rowtype_oid clause is useless if bootstrap is specified, but it can be provided anyway for documentation.)

  2. The open command opens the table named tablename for insertion of data. Any currently open table is closed.

  3. The close command closes the open table. The name of the table must be given as a cross-check.

  4. The insert command inserts a new row into the open table, using value1, value2, etc., for its column values.

    NULL values can be specified using the special key word _null_. Values that do not look like identifiers or digit strings must be single-quoted. (To include a single quote in a value, write it twice. Escape-string-style backslash escapes are allowed in the string, too.)

  5. The declare index command creates an index named indexname, having OID indexoid, on the table named tablename, using the amname access method. The fields to index are called name1, name2 etc., and the operator classes to use are opclass1, opclass2 etc., respectively. The index file is created and appropriate catalog entries are made for it, but the index contents are not initialized by this command.

  6. The declare toast command creates a TOAST table for the table named tablename. The TOAST table is assigned OID toasttableoid and its index is assigned OID toastindexoid. As with declare index, filling of the index is postponed.

  7. The build indices command fills in the indices that have previously been declared.

9.23.5. Structure of the Bootstrap BKI File

The open command cannot be used until the tables it uses exist and have entries for the table that is to be opened. (These minimum tables are pg_class, pg_attribute, pg_proc, and pg_type.) To allow those tables themselves to be filled, create with the bootstrap option implicitly opens the created table for data insertion.

Also, the declare index and declare toast commands cannot be used until the system catalogs they need have been created and filled in.

Thus, the structure of the postgres.bki file has to be:

  1. create bootstrap one of the critical tables

  2. insert data describing at least the critical tables

  3. close

  4. Repeat for the other critical tables.

  5. create (without bootstrap) a noncritical table

  6. open

  7. insert desired data

  8. close

  9. Repeat for the other noncritical tables.

  10. Define indexes and toast tables.

  11. build indices

There are doubtless other, undocumented ordering dependencies.

9.23.6. BKI Example

The following sequence of commands will create the table test_table with OID 420, having three columns oid, cola and colb of type oid, int4 and text, respectively, and insert two rows into the table:

create test_table 420 (oid = oid, cola = int4, colb = text)
open test_table
insert ( 421 1 'value 1' )
insert ( 422 2 _null_ )
close test_table