9.4. Frontend/Backend Protocol

PostgreSQL uses a message-based protocol for communication between frontends and backends (clients and servers). The protocol is supported over TCP/IP and also over Unix-domain sockets. Port number 5432 has been registered with IANA as the customary TCP port number for servers supporting this protocol, but in practice any non-privileged port number can be used.

This document describes version 3.0 of the protocol, implemented in PostgreSQL 7.4 and later. For descriptions of the earlier protocol versions, see previous releases of the PostgreSQL documentation. A single server can support multiple protocol versions. The initial startup-request message tells the server which protocol version the client is attempting to use. If the major version requested by the client is not supported by the server, the connection will be rejected (for example, this would occur if the client requested protocol version 4.0, which does not exist as of this writing). If the minor version requested by the client is not supported by the server (e.g., the client requests version 3.1, but the server supports only 3.0), the server may either reject the connection or may respond with a NegotiateProtocolVersion message containing the highest minor protocol version which it supports. The client may then choose either to continue with the connection using the specified protocol version or to abort the connection.

In order to serve multiple clients efficiently, the server launches a new backend process for each client. In the current implementation, a new child process is created immediately after an incoming connection is detected. This is transparent to the protocol, however. For purposes of the protocol, the terms backend and server are interchangeable; likewise frontend and client are interchangeable.

9.4.1. Overview

The protocol has separate phases for startup and normal operation. In the startup phase, the frontend opens a connection to the server and authenticates itself to the satisfaction of the server. (This might involve a single message, or multiple messages depending on the authentication method being used.) If all goes well, the server then sends status information to the frontend, and finally enters normal operation. Except for the initial startup-request message, this part of the protocol is driven by the server.

During normal operation, the frontend sends queries and other commands to the backend, and the backend sends back query results and other responses. There are a few cases (such as NOTIFY) wherein the backend will send unsolicited messages, but for the most part this portion of a session is driven by frontend requests.

Termination of the session is normally by frontend choice, but can be forced by the backend in certain cases. In any case, when the backend closes the connection, it will roll back any open (incomplete) transaction before exiting.

Within normal operation, SQL commands can be executed through either of two sub-protocols. In the simple query protocol, the frontend just sends a textual query string, which is parsed and immediately executed by the backend. In the extended query protocol, processing of queries is separated into multiple steps: parsing, binding of parameter values, and execution. This offers flexibility and performance benefits, at the cost of extra complexity.

Normal operation has additional sub-protocols for special operations such as COPY.

9.4.1.1. Messaging Overview

All communication is through a stream of messages. The first byte of a message identifies the message type, and the next four bytes specify the length of the rest of the message (this length count includes itself, but not the message-type byte). The remaining contents of the message are determined by the message type. For historical reasons, the very first message sent by the client (the startup message) has no initial message-type byte.
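The framing rule can be expressed as a short Python sketch. The helper name is ours; the 'Q' type byte of the simple Query message is used as an example:

```python
import struct

def build_message(msg_type: bytes, payload: bytes) -> bytes:
    """Frame a protocol message: a 1-byte type, then a 4-byte big-endian
    length that counts itself plus the payload, but not the type byte."""
    return msg_type + struct.pack("!I", 4 + len(payload)) + payload

# A simple Query message ('Q') carries a null-terminated SQL string.
msg = build_message(b"Q", b"SELECT 1;\x00")
# msg[0:1] is the type byte; the next four bytes hold the length 14
# (4 for the length field itself + 10 payload bytes), for 15 bytes total.
```

The startup message is the one exception: it is framed the same way but without the leading type byte.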

To avoid losing synchronization with the message stream, both servers and clients typically read an entire message into a buffer (using the byte count) before attempting to process its contents. This allows easy recovery if an error is detected while processing the contents. In extreme situations (such as not having enough memory to buffer the message), the receiver can use the byte count to determine how much input to skip before it resumes reading messages.

Conversely, both servers and clients must take care never to send an incomplete message. This is commonly done by marshaling the entire message in a buffer before beginning to send it. If a communications failure occurs partway through sending or receiving a message, the only sensible response is to abandon the connection, since there is little hope of recovering message-boundary synchronization.

9.4.1.2. Extended Query Overview

In the extended-query protocol, execution of SQL commands is divided into multiple steps. The state retained between steps is represented by two types of objects: prepared statements and portals. A prepared statement represents the result of parsing and semantic analysis of a textual query string. A prepared statement is not in itself ready to execute, because it might lack specific values for parameters. A portal represents a ready-to-execute or already-partially-executed statement, with any missing parameter values filled in. (For SELECT statements, a portal is equivalent to an open cursor, but we choose to use a different term since cursors don’t handle non-SELECT statements.)

The overall execution cycle consists of a parse step, which creates a prepared statement from a textual query string; a bind step, which creates a portal given a prepared statement and values for any needed parameters; and an execute step that runs a portal’s query. In the case of a query that returns rows (SELECT, SHOW, etc.), the execute step can be told to fetch only a limited number of rows, so that multiple execute steps might be needed to complete the operation.

The backend can keep track of multiple prepared statements and portals (but note that these exist only within a session, and are never shared across sessions). Existing prepared statements and portals are referenced by names assigned when they were created. In addition, an unnamed prepared statement and portal exist. Although these behave largely the same as named objects, operations on them are optimized for the case of executing a query only once and then discarding it, whereas operations on named objects are optimized on the expectation of multiple uses.

9.4.1.3. Formats and Format Codes

Data of a particular data type might be transmitted in any of several different formats. As of PostgreSQL 7.4 the only supported formats are text and binary, but the protocol makes provision for future extensions. The desired format for any value is specified by a format code. Clients can specify a format code for each transmitted parameter value and for each column of a query result. Text has format code zero, binary has format code one, and all other format codes are reserved for future definition.

The text representation of values is whatever strings are produced and accepted by the input/output conversion functions for the particular data type. In the transmitted representation, there is no trailing null character; the frontend must add one to received values if it wants to process them as C strings. (The text format does not allow embedded nulls, by the way.)

Binary representations for integers use network byte order (most significant byte first). For other data types consult the documentation or source code to learn about the binary representation. Keep in mind that binary representations for complex data types might change across server versions; the text format is usually the more portable choice.
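For the integer case, the network-byte-order rule maps directly onto big-endian packing, sketched here for a 4-byte integer:

```python
import struct

def encode_int4(value: int) -> bytes:
    """int4 in binary format: network byte order (big-endian), 4 bytes."""
    return struct.pack("!i", value)

def decode_int4(data: bytes) -> int:
    return struct.unpack("!i", data)[0]

wire = encode_int4(305419896)        # 0x12345678
# The most significant byte travels first.
assert wire == b"\x12\x34\x56\x78"
```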

9.4.2. Message Flow

This section describes the message flow and the semantics of each message type. (Details of the exact representation of each message appear in protocol-message-formats.) There are several different sub-protocols depending on the state of the connection: start-up, query, function call, COPY, and termination. There are also special provisions for asynchronous operations (including notification responses and command cancellation), which can occur at any time after the start-up phase.

9.4.2.1. Start-up

To begin a session, a frontend opens a connection to the server and sends a startup message. This message includes the names of the user and of the database the user wants to connect to; it also identifies the particular protocol version to be used. (Optionally, the startup message can include additional settings for run-time parameters.) The server then uses this information and the contents of its configuration files (such as pg_hba.conf) to determine whether the connection is provisionally acceptable, and what additional authentication is required (if any).
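As a sketch, a minimal version-3.0 startup message (major version 3 in the high 16 bits, so 196608) can be built in Python; the exact field layout comes from the message-format reference rather than this section, and the user and database names are hypothetical:

```python
import struct

def build_startup_message(user: str, database: str) -> bytes:
    """StartupMessage: no type byte; an int32 self-inclusive length,
    an int32 protocol version (3.0 = 3 << 16 = 196608), then run-time
    parameter name/value pairs as null-terminated strings, closed by
    a final null byte."""
    body = struct.pack("!I", 196608)               # protocol version 3.0
    for name, value in (("user", user), ("database", database)):
        body += name.encode() + b"\x00" + value.encode() + b"\x00"
    body += b"\x00"                                # terminator
    return struct.pack("!I", 4 + len(body)) + body

msg = build_startup_message("alice", "mydb")
```

Additional run-time parameter settings would be sent as further name/value pairs before the terminator.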

The server then sends an appropriate authentication request message, to which the frontend must reply with an appropriate authentication response message (such as a password). For all authentication methods except GSSAPI, SSPI and SASL, there is at most one request and one response. In some methods, no response at all is needed from the frontend, and so no authentication request occurs. For GSSAPI, SSPI and SASL, multiple exchanges of packets may be needed to complete the authentication.

The authentication cycle ends with the server either rejecting the connection attempt (ErrorResponse), or sending AuthenticationOk.

The possible messages from the server in this phase are:

  1. ErrorResponse: The connection attempt has been rejected. The server then immediately closes the connection.

  2. AuthenticationOk: The authentication exchange is successfully completed.

  3. AuthenticationKerberosV5: The frontend must now take part in a Kerberos V5 authentication dialog (not described here, part of the Kerberos specification) with the server. If this is successful, the server responds with an AuthenticationOk, otherwise it responds with an ErrorResponse. This is no longer supported.

  4. AuthenticationCleartextPassword: The frontend must now send a PasswordMessage containing the password in clear-text form. If this is the correct password, the server responds with an AuthenticationOk, otherwise it responds with an ErrorResponse.

  5. AuthenticationMD5Password: The frontend must now send a PasswordMessage containing the password (with user name) encrypted via MD5, then encrypted again using the 4-byte random salt specified in the AuthenticationMD5Password message. If this is the correct password, the server responds with an AuthenticationOk, otherwise it responds with an ErrorResponse. The actual PasswordMessage can be computed in SQL as concat('md5', md5(concat(md5(concat(password, username)), random-salt))). (Keep in mind the md5() function returns its result as a hex string.)

  6. AuthenticationSCMCredential: This response is only possible for local Unix-domain connections on platforms that support SCM credential messages. The frontend must issue an SCM credential message and then send a single data byte. (The contents of the data byte are uninteresting; it's only used to ensure that the server waits long enough to receive the credential message.) If the credential is acceptable, the server responds with an AuthenticationOk, otherwise it responds with an ErrorResponse. (This message type is only issued by pre-9.1 servers. It may eventually be removed from the protocol specification.)

  7. AuthenticationGSS: The frontend must now initiate a GSSAPI negotiation. The frontend will send a GSSResponse message with the first part of the GSSAPI data stream in response to this. If further messages are needed, the server will respond with AuthenticationGSSContinue.

  8. AuthenticationSSPI: The frontend must now initiate an SSPI negotiation. The frontend will send a GSSResponse with the first part of the SSPI data stream in response to this. If further messages are needed, the server will respond with AuthenticationGSSContinue.

  9. AuthenticationGSSContinue: This message contains the response data from the previous step of GSSAPI or SSPI negotiation (AuthenticationGSS, AuthenticationSSPI or a previous AuthenticationGSSContinue). If the GSSAPI or SSPI data in this message indicates more data is needed to complete the authentication, the frontend must send that data as another GSSResponse message. If GSSAPI or SSPI authentication is completed by this message, the server will next send AuthenticationOk to indicate successful authentication or ErrorResponse to indicate failure.

  10. AuthenticationSASL: The frontend must now initiate a SASL negotiation, using one of the SASL mechanisms listed in the message. The frontend will send a SASLInitialResponse with the name of the selected mechanism, and the first part of the SASL data stream in response to this. If further messages are needed, the server will respond with AuthenticationSASLContinue. See sasl-authentication for details.

  11. AuthenticationSASLContinue: This message contains challenge data from the previous step of SASL negotiation (AuthenticationSASL, or a previous AuthenticationSASLContinue). The frontend must respond with a SASLResponse message.

  12. AuthenticationSASLFinal: SASL authentication has completed with additional mechanism-specific data for the client. The server will next send AuthenticationOk to indicate successful authentication, or an ErrorResponse to indicate failure. This message is sent only if the SASL mechanism specifies additional data to be sent from server to client at completion.

  13. NegotiateProtocolVersion: The server does not support the minor protocol version requested by the client, but does support an earlier version of the protocol; this message indicates the highest supported minor version. This message will also be sent if the client requested unsupported protocol options (i.e., beginning with _pq_.) in the startup packet. This message will be followed by an ErrorResponse or a message indicating the success or failure of authentication.
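The double-MD5 computation described in item 5 above can be sketched in Python (the user name, password, and salt below are hypothetical values):

```python
import hashlib

def md5_password_response(user: str, password: str, salt: bytes) -> bytes:
    """PasswordMessage payload for AuthenticationMD5Password:
    'md5' + md5(md5(password + username) + salt), with each md5()
    result as a hex string, null-terminated. The 4-byte salt comes
    from the server's AuthenticationMD5Password message."""
    inner = hashlib.md5(password.encode() + user.encode()).hexdigest()
    outer = hashlib.md5(inner.encode() + salt).hexdigest()
    return b"md5" + outer.encode() + b"\x00"

resp = md5_password_response("alice", "secret", b"\x01\x02\x03\x04")
```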

If the frontend does not support the authentication method requested by the server, then it should immediately close the connection.

After having received AuthenticationOk, the frontend must wait for further messages from the server. In this phase a backend process is being started, and the frontend is just an interested bystander. It is still possible for the startup attempt to fail (ErrorResponse) or the server to decline support for the requested minor protocol version (NegotiateProtocolVersion), but in the normal case the backend will send some ParameterStatus messages, BackendKeyData, and finally ReadyForQuery.

During this phase the backend will attempt to apply any additional run-time parameter settings that were given in the startup message. If successful, these values become session defaults. An error causes ErrorResponse and exit.

The possible messages from the backend in this phase are:

  1. BackendKeyData: This message provides secret-key data that the frontend must save if it wants to be able to issue cancel requests later. The frontend should not respond to this message, but should continue listening for a ReadyForQuery message.

  2. ParameterStatus: This message informs the frontend about the current (initial) setting of backend parameters, such as guc-client-encoding or guc-datestyle. The frontend can ignore this message, or record the settings for its future use; see protocol-async for more details. The frontend should not respond to this message, but should continue listening for a ReadyForQuery message.

  3. ReadyForQuery: Start-up is completed. The frontend can now issue commands.

  4. ErrorResponse: Start-up failed. The connection is closed after sending this message.

  5. NoticeResponse: A warning message has been issued. The frontend should display the message but continue listening for ReadyForQuery or ErrorResponse.

The ReadyForQuery message is the same one that the backend will issue after each command cycle. Depending on the coding needs of the frontend, it is reasonable to consider ReadyForQuery as starting a command cycle, or to consider ReadyForQuery as ending the start-up phase and each subsequent command cycle.

9.4.2.2. Simple Query

A simple query cycle is initiated by the frontend sending a Query message to the backend. The message includes an SQL command (or commands) expressed as a text string. The backend then sends one or more response messages depending on the contents of the query command string, and finally a ReadyForQuery response message. ReadyForQuery informs the frontend that it can safely send a new command. (It is not actually necessary for the frontend to wait for ReadyForQuery before issuing another command, but the frontend must then take responsibility for figuring out what happens if the earlier command fails and already-issued later commands succeed.)

The possible response messages from the backend are:

  1. CommandComplete: An SQL command completed normally.

  2. CopyInResponse: The backend is ready to copy data from the frontend to a table; see protocol-copy.

  3. CopyOutResponse: The backend is ready to copy data from a table to the frontend; see protocol-copy.

  4. RowDescription: Indicates that rows are about to be returned in response to a SELECT, FETCH, etc. query. The contents of this message describe the column layout of the rows. This will be followed by a DataRow message for each row being returned to the frontend.

  5. DataRow: One of the set of rows returned by a SELECT, FETCH, etc. query.

  6. EmptyQueryResponse: An empty query string was recognized.

  7. ErrorResponse: An error has occurred.

  8. ReadyForQuery: Processing of the query string is complete. A separate message is sent to indicate this because the query string might contain multiple SQL commands. (CommandComplete marks the end of processing one SQL command, not the whole string.) ReadyForQuery will always be sent, whether processing terminates successfully or with an error.

  9. NoticeResponse: A warning message has been issued in relation to the query. Notices are in addition to other responses, i.e., the backend will continue processing the command.

The response to a SELECT query (or other queries that return row sets, such as EXPLAIN or SHOW) normally consists of RowDescription, zero or more DataRow messages, and then CommandComplete. COPY to or from the frontend invokes special protocol as described in protocol-copy. All other query types normally produce only a CommandComplete message.
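A frontend receiving such a sequence splits the byte stream using the framing rule described earlier. The sketch below parses a simulated backend reply (the payload layouts of CommandComplete and ReadyForQuery shown here are assumptions taken from the message-format reference, not this section):

```python
import struct

def iter_messages(buf: bytes):
    """Split a byte stream into (type, payload) protocol messages,
    using the self-inclusive 4-byte length that follows the type byte."""
    pos = 0
    while pos < len(buf):
        msg_type = buf[pos:pos + 1]
        (length,) = struct.unpack("!I", buf[pos + 1:pos + 5])
        payload = buf[pos + 5:pos + 1 + length]
        yield msg_type, payload
        pos += 1 + length

# A simulated reply: CommandComplete ('C') carrying its command tag,
# then ReadyForQuery ('Z') carrying transaction status 'I' (idle).
stream = (b"C" + struct.pack("!I", 4 + 9) + b"SELECT 1\x00"
          + b"Z" + struct.pack("!I", 5) + b"I")
parsed = list(iter_messages(stream))
```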

Since a query string could contain several queries (separated by semicolons), there might be several such response sequences before the backend finishes processing the query string. ReadyForQuery is issued when the entire string has been processed and the backend is ready to accept a new query string.

If a completely empty (no contents other than whitespace) query string is received, the response is EmptyQueryResponse followed by ReadyForQuery.

In the event of an error, ErrorResponse is issued followed by ReadyForQuery. All further processing of the query string is aborted by ErrorResponse (even if more queries remained in it). Note that this might occur partway through the sequence of messages generated by an individual query.

In simple Query mode, the format of retrieved values is always text, except when the given command is a FETCH from a cursor declared with the BINARY option. In that case, the retrieved values are in binary format. The format codes given in the RowDescription message tell which format is being used.

A frontend must be prepared to accept ErrorResponse and NoticeResponse messages whenever it is expecting any other type of message. See also protocol-async concerning messages that the backend might generate due to outside events.

Recommended practice is to code frontends in a state-machine style that will accept any message type at any time that it could make sense, rather than wiring in assumptions about the exact sequence of messages.

9.4.2.2.1. Multiple Statements in a Simple Query

When a simple Query message contains more than one SQL statement (separated by semicolons), those statements are executed as a single transaction, unless explicit transaction control commands are included to force a different behavior. For example, if the message contains

INSERT INTO mytable VALUES(1);
SELECT 1/0;
INSERT INTO mytable VALUES(2);
then the divide-by-zero failure in the SELECT will force rollback of the first INSERT. Furthermore, because execution of the message is abandoned at the first error, the second INSERT is never attempted at all.

If instead the message contains

BEGIN;
INSERT INTO mytable VALUES(1);
COMMIT;
INSERT INTO mytable VALUES(2);
SELECT 1/0;
then the first INSERT is committed by the explicit COMMIT command. The second INSERT and the SELECT are still treated as a single transaction, so that the divide-by-zero failure will roll back the second INSERT, but not the first one.

This behavior is implemented by running the statements in a multi-statement Query message in an implicit transaction block unless there is some explicit transaction block for them to run in. The main difference between an implicit transaction block and a regular one is that an implicit block is closed automatically at the end of the Query message, either by an implicit commit if there was no error, or an implicit rollback if there was an error. This is similar to the implicit commit or rollback that happens for a statement executed by itself (when not in a transaction block).

If the session is already in a transaction block, as a result of a BEGIN in some previous message, then the Query message simply continues that transaction block, whether the message contains one statement or several. However, if the Query message contains a COMMIT or ROLLBACK closing the existing transaction block, then any following statements are executed in an implicit transaction block. Conversely, if a BEGIN appears in a multi-statement Query message, then it starts a regular transaction block that will only be terminated by an explicit COMMIT or ROLLBACK, whether that appears in this Query message or a later one. If the BEGIN follows some statements that were executed as an implicit transaction block, those statements are not immediately committed; in effect, they are retroactively included into the new regular transaction block.

A COMMIT or ROLLBACK appearing in an implicit transaction block is executed as normal, closing the implicit block; however, a warning will be issued since a COMMIT or ROLLBACK without a previous BEGIN might represent a mistake. If more statements follow, a new implicit transaction block will be started for them.

Savepoints are not allowed in an implicit transaction block, since they would conflict with the behavior of automatically closing the block upon any error.

Remember that, regardless of any transaction control commands that may be present, execution of the Query message stops at the first error. Thus for example given

BEGIN;
SELECT 1/0;
ROLLBACK;
in a single Query message, the session will be left inside a failed regular transaction block, since the ROLLBACK is not reached after the divide-by-zero error. Another ROLLBACK will be needed to restore the session to a usable state.

Another behavior of note is that initial lexical and syntactic analysis is done on the entire query string before any of it is executed. Thus simple errors (such as a misspelled keyword) in later statements can prevent execution of any of the statements. This is normally invisible to users since the statements would all roll back anyway when done as an implicit transaction block. However, it can be visible when attempting to do multiple transactions within a multi-statement Query. For instance, if a typo turned our previous example into

BEGIN;
INSERT INTO mytable VALUES(1);
COMMIT;
INSERT INTO mytable VALUES(2);
SELCT 1/0;
then none of the statements would get run, resulting in the visible difference that the first INSERT is not committed. Errors detected at semantic analysis or later, such as a misspelled table or column name, do not have this effect.

9.4.2.3. Extended Query

The extended query protocol breaks down the above-described simple query protocol into multiple steps. The results of preparatory steps can be re-used multiple times for improved efficiency. Furthermore, additional features are available, such as the possibility of supplying data values as separate parameters instead of having to insert them directly into a query string.

In the extended protocol, the frontend first sends a Parse message, which contains a textual query string, optionally some information about data types of parameter placeholders, and the name of a destination prepared-statement object (an empty string selects the unnamed prepared statement). The response is either ParseComplete or ErrorResponse. Parameter data types can be specified by OID; if not given, the parser attempts to infer the data types in the same way as it would do for untyped literal string constants.

Note

A parameter data type can be left unspecified by setting it to zero, or by making the array of parameter type OIDs shorter than the number of parameter symbols ($n) used in the query string. Another special case is that a parameter's type can be specified as void (that is, the OID of the void pseudo-type). This is meant to allow parameter symbols to be used for function parameters that are actually OUT parameters. Ordinarily there is no context in which a void parameter could be used, but if such a parameter symbol appears in a function's parameter list, it is effectively ignored. For example, a function call such as foo($1,$2,$3,$4) could match a function with two IN and two OUT arguments, if $3 and $4 are specified as having type void.

Note

The query string contained in a Parse message cannot include more than one SQL statement; else a syntax error is reported. This restriction does not exist in the simple-query protocol, but it does exist in the extended protocol, because allowing prepared statements or portals to contain multiple commands would complicate the protocol unduly.

If successfully created, a named prepared-statement object lasts till the end of the current session, unless explicitly destroyed. An unnamed prepared statement lasts only until the next Parse statement specifying the unnamed statement as destination is issued. (Note that a simple Query message also destroys the unnamed statement.) Named prepared statements must be explicitly closed before they can be redefined by another Parse message, but this is not required for the unnamed statement. Named prepared statements can also be created and accessed at the SQL command level, using PREPARE and EXECUTE.

Once a prepared statement exists, it can be readied for execution using a Bind message. The Bind message gives the name of the source prepared statement (empty string denotes the unnamed prepared statement), the name of the destination portal (empty string denotes the unnamed portal), and the values to use for any parameter placeholders present in the prepared statement. The supplied parameter set must match those needed by the prepared statement. (If you declared any void parameters in the Parse message, pass NULL values for them in the Bind message.) Bind also specifies the format to use for any data returned by the query; the format can be specified overall, or per-column. The response is either BindComplete or ErrorResponse.

Note

The choice between text and binary output is determined by the format codes given in Bind, regardless of the SQL command involved. The BINARY attribute in cursor declarations is irrelevant when using extended query protocol.

Query planning typically occurs when the Bind message is processed. If the prepared statement has no parameters, or is executed repeatedly, the server might save the created plan and re-use it during subsequent Bind messages for the same prepared statement. However, it will do so only if it finds that a generic plan can be created that is not much less efficient than a plan that depends on the specific parameter values supplied. This happens transparently so far as the protocol is concerned.

If successfully created, a named portal object lasts till the end of the current transaction, unless explicitly destroyed. An unnamed portal is destroyed at the end of the transaction, or as soon as the next Bind statement specifying the unnamed portal as destination is issued. (Note that a simple Query message also destroys the unnamed portal.) Named portals must be explicitly closed before they can be redefined by another Bind message, but this is not required for the unnamed portal. Named portals can also be created and accessed at the SQL command level, using DECLARE CURSOR and FETCH.

Once a portal exists, it can be executed using an Execute message. The Execute message specifies the portal name (empty string denotes the unnamed portal) and a maximum result-row count (zero meaning fetch all rows). The result-row count is only meaningful for portals containing commands that return row sets; in other cases the command is always executed to completion, and the row count is ignored. The possible responses to Execute are the same as those described above for queries issued via simple query protocol, except that Execute doesn’t cause ReadyForQuery or RowDescription to be issued.

If Execute terminates before completing the execution of a portal (due to reaching a nonzero result-row count), it will send a PortalSuspended message; the appearance of this message tells the frontend that another Execute should be issued against the same portal to complete the operation. The CommandComplete message indicating completion of the source SQL command is not sent until the portal’s execution is completed. Therefore, an Execute phase is always terminated by the appearance of exactly one of these messages: CommandComplete, EmptyQueryResponse (if the portal was created from an empty query string), ErrorResponse, or PortalSuspended.

At completion of each series of extended-query messages, the frontend should issue a Sync message. This parameterless message causes the backend to close the current transaction if it’s not inside a BEGIN/COMMIT transaction block (close meaning to commit if no error, or roll back if error). Then a ReadyForQuery response is issued. The purpose of Sync is to provide a resynchronization point for error recovery. When an error is detected while processing any extended-query message, the backend issues ErrorResponse, then reads and discards messages until a Sync is reached, then issues ReadyForQuery and returns to normal message processing. (But note that no skipping occurs if an error is detected while processing Sync — this ensures that there is one and only one ReadyForQuery sent for each Sync.)
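The Parse/Bind/Execute/Sync cycle can be sketched as byte construction in Python. The field layouts inside each payload are assumptions taken from the message-format reference, not this section; the query and parameter value are illustrative:

```python
import struct

def frame(msg_type: bytes, payload: bytes) -> bytes:
    return msg_type + struct.pack("!I", 4 + len(payload)) + payload

# Parse ('P'): unnamed statement, one parameter with unspecified type.
parse = frame(b"P", b"\x00" + b"SELECT $1::int\x00" + struct.pack("!H", 0))
# Bind ('B'): unnamed portal from the unnamed statement, one text-format
# parameter value "42", result columns in the default (text) format.
bind = frame(b"B",
             b"\x00" + b"\x00"               # portal name, statement name
             + struct.pack("!H", 0)          # no parameter format codes
             + struct.pack("!H", 1)          # one parameter value
             + struct.pack("!I", 2) + b"42"  # length-prefixed value
             + struct.pack("!H", 0))         # no result format codes
# Execute ('E'): unnamed portal, row limit 0 (fetch all rows).
execute = frame(b"E", b"\x00" + struct.pack("!I", 0))
# Sync ('S'): parameterless; closes the implicit transaction and
# elicits ReadyForQuery.
sync = frame(b"S", b"")
pipeline = parse + bind + execute + sync
```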

Note

Sync does not cause a transaction block opened with BEGIN to be closed. It is possible to detect this situation since the ReadyForQuery message includes transaction status information.

In addition to these fundamental, required operations, there are several optional operations that can be used with extended-query protocol.

The Describe message (portal variant) specifies the name of an existing portal (or an empty string for the unnamed portal). The response is a RowDescription message describing the rows that will be returned by executing the portal; or a NoData message if the portal does not contain a query that will return rows; or ErrorResponse if there is no such portal.

The Describe message (statement variant) specifies the name of an existing prepared statement (or an empty string for the unnamed prepared statement). The response is a ParameterDescription message describing the parameters needed by the statement, followed by a RowDescription message describing the rows that will be returned when the statement is eventually executed (or a NoData message if the statement will not return rows). ErrorResponse is issued if there is no such prepared statement. Note that since Bind has not yet been issued, the formats to be used for returned columns are not yet known to the backend; the format code fields in the RowDescription message will be zeroes in this case.

Tip

In most scenarios the frontend should issue one or the other variant of Describe before issuing Execute, to ensure that it knows how to interpret the results it will get back.

The Close message closes an existing prepared statement or portal and releases resources. It is not an error to issue Close against a nonexistent statement or portal name. The response is normally CloseComplete, but could be ErrorResponse if some difficulty is encountered while releasing resources. Note that closing a prepared statement implicitly closes any open portals that were constructed from that statement.

The Flush message does not cause any specific output to be generated, but forces the backend to deliver any data pending in its output buffers. A Flush must be sent after any extended-query command except Sync, if the frontend wishes to examine the results of that command before issuing more commands. Without Flush, messages returned by the backend will be combined into the minimum possible number of packets to minimize network overhead.
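All frontend messages in the extended-query protocol share the same frame: one type byte, then an int32 length covering the length field and the payload. The following sketch builds a Parse/Bind/Execute/Flush series for the unnamed statement and portal (helper names are illustrative; field layouts follow the protocol's message-format definitions):

```python
import struct

def pg_message(msg_type: bytes, payload: bytes = b"") -> bytes:
    """Frame a frontend message: one type byte, then an int32 length
    that counts the length field itself plus the payload."""
    return msg_type + struct.pack("!i", 4 + len(payload)) + payload

def cstr(s: str) -> bytes:
    """Null-terminated string, as used throughout the protocol."""
    return s.encode() + b"\x00"

# Parse into the unnamed prepared statement, bind the unnamed portal,
# execute with no row limit, then Flush ('H') so the responses are
# delivered immediately (a Sync would normally end the whole series).
batch = (
    pg_message(b"P", cstr("") + cstr("SELECT 1") + struct.pack("!h", 0))
    + pg_message(b"B", cstr("") + cstr("")
                 + struct.pack("!h", 0)      # no parameter format codes
                 + struct.pack("!h", 0)      # no parameter values
                 + struct.pack("!h", 0))     # no result-column format codes
    + pg_message(b"E", cstr("") + struct.pack("!i", 0))  # 0 = no row limit
    + pg_message(b"H")                                   # Flush
)
```

Because the backend buffers its output, nothing in this batch reaches the client until the Flush (or a later Sync) forces delivery.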

Note

The simple Query message is approximately equivalent to the series Parse, Bind, portal Describe, Execute, Close, Sync, using the unnamed prepared statement and portal objects and no parameters. One difference is that it will accept multiple SQL statements in the query string, automatically performing the bind/describe/execute sequence for each one in succession. Another difference is that it will not return ParseComplete, BindComplete, CloseComplete, or NoData messages.

9.4.2.4. Pipelining

Use of the extended query protocol allows pipelining, which means sending a series of queries without waiting for earlier ones to complete. This reduces the number of network round trips needed to complete a given series of operations. However, the user must carefully consider the required behavior if one of the steps fails, since later queries will already be in flight to the server.

One way to deal with that is to make the whole query series a single transaction, that is, wrap it in BEGIN/COMMIT. However, this does not help if one wishes for some of the commands to commit independently of others.

The extended query protocol provides another way to manage this concern, which is to omit sending Sync messages between steps that are dependent. Since, after an error, the backend will skip command messages until it finds Sync, this allows later commands in a pipeline to be skipped automatically when an earlier one fails, without the client having to manage that explicitly with BEGIN and COMMIT. Independently-committable segments of the pipeline can be separated by Sync messages.

If the client has not issued an explicit BEGIN, then each Sync ordinarily causes an implicit COMMIT if the preceding step(s) succeeded, or an implicit ROLLBACK if they failed. However, there are a few DDL commands (such as CREATE DATABASE) that cannot be executed inside a transaction block. If one of these is executed in a pipeline, it will, upon success, force an immediate commit to preserve database consistency. A Sync immediately following one of these has no effect except to respond with ReadyForQuery.

When using this method, completion of the pipeline must be determined by counting ReadyForQuery messages and waiting for that to reach the number of Syncs sent. Counting command completion responses is unreliable, since some of the commands may not be executed and thus not produce a completion message.
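A sketch of that completion test: walk the backend byte stream, skipping each message by its declared length, and count ReadyForQuery ('Z') messages until the count matches the number of Syncs sent (illustrative code, not a library API):

```python
import struct

def count_ready_for_query(buf: bytes) -> int:
    """Count ReadyForQuery ('Z') messages in a backend byte stream.
    The pipeline is complete when this reaches the number of Syncs sent."""
    count, i = 0, 0
    while i < len(buf):
        msg_type = buf[i:i + 1]
        (length,) = struct.unpack_from("!i", buf, i + 1)
        if msg_type == b"Z":
            count += 1
        i += 1 + length  # the length covers itself, but not the type byte
    return count
```

Messages such as ParseComplete or CommandComplete may simply never appear for skipped commands, which is why counting them does not work.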

9.4.2.5. Function Call

The Function Call sub-protocol allows the client to request a direct call of any function that exists in the database’s pg_proc system catalog. The client must have execute permission for the function.

Note

The Function Call sub-protocol is a legacy feature that is probably best avoided in new code. Similar results can be accomplished by setting up a prepared statement that does SELECT function($1, …). The Function Call cycle can then be replaced with Bind/Execute.

A Function Call cycle is initiated by the frontend sending a FunctionCall message to the backend. The backend then sends one or more response messages depending on the results of the function call, and finally a ReadyForQuery response message. ReadyForQuery informs the frontend that it can safely send a new query or function call.

The possible response messages from the backend are:

  1. ErrorResponse: An error has occurred.

  2. FunctionCallResponse: The function call was completed and returned the result given in the message. (Note that the Function Call protocol can only handle a single scalar result, not a row type or set of results.)

  3. ReadyForQuery: Processing of the function call is complete. ReadyForQuery will always be sent, whether processing terminates successfully or with an error.

  4. NoticeResponse: A warning message has been issued in relation to the function call. Notices are in addition to other responses, i.e., the backend will continue processing the command.

9.4.2.6. COPY Operations

The COPY command allows high-speed bulk data transfer to or from the server. Copy-in and copy-out operations each switch the connection into a distinct sub-protocol, which lasts until the operation is completed.

Copy-in mode (data transfer to the server) is initiated when the backend executes a COPY FROM STDIN SQL statement. The backend sends a CopyInResponse message to the frontend. The frontend should then send zero or more CopyData messages, forming a stream of input data. (The message boundaries are not required to have anything to do with row boundaries, although that is often a reasonable choice.) The frontend can terminate the copy-in mode by sending either a CopyDone message (allowing successful termination) or a CopyFail message (which will cause the COPY SQL statement to fail with an error). The backend then reverts to the command-processing mode it was in before the COPY started, which will be either simple or extended query protocol. It will next send either CommandComplete (if successful) or ErrorResponse (if not).

In the event of a backend-detected error during copy-in mode (including receipt of a CopyFail message), the backend will issue an ErrorResponse message. If the COPY command was issued via an extended-query message, the backend will now discard frontend messages until a Sync message is received, then it will issue ReadyForQuery and return to normal processing. If the COPY command was issued in a simple Query message, the rest of that message is discarded and ReadyForQuery is issued. In either case, any subsequent CopyData, CopyDone, or CopyFail messages issued by the frontend will simply be dropped.

The backend will ignore Flush and Sync messages received during copy-in mode. Receipt of any other non-copy message type constitutes an error that will abort the copy-in state as described above. (The exception for Flush and Sync is for the convenience of client libraries that always send Flush or Sync after an Execute message, without checking whether the command to be executed is a COPY FROM STDIN.)
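The three frontend messages used during copy-in are easy to frame: CopyData ('d') carries raw bytes, CopyDone ('c') has no payload, and CopyFail ('f') carries a null-terminated error message. A sketch with illustrative helper names:

```python
import struct

def copy_data(chunk: bytes) -> bytes:
    """CopyData: 'd', int32 length (self plus payload), then raw data.
    Chunk boundaries need not align with row boundaries."""
    return b"d" + struct.pack("!i", 4 + len(chunk)) + chunk

def copy_done() -> bytes:
    """CopyDone: signals successful end of the input stream."""
    return b"c" + struct.pack("!i", 4)

def copy_fail(reason: str) -> bytes:
    """CopyFail: aborts the COPY with the given error message."""
    msg = reason.encode() + b"\x00"
    return b"f" + struct.pack("!i", 4 + len(msg)) + msg

# Stream two text-format rows, then terminate the copy successfully.
rows = [b"1\talice\n", b"2\tbob\n"]
stream = b"".join(copy_data(r) for r in rows) + copy_done()
```

After CopyDone, the client still has to read CommandComplete (or ErrorResponse) before the connection is back to normal processing.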

Copy-out mode (data transfer from the server) is initiated when the backend executes a COPY TO STDOUT SQL statement. The backend sends a CopyOutResponse message to the frontend, followed by zero or more CopyData messages (always one per row), followed by CopyDone. The backend then reverts to the command-processing mode it was in before the COPY started, and sends CommandComplete. The frontend cannot abort the transfer (except by closing the connection or issuing a Cancel request), but it can discard unwanted CopyData and CopyDone messages.

In the event of a backend-detected error during copy-out mode, the backend will issue an ErrorResponse message and revert to normal processing. The frontend should treat receipt of ErrorResponse as terminating the copy-out mode.

It is possible for NoticeResponse and ParameterStatus messages to be interspersed between CopyData messages; frontends must handle these cases, and should be prepared for other asynchronous message types as well (see protocol-async). Otherwise, any message type other than CopyData or CopyDone may be treated as terminating copy-out mode.

There is another Copy-related mode called copy-both, which allows high-speed bulk data transfer to and from the server. Copy-both mode is initiated when a backend in walsender mode executes a START_REPLICATION statement. The backend sends a CopyBothResponse message to the frontend. Both the backend and the frontend may then send CopyData messages until either end sends a CopyDone message. After the client sends a CopyDone message, the connection goes from copy-both mode to copy-out mode, and the client may not send any more CopyData messages. Similarly, when the server sends a CopyDone message, the connection goes into copy-in mode, and the server may not send any more CopyData messages. After both sides have sent a CopyDone message, the copy mode is terminated, and the backend reverts to the command-processing mode. In the event of a backend-detected error during copy-both mode, the backend will issue an ErrorResponse message, discard frontend messages until a Sync message is received, and then issue ReadyForQuery and return to normal processing. The frontend should treat receipt of ErrorResponse as terminating the copy in both directions; no CopyDone should be sent in this case. See protocol-replication for more information on the subprotocol transmitted over copy-both mode.

The CopyInResponse, CopyOutResponse and CopyBothResponse messages include fields that inform the frontend of the number of columns per row and the format codes being used for each column. (As of the present implementation, all columns in a given COPY operation will use the same format, but the message design does not assume this.)

9.4.2.7. Asynchronous Operations

There are several cases in which the backend will send messages that are not specifically prompted by the frontend’s command stream. Frontends must be prepared to deal with these messages at any time, even when not engaged in a query. At minimum, one should check for these cases before beginning to read a query response.

It is possible for NoticeResponse messages to be generated due to outside activity; for example, if the database administrator commands a fast database shutdown, the backend will send a NoticeResponse indicating this fact before closing the connection. Accordingly, frontends should always be prepared to accept and display NoticeResponse messages, even when the connection is nominally idle.

ParameterStatus messages will be generated whenever the active value changes for any of the parameters the backend believes the frontend should know about. Most commonly this occurs in response to a SET SQL command executed by the frontend, and this case is effectively synchronous — but it is also possible for parameter status changes to occur because the administrator changed a configuration file and then sent the SIGHUP signal to the server. Also, if a SET command is rolled back, an appropriate ParameterStatus message will be generated to report the current effective value.

At present there is a hard-wired set of parameters for which ParameterStatus will be generated: they are server_version, server_encoding, client_encoding, application_name, default_transaction_read_only, in_hot_standby, is_superuser, session_authorization, DateStyle, IntervalStyle, TimeZone, integer_datetimes, and standard_conforming_strings. (server_encoding, TimeZone, and integer_datetimes were not reported by releases before 8.0; standard_conforming_strings was not reported by releases before 8.1; IntervalStyle was not reported by releases before 8.4; application_name was not reported by releases before 9.0; default_transaction_read_only and in_hot_standby were not reported by releases before 14.) Note that server_version, server_encoding and integer_datetimes are pseudo-parameters that cannot change after startup. This set might change in the future, or even become configurable. Accordingly, a frontend should simply ignore ParameterStatus for parameters that it does not understand or care about.
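A ParameterStatus payload is just two null-terminated strings, the parameter name and its current value, so tolerant handling of unknown parameters is straightforward (sketch; the KNOWN set is an illustrative subset, not the full hard-wired list):

```python
def parse_parameter_status(payload: bytes):
    """Split a ParameterStatus ('S') payload into (name, value):
    two null-terminated strings, back to back."""
    name, _, rest = payload.partition(b"\x00")
    value, _, _ = rest.partition(b"\x00")
    return name.decode(), value.decode()

# Frontends should simply ignore parameters they do not care about.
KNOWN = {"server_version", "client_encoding", "TimeZone"}  # illustrative subset

def handle_parameter_status(payload: bytes, cache: dict) -> None:
    name, value = parse_parameter_status(payload)
    if name in KNOWN:
        cache[name] = value  # update cached session state; else drop silently
```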

If a frontend issues a LISTEN command, then the backend will send a NotificationResponse message (not to be confused with NoticeResponse!) whenever a NOTIFY command is executed for the same channel name.

Note

At present, NotificationResponse can only be sent outside a transaction, and thus it will not occur in the middle of a command-response series, though it might occur just before ReadyForQuery. It is unwise to design frontend logic that assumes that, however. Good practice is to be able to accept NotificationResponse at any point in the protocol.

9.4.2.8. Canceling Requests in Progress

During the processing of a query, the frontend might request cancellation of the query. The cancel request is not sent directly on the open connection to the backend for reasons of implementation efficiency: we don’t want to have the backend constantly checking for new input from the frontend during query processing. Cancel requests should be relatively infrequent, so we make them slightly cumbersome in order to avoid a penalty in the normal case.

To issue a cancel request, the frontend opens a new connection to the server and sends a CancelRequest message, rather than the StartupMessage message that would ordinarily be sent across a new connection. The server will process this request and then close the connection. For security reasons, no direct reply is made to the cancel request message.

A CancelRequest message will be ignored unless it contains the same key data (PID and secret key) passed to the frontend during connection start-up. If the request matches the PID and secret key for a currently executing backend, the processing of the current query is aborted. (In the existing implementation, this is done by sending a special signal to the backend process that is processing the query.)
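The CancelRequest packet itself has no type byte: it is four int32 fields in network byte order, the packet length (16), the cancel request code 80877102 (1234 in the most significant 16 bits, 5678 in the least), and the PID and secret key from the BackendKeyData message. A minimal sketch:

```python
import struct

CANCEL_REQUEST_CODE = (1234 << 16) | 5678  # 80877102

def cancel_request(pid: int, secret_key: int) -> bytes:
    """Build the 16-byte CancelRequest packet sent on a fresh connection
    in place of a StartupMessage."""
    return struct.pack("!iiii", 16, CANCEL_REQUEST_CODE, pid, secret_key)
```

The client sends these 16 bytes on a new connection and then just waits for the server to close it; no reply is ever sent.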

The cancellation signal might or might not have any effect — for example, if it arrives after the backend has finished processing the query, then it will have no effect. If the cancellation is effective, it results in the current command being terminated early with an error message.

The upshot of all this is that for reasons of both security and efficiency, the frontend has no direct way to tell whether a cancel request has succeeded. It must continue to wait for the backend to respond to the query. Issuing a cancel simply improves the odds that the current query will finish soon, and improves the odds that it will fail with an error message instead of succeeding.

Since the cancel request is sent across a new connection to the server and not across the regular frontend/backend communication link, it is possible for the cancel request to be issued by any process, not just the frontend whose query is to be canceled. This might provide additional flexibility when building multiple-process applications. It also introduces a security risk, in that unauthorized persons might try to cancel queries. The security risk is addressed by requiring a dynamically generated secret key to be supplied in cancel requests.

9.4.2.9. Termination

The normal, graceful termination procedure is that the frontend sends a Terminate message and immediately closes the connection. On receipt of this message, the backend closes the connection and terminates.

In rare cases (such as an administrator-commanded database shutdown) the backend might disconnect without any frontend request to do so. In such cases the backend will attempt to send an error or notice message giving the reason for the disconnection before it closes the connection.

Other termination scenarios arise from various failure cases, such as core dump at one end or the other, loss of the communications link, loss of message-boundary synchronization, etc. If either frontend or backend sees an unexpected closure of the connection, it should clean up and terminate. The frontend has the option of launching a new backend by recontacting the server if it doesn’t want to terminate itself. Closing the connection is also advisable if an unrecognizable message type is received, since this probably indicates loss of message-boundary sync.

For either normal or abnormal termination, any open transaction is rolled back, not committed. One should note however that if a frontend disconnects while a non-SELECT query is being processed, the backend will probably finish the query before noticing the disconnection. If the query is outside any transaction block (BEGIN ... COMMIT sequence) then its results might be committed before the disconnection is recognized.

9.4.2.10. SSL Session Encryption

If PostgreSQL was built with SSL support, frontend/backend communications can be encrypted using SSL. This provides communication security in environments where attackers might be able to capture the session traffic. For more information on encrypting PostgreSQL sessions with SSL, see ssl-tcp.

To initiate an SSL-encrypted connection, the frontend initially sends an SSLRequest message rather than a StartupMessage. The server then responds with a single byte containing S or N, indicating that it is willing or unwilling to perform SSL, respectively. The frontend might close the connection at this point if it is dissatisfied with the response. To continue after S, perform an SSL startup handshake (not described here, part of the SSL specification) with the server. If this is successful, continue with sending the usual StartupMessage. In this case the StartupMessage and all subsequent data will be SSL-encrypted. To continue after N, send the usual StartupMessage and proceed without encryption. (Alternatively, it is permissible to issue a GSSENCRequest message after an N response to try to use GSSAPI encryption instead of SSL.)
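Like CancelRequest, SSLRequest has no type byte: just an int32 length of 8 and the request code 80877103 (1234 in the high 16 bits, 5679 in the low). A sketch of building the request and interpreting the single-byte reply:

```python
import struct

SSL_REQUEST_CODE = (1234 << 16) | 5679  # 80877103

def ssl_request() -> bytes:
    """SSLRequest packet: int32 length 8, then the request code."""
    return struct.pack("!ii", 8, SSL_REQUEST_CODE)

def handle_ssl_reply(first_byte: bytes) -> bool:
    """'S' means proceed with the TLS handshake; 'N' means continue
    unencrypted (or try GSSENCRequest instead). Anything else is a
    protocol violation at this stage."""
    if first_byte == b"S":
        return True
    if first_byte == b"N":
        return False
    raise ConnectionError("unexpected response to SSLRequest")
```

Note the buffer-stuffing caveat below: after an S, read exactly this one byte before handing the socket to the SSL library.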

The frontend should also be prepared to handle an ErrorMessage response to SSLRequest from the server. This would only occur if the server predates the addition of SSL support to PostgreSQL. (Such servers are now very ancient, and likely do not exist in the wild anymore.) In this case the connection must be closed, but the frontend might choose to open a fresh connection and proceed without requesting SSL.

When SSL encryption can be performed, the server is expected to send only the single S byte and then wait for the frontend to initiate an SSL handshake. If additional bytes are available to read at this point, it likely means that a man-in-the-middle is attempting to perform a buffer-stuffing attack (CVE-2021-23222). Frontends should be coded either to read exactly one byte from the socket before turning the socket over to their SSL library, or to treat it as a protocol violation if they find they have read additional bytes.

An initial SSLRequest can also be used in a connection that is being opened to send a CancelRequest message.

While the protocol itself does not provide a way for the server to force SSL encryption, the administrator can configure the server to reject unencrypted sessions as a byproduct of authentication checking.

9.4.2.11. GSSAPI Session Encryption

If PostgreSQL was built with GSSAPI support, frontend/backend communications can be encrypted using GSSAPI. This provides communication security in environments where attackers might be able to capture the session traffic. For more information on encrypting PostgreSQL sessions with GSSAPI, see gssapi-enc.

To initiate a GSSAPI-encrypted connection, the frontend initially sends a GSSENCRequest message rather than a StartupMessage. The server then responds with a single byte containing G or N, indicating that it is willing or unwilling to perform GSSAPI encryption, respectively. The frontend might close the connection at this point if it is dissatisfied with the response. To continue after G, using the GSSAPI C bindings as discussed in RFC 2744 or equivalent, perform a GSSAPI initialization by calling gss_init_sec_context() in a loop and sending the result to the server, starting with an empty input and then with each result from the server, until it returns no output. When sending the results of gss_init_sec_context() to the server, prepend the length of the message as a four byte integer in network byte order. To continue after N, send the usual StartupMessage and proceed without encryption. (Alternatively, it is permissible to issue an SSLRequest message after an N response to try to use SSL encryption instead of GSSAPI.)

The frontend should also be prepared to handle an ErrorMessage response to GSSENCRequest from the server. This would only occur if the server predates the addition of GSSAPI encryption support to PostgreSQL. In this case the connection must be closed, but the frontend might choose to open a fresh connection and proceed without requesting GSSAPI encryption.

When GSSAPI encryption can be performed, the server is expected to send only the single G byte and then wait for the frontend to initiate a GSSAPI handshake. If additional bytes are available to read at this point, it likely means that a man-in-the-middle is attempting to perform a buffer-stuffing attack (CVE-2021-23222). Frontends should be coded either to read exactly one byte from the socket before turning the socket over to their GSSAPI library, or to treat it as a protocol violation if they find they have read additional bytes.

An initial GSSENCRequest can also be used in a connection that is being opened to send a CancelRequest message.

Once GSSAPI encryption has been successfully established, use gss_wrap() to encrypt the usual StartupMessage and all subsequent data, prepending the length of the result from gss_wrap() as a four byte integer in network byte order to the actual encrypted payload. Note that the server will only accept encrypted packets from the client which are less than 16kB; gss_wrap_size_limit() should be used by the client to determine the size of the unencrypted message which will fit within this limit, and larger messages should be broken up into multiple gss_wrap() calls. Typical segments are 8kB of unencrypted data, resulting in encrypted packets slightly larger than 8kB but well within the 16kB maximum. The server can be expected not to send encrypted packets larger than 16kB to the client.
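The segmentation and framing can be sketched as follows. Here `wrap` stands in for gss_wrap() (identity by default, purely for illustration), and the 8kB chunk size keeps the encrypted result safely under the server's 16kB limit:

```python
import struct

MAX_PLAINTEXT = 8192  # keeps encrypted packets well under the 16kB cap

def gss_frame(token: bytes) -> bytes:
    """Prefix a wrapped token with its length as a 4-byte
    network-byte-order integer, as the protocol requires."""
    return struct.pack("!i", len(token)) + token

def segment(plaintext: bytes, wrap=lambda b: b):
    """Split a message into <=8kB plaintext chunks, wrap each one
    (gss_wrap() in a real client; identity here), and frame it."""
    for off in range(0, len(plaintext), MAX_PLAINTEXT):
        yield gss_frame(wrap(plaintext[off:off + MAX_PLAINTEXT]))
```

In a real client the chunk size would come from gss_wrap_size_limit() rather than a constant.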

While the protocol itself does not provide a way for the server to force GSSAPI encryption, the administrator can configure the server to reject unencrypted sessions as a byproduct of authentication checking.

9.4.3. SASL Authentication

SASL is a framework for authentication in connection-oriented protocols. At the moment, PostgreSQL implements two SASL authentication mechanisms, SCRAM-SHA-256 and SCRAM-SHA-256-PLUS. More might be added in the future. The steps below illustrate how SASL authentication is performed in general, while the next subsection gives more details on SCRAM-SHA-256 and SCRAM-SHA-256-PLUS.

SASL Authentication Message Flow

To begin a SASL authentication exchange, the server sends an AuthenticationSASL message. It includes a list of SASL authentication mechanisms that the server can accept, in the server’s preferred order.

The client selects one of the supported mechanisms from the list, and sends a SASLInitialResponse message to the server. The message includes the name of the selected mechanism, and an optional Initial Client Response, if the selected mechanism uses that.
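The SASLInitialResponse frame is type byte 'p', an int32 length, the mechanism name as a null-terminated string, then an int32 length of the initial response (-1 if none) followed by its raw bytes. A sketch (helper name illustrative):

```python
import struct

def sasl_initial_response(mechanism: str, initial: bytes = None) -> bytes:
    """Frame a SASLInitialResponse: mechanism name, then the Initial
    Client Response length (-1 when the mechanism sends none) and bytes."""
    if initial is None:
        body = mechanism.encode() + b"\x00" + struct.pack("!i", -1)
    else:
        body = (mechanism.encode() + b"\x00"
                + struct.pack("!i", len(initial)) + initial)
    return b"p" + struct.pack("!i", 4 + len(body)) + body
```

For SCRAM-SHA-256 the initial bytes would be the SCRAM client-first-message; the payload here is only a placeholder shape, not a valid SCRAM exchange.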

One or more server-challenge and client-response message exchanges follow. Each server-challenge is sent in an AuthenticationSASLContinue message, followed by a response from the client in a SASLResponse message. The particulars of the messages are mechanism specific.

Finally, when the authentication exchange is completed successfully, the server sends an AuthenticationSASLFinal message, followed immediately by an AuthenticationOk message. The AuthenticationSASLFinal contains additional server-to-client data, whose content is particular to the selected authentication mechanism. If the authentication mechanism doesn’t use additional data that’s sent at completion, the AuthenticationSASLFinal message is not sent.

On error, the server can abort the authentication at any stage, and send an ErrorMessage.

9.4.3.1. SCRAM-SHA-256 Authentication

The implemented SASL mechanisms at the moment are SCRAM-SHA-256 and its variant with channel binding SCRAM-SHA-256-PLUS. They are described in detail in RFC 7677 and RFC 5802.

When SCRAM-SHA-256 is used in PostgreSQL, the server will ignore the user name that the client sends in the client-first-message. The user name that was already sent in the startup message is used instead. PostgreSQL supports multiple character encodings, while SCRAM dictates UTF-8 to be used for the user name, so it might be impossible to represent the PostgreSQL user name in UTF-8.

The SCRAM specification dictates that the password is also in UTF-8, and is processed with the SASLprep algorithm. PostgreSQL, however, does not require UTF-8 to be used for the password. When a user’s password is set, it is processed with SASLprep as if it was in UTF-8, regardless of the actual encoding used. However, if it is not a legal UTF-8 byte sequence, or it contains UTF-8 byte sequences that are prohibited by the SASLprep algorithm, the raw password will be used without SASLprep processing, instead of throwing an error. This allows the password to be normalized when it is in UTF-8, but still allows a non-UTF-8 password to be used, and doesn’t require the system to know which encoding the password is in.

Channel binding is supported in PostgreSQL builds with SSL support. The SASL mechanism name for SCRAM with channel binding is SCRAM-SHA-256-PLUS. The channel binding type used by PostgreSQL is tls-server-end-point.

In SCRAM without channel binding, the server chooses a random number that is transmitted to the client to be mixed with the user-supplied password in the transmitted password hash. While this prevents the password hash from being successfully retransmitted in a later session, it does not prevent a fake server between the real server and client from passing through the server’s random value and successfully authenticating.

SCRAM with channel binding prevents such man-in-the-middle attacks by mixing the signature of the server’s certificate into the transmitted password hash. While a fake server can retransmit the real server’s certificate, it doesn’t have access to the private key matching that certificate, and therefore cannot prove it is the owner, causing SSL connection failure.

Example

The server sends an AuthenticationSASL message. It includes a list of SASL authentication mechanisms that the server can accept. This will be SCRAM-SHA-256-PLUS and SCRAM-SHA-256 if the server is built with SSL support, or else just SCRAM-SHA-256.

The client responds by sending a SASLInitialResponse message, which indicates the chosen mechanism, SCRAM-SHA-256 or SCRAM-SHA-256-PLUS. (A client is free to choose either mechanism, but for better security it should choose the channel-binding variant if it can support it.) In the Initial Client response field, the message contains the SCRAM client-first-message. The client-first-message also contains the channel binding type chosen by the client.

Server sends an AuthenticationSASLContinue message, with a SCRAM server-first-message as the content.

Client sends a SASLResponse message, with SCRAM client-final-message as the content.

Server sends an AuthenticationSASLFinal message, with the SCRAM server-final-message, followed immediately by an AuthenticationOk message.

9.4.4. Streaming Replication Protocol

To initiate streaming replication, the frontend sends the replication parameter in the startup message. A Boolean value of true (or on, yes, 1) tells the backend to go into physical replication walsender mode, wherein a small set of replication commands, shown below, can be issued instead of SQL statements.

Passing database as the value for the replication parameter instructs the backend to go into logical replication walsender mode, connecting to the database specified in the dbname parameter. In logical replication walsender mode, the replication commands shown below as well as normal SQL commands can be issued.

In either physical replication or logical replication walsender mode, only the simple query protocol can be used.

For the purpose of testing replication commands, you can make a replication connection via psql or any other libpq-using tool with a connection string including the replication option, e.g.:

psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"

However, it is often more useful to use app-pgreceivewal (for physical replication) or app-pgrecvlogical (for logical replication).

Replication commands are logged in the server log when guc-log-replication-commands is enabled.

The commands accepted in replication mode are:

  1. IDENTIFY_SYSTEM: Requests the server to identify itself. Server replies with a result set of a single row, containing four fields:

    1. systemid: The unique system identifier identifying the cluster. This can be used to check that the base backup used to initialize the standby came from the same cluster.

    2. timeline: Current timeline ID. Also useful to check that the standby is consistent with the primary.

    3. xlogpos: Current WAL flush location. Useful to get a known location in the write-ahead log where streaming can start.

    4. dbname: Database connected to or null.

  2. SHOW name: Requests the server to send the current setting of a run-time parameter. This is similar to the SQL command sql-show.

    1. name: The name of a run-time parameter. Available parameters are documented in runtime-config.

  3. TIMELINE_HISTORY tli: Requests the server to send over the timeline history file for timeline tli. Server replies with a result set of a single row, containing two fields. While the fields are labeled as text, they effectively return raw bytes, with no encoding conversion:

    1. filename: File name of the timeline history file, e.g., 00000002.history.

    2. content: Contents of the timeline history file.

  4. CREATE_REPLICATION_SLOT slot_name [ TEMPORARY ] { PHYSICAL | LOGICAL output_plugin }: Create a physical or logical replication slot. See streaming-replication-slots for more about replication slots.

    1. slot_name: The name of the slot to create. Must be a valid replication slot name (see streaming-replication-slots-manipulation).

    2. output_plugin: The name of the output plugin used for logical decoding (see logicaldecoding-output-plugin).

    3. TEMPORARY: Specify that this replication slot is a temporary one. Temporary slots are not saved to disk and are automatically dropped on error or when the session has finished.

    The following options are supported:

    1. TWO_PHASE: If true, this logical replication slot supports decoding of two-phase commit. With this option, commands related to two-phase commit such as PREPARE TRANSACTION, COMMIT PREPARED and ROLLBACK PREPARED are decoded and transmitted. The transaction will be decoded and transmitted at PREPARE TRANSACTION time. The default is false.

    2. RESERVE_WAL: If true, this physical replication slot reserves WAL immediately. Otherwise, WAL is only reserved upon connection from a streaming replication client. The default is false.

    3. SNAPSHOT { 'export' | 'use' | 'nothing' }: Decides what to do with the snapshot created during logical slot initialization. 'export', which is the default, will export the snapshot for use in other sessions. This option can't be used inside a transaction. 'use' will use the snapshot for the current transaction executing the command. This option must be used in a transaction, and CREATE_REPLICATION_SLOT must be the first command run in that transaction. Finally, 'nothing' will just use the snapshot for logical decoding as normal but won't do anything else with it.

    In response to this command, the server will send a one-row result set containing the following fields:

    1. The name of the newly-created replication slot.

    2. The WAL location at which the slot became consistent. This is the earliest location from which streaming can start on this replication slot.

    3. The identifier of the snapshot exported by the command. The snapshot is valid until a new command is executed on this connection or the replication connection is closed. Null if the created slot is physical.

    4. The name of the output plugin used by the newly-created replication slot. Null if the created slot is physical.
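As a sketch of how these options fit together, the following hypothetical helper assembles a CREATE_REPLICATION_SLOT command string. The helper name is illustrative, and the option spellings follow this section's descriptions; verify them against the server version you target.

```python
def create_slot_command(name, plugin=None, temporary=False,
                        two_phase=False, snapshot=None):
    """Assemble a CREATE_REPLICATION_SLOT command string (illustrative).

    A physical slot is created when no output plugin is given, a
    logical one otherwise.
    """
    parts = ["CREATE_REPLICATION_SLOT", name]
    if temporary:
        parts.append("TEMPORARY")
    if plugin is None:
        parts.append("PHYSICAL")
    else:
        parts.extend(["LOGICAL", plugin])
    opts = []
    if two_phase:
        opts.append("TWO_PHASE")
    if snapshot:
        opts.append("SNAPSHOT '%s'" % snapshot)  # 'export', 'use' or 'nothing'
    if opts:
        parts.append("( %s )" % ", ".join(opts))
    return " ".join(parts)
```

A command built this way would be sent as a simple query on a replication connection; the server answers with the one-row result set described above.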

  5. For compatibility with older releases, this alternative syntax for the CREATE_REPLICATION_SLOT command is still supported.

  6. Read some information associated with a replication slot. Returns a tuple with NULL values if the replication slot does not exist. This command is currently only supported for physical replication slots.

    In response to this command, the server will return a one-row result set, containing the following fields:

    1. The replication slot’s type, either physical or NULL.

    2. The replication slot’s restart_lsn.

    3. The timeline ID associated with restart_lsn, following the current timeline history.

  7. Instructs server to start streaming WAL, starting at WAL location XXX/XXX. If TIMELINE option is specified, streaming starts on timeline tli; otherwise, the server’s current timeline is selected. The server can reply with an error, for example if the requested section of WAL has already been recycled. On success, the server responds with a CopyBothResponse message, and then starts to stream WAL to the frontend.

    If a slot’s name is provided via slot_name, it will be updated as replication progresses so that the server knows which WAL segments (and, if hot_standby_feedback is on, which transactions) are still needed by the standby.

    If the client requests a timeline that’s not the latest but is part of the history of the server, the server will stream all the WAL on that timeline starting from the requested start point up to the point where the server switched to another timeline. If the client requests streaming at exactly the end of an old timeline, the server skips COPY mode entirely.

    After streaming all the WAL on a timeline that is not the latest one, the server will end streaming by exiting the COPY mode. When the client acknowledges this by also exiting COPY mode, the server sends a result set with one row and two columns, indicating the next timeline in this server’s history. The first column is the next timeline’s ID (type int8), and the second column is the WAL location where the switch happened (type text). Usually, the switch position is the end of the WAL that was streamed, but there are corner cases where the server can send some WAL from the old timeline that it has not itself replayed before promoting. Finally, the server sends two CommandComplete messages (one that ends the CopyData and the other ends the START_REPLICATION itself), and is ready to accept a new command.

    WAL data is sent as a series of CopyData messages. (This allows other information to be intermixed; in particular the server can send an ErrorResponse message if it encounters a failure after beginning to stream.) The payload of each CopyData message from server to the client contains a message of one of the following formats:

      1. Identifies the message as WAL data.

      2. The starting point of the WAL data in this message.

      3. The current end of WAL on the server.

      4. The server’s system clock at the time of transmission, as microseconds since midnight on 2000-01-01.

      5. A section of the WAL data stream.

        A single WAL record is never split across two XLogData messages. When a WAL record crosses a WAL page boundary, and is therefore already split using continuation records, it can be split at the page boundary. In other words, the first main WAL record and its continuation records can be sent in different XLogData messages.

      1. Identifies the message as a sender keepalive.

      2. The current end of WAL on the server.

      3. The server’s system clock at the time of transmission, as microseconds since midnight on 2000-01-01.

      4. 1 means that the client should reply to this message as soon as possible, to avoid a timeout disconnect. 0 otherwise.
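A minimal sketch in Python of decoding these two CopyData payload formats. The field offsets follow the layouts above; the function and dictionary key names are illustrative, not part of any client library.

```python
import struct

def parse_wal_payload(buf: bytes):
    """Decode the payload of a CopyData message from the walsender:
    either an XLogData ('w') or a sender keepalive ('k') message."""
    kind = buf[0:1]
    if kind == b"w":
        # Byte1('w'), Int64 start LSN, Int64 current end of WAL,
        # Int64 server clock (microseconds since 2000-01-01), then WAL data.
        start, wal_end, clock = struct.unpack_from("!qqq", buf, 1)
        return {"type": "XLogData", "start": start, "wal_end": wal_end,
                "clock": clock, "data": buf[25:]}
    if kind == b"k":
        # Byte1('k'), Int64 current end of WAL, Int64 server clock,
        # Byte1 reply-requested flag.
        wal_end, clock = struct.unpack_from("!qq", buf, 1)
        return {"type": "keepalive", "wal_end": wal_end,
                "clock": clock, "reply_requested": buf[17] == 1}
    raise ValueError("unknown replication message type %r" % kind)
```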

    The receiving process can send replies back to the sender at any time, using one of the following message formats (also in the payload of a CopyData message):

      1. Identifies the message as a receiver status update.

      2. The location of the last WAL byte + 1 received and written to disk in the standby.

      3. The location of the last WAL byte + 1 flushed to disk in the standby.

      4. The location of the last WAL byte + 1 applied in the standby.

      5. The client’s system clock at the time of transmission, as microseconds since midnight on 2000-01-01.

      6. If 1, the client requests the server to reply to this message immediately. This can be used to ping the server, to test if the connection is still healthy.
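Packing a receiver status update is a straightforward application of this layout. A sketch, with an illustrative helper name:

```python
import struct

def standby_status_update(written: int, flushed: int, applied: int,
                          clock_us: int, reply: bool = False) -> bytes:
    """Build the payload of a receiver status update ('r'): three
    Int64 write/flush/apply LSNs, an Int64 client clock (microseconds
    since 2000-01-01), and a Byte1 reply-request flag."""
    return struct.pack("!cqqqqB", b"r", written, flushed, applied,
                       clock_us, 1 if reply else 0)
```

The resulting 34 bytes would be sent as the payload of a CopyData message from the client.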

      1. Identifies the message as a hot standby feedback message.

      2. The client’s system clock at the time of transmission, as microseconds since midnight on 2000-01-01.

      3. The standby’s current global xmin, excluding the catalog_xmin from any replication slots. If both this value and the following catalog_xmin are 0 this is treated as a notification that hot standby feedback will no longer be sent on this connection. Later non-zero messages may reinitiate the feedback mechanism.

      4. The epoch of the global xmin xid on the standby.

      5. The lowest catalog_xmin of any replication slots on the standby. Set to 0 if no catalog_xmin exists on the standby or if hot standby feedback is being disabled.

      6. The epoch of the catalog_xmin xid on the standby.
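The hot standby feedback message can be packed the same way (helper name illustrative; field order as listed above):

```python
import struct

def hot_standby_feedback(clock_us: int, xmin: int, xmin_epoch: int,
                         catalog_xmin: int, catalog_epoch: int) -> bytes:
    """Build a hot standby feedback message ('h'): an Int64 client
    clock followed by four Int32 fields (xmin, its epoch, catalog_xmin,
    its epoch).  Sending all-zero xmin values disables feedback."""
    return struct.pack("!cqIIII", b"h", clock_us, xmin, xmin_epoch,
                       catalog_xmin, catalog_epoch)
```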

  8. Instructs server to start streaming WAL for logical replication, starting at either WAL location XXX/XXX or the slot’s confirmed_flush_lsn (see view-pg-replication-slots), whichever is greater. This behavior makes it easier for clients to avoid updating their local LSN status when there is no data to process. However, starting at a different LSN than requested might not catch certain kinds of client errors; so the client may wish to check that confirmed_flush_lsn matches its expectations before issuing START_REPLICATION.

    The server can reply with an error, for example if the slot does not exist. On success, the server responds with a CopyBothResponse message, and then starts to stream WAL to the frontend.

    The messages inside the CopyBothResponse messages are of the same format documented for START_REPLICATION … PHYSICAL, including two CommandComplete messages.

    The output plugin associated with the selected slot is used to process the output for streaming.

    1. The name of the slot to stream changes from. This parameter is required, and must correspond to an existing logical replication slot created with CREATE_REPLICATION_SLOT in LOGICAL mode.

    2. The WAL location to begin streaming at.

    3. The name of an option passed to the slot’s logical decoding plugin.

    4. Optional value, in the form of a string constant, associated with the specified option.

  9. Drops a replication slot, freeing any reserved server-side resources. If the slot is a logical slot that was created in a database other than the database the walsender is connected to, this command fails.

    1. The name of the slot to drop.

    2. This option causes the command to wait if the slot is active until it becomes inactive, instead of the default behavior of raising an error.

  10. Instructs the server to start streaming a base backup. The system will automatically be put in backup mode before the backup is started, and taken out of it when the backup is complete. The following options are accepted:

    1. Sets the label of the backup. If none is specified, a backup label of base backup will be used. The quoting rules for the label are the same as a standard SQL string with guc-standard-conforming-strings turned on.

    2. Tells the server where to send the backup. If the target is client, which is the default, the backup data is sent to the client. If it is server, the backup data is written to the server at the pathname specified by the TARGET_DETAIL option. If it is blackhole, the backup data is not sent anywhere; it is simply discarded.

      The server target requires superuser privilege or being granted the pg_write_server_files role.

    3. Provides additional information about the backup target.

      Currently, this option can only be used when the backup target is server. It specifies the server directory to which the backup should be written.

    4. If set to true, request information required to generate a progress report. This will send back an approximate size in the header of each tablespace, which can be used to calculate how far along the stream is done. This is calculated by enumerating all the file sizes once before the transfer is even started, and might therefore have a negative impact on performance. In particular, it might take longer before the first data is streamed. Since the database files can change during the backup, the size is only approximate and might both grow and shrink between the time of approximation and the sending of the actual files. The default is false.

    5. Sets the type of checkpoint to be performed at the beginning of the base backup. The default is spread.

    6. If set to true, include the necessary WAL segments in the backup. This will include all the files between start and stop backup in the pg_wal directory of the base directory tar file. The default is false.

    7. If set to true, the backup will wait until the last required WAL segment has been archived, or emit a warning if log archiving is not enabled. If false, the backup will neither wait nor warn, leaving the client responsible for ensuring the required log is available. The default is true.

    8. Instructs the server to compress the backup using the specified method. Currently, the supported methods are gzip, lz4, and zstd.

    9. Specifies details for the chosen compression method. This should only be used in conjunction with the COMPRESSION option. If the value is an integer, it specifies the compression level. Otherwise, it should be a comma-separated list of items, each of the form keyword or keyword=value. Currently, the supported keywords are level and workers.

      The level keyword sets the compression level. For gzip the compression level should be an integer between 1 and 9 (default Z_DEFAULT_COMPRESSION or -1), for lz4 an integer between 1 and 12 (default 0 for fast compression mode), and for zstd an integer between ZSTD_minCLevel() (usually -131072) and ZSTD_maxCLevel() (usually 22), (default ZSTD_CLEVEL_DEFAULT or 3).

      The workers keyword sets the number of threads that should be used for parallel compression. Parallel compression is supported only for zstd.

    10. Limit (throttle) the maximum amount of data transferred from server to client per unit of time. The expected unit is kilobytes per second. If this option is specified, the value must either be equal to zero or it must fall within the range from 32 kB through 1 GB (inclusive). If zero is passed or the option is not specified, no restriction is imposed on the transfer.

    11. If true, include information about symbolic links present in the directory pg_tblspc in a file named tablespace_map. The tablespace map file includes each symbolic link name as it exists in the directory pg_tblspc/ and the full path of that symbolic link. The default is false.

    12. If true, checksums are verified during a base backup if they are enabled. If false, this is skipped. The default is true.

    13. When this option is specified with a value of yes or force-encode, a backup manifest is created and sent along with the backup. The manifest is a list of every file present in the backup with the exception of any WAL files that may be included. It also stores the size, last modification time, and optionally a checksum for each file. A value of force-encode forces all filenames to be hex-encoded; otherwise, this type of encoding is performed only for files whose names are non-UTF8 octet sequences. force-encode is intended primarily for testing purposes, to be sure that clients which read the backup manifest can handle this case. For compatibility with previous releases, the default is MANIFEST 'no'.

    14. Specifies the checksum algorithm that should be applied to each file included in the backup manifest. Currently, the available algorithms are NONE, CRC32C, SHA224, SHA256, SHA384, and SHA512. The default is CRC32C.
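A hypothetical helper assembling a BASE_BACKUP command with a few of the options above. The parenthesized option syntax and spellings follow this section; they should be checked against the server version in use, since older releases accepted a different syntax.

```python
def base_backup_command(label, manifest="yes", compression=None, detail=None):
    """Assemble a BASE_BACKUP replication command string (illustrative).

    Always requests progress reporting, a fast checkpoint, and WAL
    inclusion; compression options are appended only when given.
    """
    opts = ["LABEL '%s'" % label.replace("'", "''"),  # SQL-string quoting
            "PROGRESS", "CHECKPOINT 'fast'", "WAL",
            "MANIFEST '%s'" % manifest]
    if compression:
        opts.append("COMPRESSION '%s'" % compression)
        if detail:
            opts.append("COMPRESSION_DETAIL '%s'" % detail)
    return "BASE_BACKUP ( %s )" % ", ".join(opts)
```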

    When the backup is started, the server will first send two ordinary result sets, followed by one or more CopyOutResponse results.

    The first ordinary result set contains the starting position of the backup, in a single row with two columns. The first column contains the start position given in XLogRecPtr format, and the second column contains the corresponding timeline ID.

    The second ordinary result set has one row for each tablespace. The fields in this row are:

    1. The OID of the tablespace, or null if it’s the base directory.

    2. The full path of the tablespace directory, or null if it’s the base directory.

    3. The approximate size of the tablespace, in kilobytes (1024 bytes), if progress report has been requested; otherwise it’s null.

    After the second regular result set, a CopyOutResponse will be sent. The payload of each CopyData message will contain a message in one of the following formats:

      1. Identifies the message as indicating the start of a new archive. There will be one archive for the main data directory and one for each additional tablespace; each will use tar format (following the ustar interchange format specified in the POSIX 1003.1-2008 standard).

      2. The file name for this archive.

      3. For the main data directory, an empty string. For other tablespaces, the full path to the directory from which this archive was created.

      1. Identifies the message as indicating the start of the backup manifest.

      1. Identifies the message as containing archive or manifest data.

      2. Data bytes.

      1. Identifies the message as a progress report.

      2. The number of bytes from the current tablespace for which processing has been completed.

    After the CopyOutResponse, or all such responses, have been sent, a final ordinary result set will be sent, containing the WAL end position of the backup, in the same format as the start position.

    The tar archive for the data directory and each tablespace will contain all files in the directories, regardless of whether they are PostgreSQL files or other files added to the same directory. The only excluded files are:

    1. postmaster.pid

    2. postmaster.opts

    3. pg_internal.init (found in multiple directories)

    4. Various temporary files and directories created during the operation of the PostgreSQL server, such as any file or directory beginning with pgsql_tmp and temporary relations.

    5. Unlogged relations, except for the init fork which is required to recreate the (empty) unlogged relation on recovery.

    6. pg_wal, including subdirectories. If the backup is run with WAL files included, a synthesized version of pg_wal will be included, but it will only contain the files necessary for the backup to work, not the rest of the contents.

    7. pg_dynshmem, pg_notify, pg_replslot, pg_serial, pg_snapshots, pg_stat_tmp, and pg_subtrans are copied as empty directories (even if they are symbolic links).

    8. Files other than regular files and directories, such as symbolic links (other than for the directories listed above) and special device files, are skipped. (Symbolic links in pg_tblspc are maintained.)

      Owner, group, and file mode are set if the underlying file system on the server supports it.

9.4.5. Logical Streaming Replication Protocol

This section describes the logical replication protocol, which is the message flow started by the START_REPLICATION SLOT slot_name LOGICAL replication command.

The logical streaming replication protocol builds on the primitives of the physical streaming replication protocol.

9.4.5.1. Logical Streaming Replication Parameters

The logical replication START_REPLICATION command accepts the following parameters:

  1. Protocol version. Currently versions 1, 2, and 3 are supported.

    Version 2 is supported only for server version 14 and above, and it allows streaming of large in-progress transactions.

    Version 3 is supported only for server version 15 and above, and it allows streaming of two-phase commits.

  2. Comma-separated list of publication names for which to subscribe (receive changes). The individual publication names are treated as standard object names and can be quoted the same as needed.

9.4.5.2. Logical Replication Protocol Messages

The individual protocol messages are discussed in the following subsections. Individual messages are described in protocol-logicalrep-message-formats.

All top-level protocol messages begin with a message type byte. While represented in code as a character, this is a signed byte with no associated encoding.

Since the streaming replication protocol supplies a message length there is no need for top-level protocol messages to embed a length in their header.

9.4.5.3. Logical Replication Protocol Message Flow

With the exception of the START_REPLICATION command and the replay progress messages, all information flows only from the backend to the frontend.

The logical replication protocol sends individual transactions one by one. This means that all messages between a pair of Begin and Commit messages belong to the same transaction. Similarly, all messages between a pair of Begin Prepare and Prepare messages belong to the same transaction. It also sends changes of large in-progress transactions between a pair of Stream Start and Stream Stop messages. The last stream of such a transaction contains a Stream Commit or Stream Abort message.

Every sent transaction contains zero or more DML messages (Insert, Update, Delete). In case of a cascaded setup it can also contain Origin messages. The origin message indicates that the transaction originated on a different replication node. Since a replication node in the scope of the logical replication protocol can be pretty much anything, the only identifier is the origin name. It’s the downstream’s responsibility to handle this as needed (if needed). The Origin message is always sent before any DML messages in the transaction.

Every DML message contains a relation OID, identifying the publisher’s relation that was acted on. Before the first DML message for a given relation OID, a Relation message will be sent, describing the schema of that relation. Subsequently, a new Relation message will be sent if the relation’s definition has changed since the last Relation message was sent for it. (The protocol assumes that the client is capable of remembering this metadata for as many relations as needed.)

Relation messages identify column types by their OIDs. In the case of a built-in type, it is assumed that the client can look up that type OID locally, so no additional data is needed. For a non-built-in type OID, a Type message will be sent before the Relation message, to provide the type name associated with that OID. Thus, a client that needs to specifically identify the types of relation columns should cache the contents of Type messages, and first consult that cache to see if the type OID is defined there. If not, look up the type OID locally.

9.4.6. Message Data Types

This section describes the base data types used in messages.

  1. An n-bit integer in network byte order (most significant byte first). If i is specified it is the exact value that will appear, otherwise the value is variable. E.g. Int16, Int32(42).

  2. An array of k n-bit integers, each in network byte order. The array length k is always determined by an earlier field in the message. E.g. Int16[M].

  3. A null-terminated string (C-style string). There is no specific length limitation on strings. If s is specified it is the exact value that will appear, otherwise the value is variable. E.g. String, String("user").

    Note

    There is no predefined limit on the length of a string that can be returned by the backend. Good coding strategy for a frontend is to use an expandable buffer so that anything that fits in memory can be accepted. If that’s not feasible, read the full string and discard trailing characters that don’t fit into your fixed-size buffer.

  4. Exactly n bytes. If the field width n is not a constant, it is always determinable from an earlier field in the message. If c is specified it is the exact value. E.g. Byte2, Byte1('n').
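These base types map directly onto network-order reads. A sketch of two of them in Python (helper names are illustrative), each returning the decoded value together with the offset of the next field:

```python
import struct

def read_int32(buf: bytes, off: int):
    """Int32: a 4-byte signed integer in network byte order."""
    return struct.unpack_from("!i", buf, off)[0], off + 4

def read_string(buf: bytes, off: int):
    """String: a null-terminated C-style string with no length prefix;
    the terminator is consumed but not returned."""
    end = buf.index(b"\x00", off)
    return buf[off:end].decode("utf-8"), end + 1
```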

9.4.7. Message Formats

This section describes the detailed format of each message. Each is marked to indicate that it can be sent by a frontend (F), a backend (B), or both (F & B). Notice that although each message includes a byte count at the beginning, the message format is defined so that the message end can be found without reference to the byte count. This aids validity checking. (The CopyData message is an exception, because it forms part of a data stream; the contents of any individual CopyData message cannot necessarily be interpreted on their own.)
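The common framing (a type byte followed by an Int32 length that counts itself but not the type byte) can be sketched as follows; the function name is illustrative, and a real client would of course read incrementally from a socket rather than from a complete byte string:

```python
import struct

def split_messages(stream: bytes):
    """Split a byte stream of regular protocol messages into
    (type, body) pairs.  Each message is one type byte followed by an
    Int32 length that includes itself but not the type byte."""
    msgs, off = [], 0
    while off < len(stream):
        mtype = stream[off:off + 1]
        (length,) = struct.unpack_from("!i", stream, off + 1)
        body = stream[off + 5:off + 1 + length]  # skip type byte + length
        msgs.append((mtype, body))
        off += 1 + length
    return msgs
```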

    1. Identifies the message as an authentication request.

    2. Length of message contents in bytes, including self.

    3. Specifies that the authentication was successful.

    1. Identifies the message as an authentication request.

    2. Length of message contents in bytes, including self.

    3. Specifies that Kerberos V5 authentication is required.

    1. Identifies the message as an authentication request.

    2. Length of message contents in bytes, including self.

    3. Specifies that a clear-text password is required.

    1. Identifies the message as an authentication request.

    2. Length of message contents in bytes, including self.

    3. Specifies that an MD5-encrypted password is required.

    4. The salt to use when encrypting the password.

    1. Identifies the message as an authentication request.

    2. Length of message contents in bytes, including self.

    3. Specifies that an SCM credentials message is required.

    1. Identifies the message as an authentication request.

    2. Length of message contents in bytes, including self.

    3. Specifies that GSSAPI authentication is required.

    1. Identifies the message as an authentication request.

    2. Length of message contents in bytes, including self.

    3. Specifies that this message contains GSSAPI or SSPI data.

    4. GSSAPI or SSPI authentication data.

    1. Identifies the message as an authentication request.

    2. Length of message contents in bytes, including self.

    3. Specifies that SSPI authentication is required.

    1. Identifies the message as an authentication request.

    2. Length of message contents in bytes, including self.

    3. Specifies that SASL authentication is required.

    The message body is a list of SASL authentication mechanisms, in the server’s order of preference. A zero byte is required as terminator after the last authentication mechanism name. For each mechanism, there is the following:

    1. Name of a SASL authentication mechanism.
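Decoding that mechanism list is a matter of reading null-terminated names until the empty terminator. A sketch (helper name illustrative), operating on the body that follows the Int32 authentication code:

```python
def parse_sasl_mechanisms(body: bytes):
    """Parse an AuthenticationSASL mechanism list: null-terminated
    mechanism names, in the server's order of preference, ended by a
    zero byte in place of a further name."""
    mechs, off = [], 0
    while body[off] != 0:
        end = body.index(b"\x00", off)
        mechs.append(body[off:end].decode("ascii"))
        off = end + 1
    return mechs
```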

    1. Identifies the message as an authentication request.

    2. Length of message contents in bytes, including self.

    3. Specifies that this message contains a SASL challenge.

    4. SASL data, specific to the SASL mechanism being used.

    1. Identifies the message as an authentication request.

    2. Length of message contents in bytes, including self.

    3. Specifies that SASL authentication has completed.

    4. SASL outcome "additional data", specific to the SASL mechanism being used.

    1. Identifies the message as cancellation key data. The frontend must save these values if it wishes to be able to issue CancelRequest messages later.

    2. Length of message contents in bytes, including self.

    3. The process ID of this backend.

    4. The secret key of this backend.

    1. Identifies the message as a Bind command.

    2. Length of message contents in bytes, including self.

    3. The name of the destination portal (an empty string selects the unnamed portal).

    4. The name of the source prepared statement (an empty string selects the unnamed prepared statement).

    5. The number of parameter format codes that follow (denoted C below). This can be zero to indicate that there are no parameters or that the parameters all use the default format (text); or one, in which case the specified format code is applied to all parameters; or it can equal the actual number of parameters.

    6. The parameter format codes. Each must presently be zero (text) or one (binary).

    7. The number of parameter values that follow (possibly zero). This must match the number of parameters needed by the query.

    Next, the following pair of fields appear for each parameter:

    1. The length of the parameter value, in bytes (this count does not include itself). Can be zero. As a special case, -1 indicates a NULL parameter value. No value bytes follow in the NULL case.

    2. The value of the parameter, in the format indicated by the associated format code. n is the above length.

    After the last parameter, the following fields appear:

    1. The number of result-column format codes that follow (denoted R below). This can be zero to indicate that there are no result columns or that the result columns should all use the default format (text); or one, in which case the specified format code is applied to all result columns (if any); or it can equal the actual number of result columns of the query.

    2. The result-column format codes. Each must presently be zero (text) or one (binary).
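A sketch of assembling a simple Bind message from these fields, using text format throughout (zero format-code counts) and length-prefixed parameter values with -1 for NULL. The helper name is illustrative:

```python
import struct

def bind_message(portal: str, statement: str, params):
    """Build a Bind ('B') message: portal and statement names, zero
    parameter format codes (all text), the parameter values, and zero
    result-column format codes (all text)."""
    body = portal.encode() + b"\x00" + statement.encode() + b"\x00"
    body += struct.pack("!h", 0)             # no parameter format codes
    body += struct.pack("!h", len(params))   # number of parameter values
    for p in params:
        if p is None:
            body += struct.pack("!i", -1)    # NULL parameter, no value bytes
        else:
            v = p.encode()
            body += struct.pack("!i", len(v)) + v
    body += struct.pack("!h", 0)             # no result-column format codes
    return b"B" + struct.pack("!i", 4 + len(body)) + body
```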

    1. Identifies the message as a Bind-complete indicator.

    2. Length of message contents in bytes, including self.

    1. Length of message contents in bytes, including self.

    2. The cancel request code. The value is chosen to contain 1234 in the most significant 16 bits, and 5678 in the least significant 16 bits. (To avoid confusion, this code must not be the same as any protocol version number.)

    3. The process ID of the target backend.

    4. The secret key for the target backend.
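The complete CancelRequest packet is a fixed 16 bytes and, unusually, carries no type byte. A sketch of building it from the values saved out of BackendKeyData:

```python
import struct

# 1234 in the most significant 16 bits, 5678 in the least significant.
CANCEL_REQUEST_CODE = (1234 << 16) | 5678  # 80877102

def cancel_request(pid: int, secret_key: int) -> bytes:
    """Build a CancelRequest packet: Int32 length (16), the magic
    request code, then the target backend's process ID and secret key."""
    return struct.pack("!iiii", 16, CANCEL_REQUEST_CODE, pid, secret_key)
```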

    1. Identifies the message as a Close command.

    2. Length of message contents in bytes, including self.

    3. 'S' to close a prepared statement; or 'P' to close a portal.

    4. The name of the prepared statement or portal to close (an empty string selects the unnamed prepared statement or portal).

    1. Identifies the message as a Close-complete indicator.

    2. Length of message contents in bytes, including self.

    1. Identifies the message as a command-completed response.

    2. Length of message contents in bytes, including self.

    3. The command tag. This is usually a single word that identifies which SQL command was completed.

      For an INSERT command, the tag is INSERT oid rows, where rows is the number of rows inserted. oid used to be the object ID of the inserted row if rows was 1 and the target table had OIDs, but OID system columns are not supported anymore; therefore oid is always 0.

      For a DELETE command, the tag is DELETE rows where rows is the number of rows deleted.

      For an UPDATE command, the tag is UPDATE rows where rows is the number of rows updated.

      For a SELECT or CREATE TABLE AS command, the tag is SELECT rows where rows is the number of rows retrieved.

      For a MOVE command, the tag is MOVE rows where rows is the number of rows the cursor’s position has been changed by.

      For a FETCH command, the tag is FETCH rows where rows is the number of rows that have been retrieved from the cursor.

      For a COPY command, the tag is COPY rows where rows is the number of rows copied. (Note: the row count appears only in PostgreSQL 8.2 and later.)
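Extracting the row count from a command tag follows directly from the shapes above (only INSERT carries the extra, always-zero oid field). A sketch with an illustrative helper name:

```python
def rows_from_tag(tag: str):
    """Return the row count from a CommandComplete tag, or None for
    tags that carry no count (e.g. BEGIN)."""
    parts = tag.split()
    if parts[0] == "INSERT":
        return int(parts[2])            # INSERT oid rows
    if parts[0] in ("DELETE", "UPDATE", "SELECT", "MOVE", "FETCH", "COPY"):
        return int(parts[1])            # tag rows
    return None
```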

    1. Identifies the message as COPY data.

    2. Length of message contents in bytes, including self.

    3. Data that forms part of a COPY data stream. Messages sent from the backend will always correspond to single data rows, but messages sent by frontends might divide the data stream arbitrarily.

    1. Identifies the message as a COPY-complete indicator.

    2. Length of message contents in bytes, including self.

    1. Identifies the message as a COPY-failure indicator.

    2. Length of message contents in bytes, including self.

    3. An error message to report as the cause of failure.

    1. Identifies the message as a Start Copy In response. The frontend must now send copy-in data (if not prepared to do so, send a CopyFail message).

    2. Length of message contents in bytes, including self.

    3. 0 indicates the overall COPY format is textual (rows separated by newlines, columns separated by separator characters, etc.). 1 indicates the overall copy format is binary (similar to DataRow format). See sql-copy for more information.

    4. The number of columns in the data to be copied (denoted N below).

    5. The format codes to be used for each column. Each must presently be zero (text) or one (binary). All must be zero if the overall copy format is textual.

    1. Identifies the message as a Start Copy Out response. This message will be followed by copy-out data.

    2. Length of message contents in bytes, including self.

    3. 0 indicates the overall COPY format is textual (rows separated by newlines, columns separated by separator characters, etc.). 1 indicates the overall copy format is binary (similar to DataRow format). See sql-copy for more information.

    4. The number of columns in the data to be copied (denoted N below).

    5. The format codes to be used for each column. Each must presently be zero (text) or one (binary). All must be zero if the overall copy format is textual.

    1. Identifies the message as a Start Copy Both response. This message is used only for Streaming Replication.

    2. Length of message contents in bytes, including self.

    3. 0 indicates the overall COPY format is textual (rows separated by newlines, columns separated by separator characters, etc.). 1 indicates the overall copy format is binary (similar to DataRow format). See sql-copy for more information.

    4. The number of columns in the data to be copied (denoted N below).

    5. The format codes to be used for each column. Each must presently be zero (text) or one (binary). All must be zero if the overall copy format is textual.

    1. Identifies the message as a data row.

    2. Length of message contents in bytes, including self.

    3. The number of column values that follow (possibly zero).

    Next, the following pair of fields appear for each column:

    1. The length of the column value, in bytes (this count does not include itself). Can be zero. As a special case, -1 indicates a NULL column value. No value bytes follow in the NULL case.

    2. The value of the column, in the format indicated by the associated format code. n is the above length.
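A sketch of parsing a DataRow body (the part after the type byte and message length), following the column layout above; the helper name is illustrative:

```python
import struct

def parse_data_row(body: bytes):
    """Parse a DataRow body: an Int16 column count, then for each
    column an Int32 value length (-1 meaning NULL, with no value
    bytes) followed by that many bytes of value data."""
    (ncols,) = struct.unpack_from("!h", body, 0)
    off, cols = 2, []
    for _ in range(ncols):
        (length,) = struct.unpack_from("!i", body, off)
        off += 4
        if length == -1:
            cols.append(None)            # NULL column value
        else:
            cols.append(body[off:off + length])
            off += length
    return cols
```

Note that the values are returned as raw bytes; whether they are text or binary depends on the result-column format codes given in the preceding Bind message.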

    1. Identifies the message as a Describe command.

    2. Length of message contents in bytes, including self.

    3. 'S' to describe a prepared statement; or 'P' to describe a portal.

    4. The name of the prepared statement or portal to describe (an empty string selects the unnamed prepared statement or portal).

    1. Identifies the message as a response to an empty query string. (This substitutes for CommandComplete.)

    2. Length of message contents in bytes, including self.

    1. Identifies the message as an error.

    2. Length of message contents in bytes, including self.

    The message body consists of one or more identified fields, followed by a zero byte as a terminator. Fields can appear in any order. For each field there is the following:

    1. A code identifying the field type; if zero, this is the message terminator and no string follows. The presently defined field types are listed in protocol-error-fields. Since more field types might be added in future, frontends should silently ignore fields of unrecognized type.

    2. The field value.
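    Assuming the field layout above, the body of an ErrorResponse (or NoticeResponse, which uses the same structure) can be decoded with a short loop; the function name here is illustrative:

    ```python
    def parse_error_fields(body: bytes) -> dict:
        """Decode ErrorResponse/NoticeResponse fields: repeated pairs of a
        one-byte field-type code and a NUL-terminated string value, ended
        by a single zero byte. Unrecognized codes are kept as-is, since
        clients must tolerate field types added in the future."""
        fields, offset = {}, 0
        while body[offset] != 0:             # zero byte terminates the list
            code = chr(body[offset])
            end = body.index(b"\x00", offset + 1)
            fields[code] = body[offset + 1:end].decode()
            offset = end + 1
        return fields

    body = b"SERROR\x00C42601\x00Msyntax error\x00\x00"
    print(parse_error_fields(body))
    # {'S': 'ERROR', 'C': '42601', 'M': 'syntax error'}
    ```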

    1. Identifies the message as an Execute command.

    2. Length of message contents in bytes, including self.

    3. The name of the portal to execute (an empty string selects the unnamed portal).

    4. Maximum number of rows to return, if portal contains a query that returns rows (ignored otherwise). Zero denotes no limit.
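    Encoding an Execute message follows the same pattern as the other extended-query messages; a minimal sketch (helper name illustrative):

    ```python
    import struct

    def execute(portal: str, max_rows: int = 0) -> bytes:
        """Build an Execute ('E') message: the portal name as a
        NUL-terminated string, then an Int32 row limit (0 = no limit)."""
        body = portal.encode() + b"\x00" + struct.pack("!I", max_rows)
        return b"E" + struct.pack("!I", 4 + len(body)) + body

    # Run the unnamed portal with no row limit
    print(execute("", 0))   # b'E\x00\x00\x00\t\x00\x00\x00\x00\x00'
    ```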

    1. Identifies the message as a Flush command.

    2. Length of message contents in bytes, including self.

    1. Identifies the message as a function call.

    2. Length of message contents in bytes, including self.

    3. Specifies the object ID of the function to call.

    4. The number of argument format codes that follow (denoted C below). This can be zero to indicate that there are no arguments or that the arguments all use the default format (text); or one, in which case the specified format code is applied to all arguments; or it can equal the actual number of arguments.

    5. The argument format codes. Each must presently be zero (text) or one (binary).

    6. Specifies the number of arguments being supplied to the function.

    Next, the following pair of fields appear for each argument:

    1. The length of the argument value, in bytes (this count does not include itself). Can be zero. As a special case, -1 indicates a NULL argument value. No value bytes follow in the NULL case.

    2. The value of the argument, in the format indicated by the associated format code. n is the above length.

    After the last argument, the following field appears:

    1. The format code for the function result. Must presently be zero (text) or one (binary).

    1. Identifies the message as a function call result.

    2. Length of message contents in bytes, including self.

    3. The length of the function result value, in bytes (this count does not include itself). Can be zero. As a special case, -1 indicates a NULL function result. No value bytes follow in the NULL case.

    4. The value of the function result, in the format indicated by the associated format code. n is the above length.

    1. Length of message contents in bytes, including self.

    2. The GSSAPI Encryption request code. The value is chosen to contain 1234 in the most significant 16 bits, and 5680 in the least significant 16 bits. (To avoid confusion, this code must not be the same as any protocol version number.)

    1. Identifies the message as a GSSAPI or SSPI response. Note that this is also used for SASL and password response messages. The exact message type can be deduced from the context.

    2. Length of message contents in bytes, including self.

    3. GSSAPI/SSPI specific message data.

    1. Identifies the message as a protocol version negotiation message.

    2. Length of message contents in bytes, including self.

    3. Newest minor protocol version supported by the server for the major protocol version requested by the client.

    4. Number of protocol options not recognized by the server.

    Then, for each protocol option not recognized by the server, there is the following:

    1. The option name.

    1. Identifies the message as a no-data indicator.

    2. Length of message contents in bytes, including self.

    1. Identifies the message as a notice.

    2. Length of message contents in bytes, including self.

    The message body consists of one or more identified fields, followed by a zero byte as a terminator. Fields can appear in any order. For each field there is the following:

    1. A code identifying the field type; if zero, this is the message terminator and no string follows. The presently defined field types are listed in protocol-error-fields. Since more field types might be added in future, frontends should silently ignore fields of unrecognized type.

    2. The field value.

    1. Identifies the message as a notification response.

    2. Length of message contents in bytes, including self.

    3. The process ID of the notifying backend process.

    4. The name of the channel that the notify has been raised on.

    5. The payload string passed from the notifying process.

    1. Identifies the message as a parameter description.

    2. Length of message contents in bytes, including self.

    3. The number of parameters used by the statement (can be zero).

    Then, for each parameter, there is the following:

    1. Specifies the object ID of the parameter data type.

    1. Identifies the message as a run-time parameter status report.

    2. Length of message contents in bytes, including self.

    3. The name of the run-time parameter being reported.

    4. The current value of the parameter.

    1. Identifies the message as a Parse command.

    2. Length of message contents in bytes, including self.

    3. The name of the destination prepared statement (an empty string selects the unnamed prepared statement).

    4. The query string to be parsed.

    5. The number of parameter data types specified (can be zero). Note that this is not an indication of the number of parameters that might appear in the query string, only the number that the frontend wants to prespecify types for.

    Then, for each parameter, there is the following:

    1. Specifies the object ID of the parameter data type. Placing a zero here is equivalent to leaving the type unspecified.
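    A Parse message combining all of the fields above can be built as follows (a sketch; the function name is illustrative, and the int4 OID 23 in the usage line is an assumption about the server's catalog):

    ```python
    import struct

    def parse_message(stmt: str, query: str, param_oids=()) -> bytes:
        """Build a Parse ('P') message: destination statement name and
        query string, both NUL-terminated, then an Int16 count of
        prespecified parameter type OIDs followed by the Int32 OIDs
        themselves (0 leaves a type unspecified)."""
        body = stmt.encode() + b"\x00" + query.encode() + b"\x00"
        body += struct.pack("!H", len(param_oids))
        for oid in param_oids:
            body += struct.pack("!I", oid)
        return b"P" + struct.pack("!I", 4 + len(body)) + body

    # Unnamed statement with one parameter prespecified as int4 (OID 23)
    msg = parse_message("", "SELECT $1::int4", (23,))
    ```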

    1. Identifies the message as a Parse-complete indicator.

    2. Length of message contents in bytes, including self.

    1. Identifies the message as a password response. Note that this is also used for GSSAPI, SSPI and SASL response messages. The exact message type can be deduced from the context.

    2. Length of message contents in bytes, including self.

    3. The password (encrypted, if requested).

    1. Identifies the message as a portal-suspended indicator. Note that this appears only if an Execute message's row-count limit was reached.

    2. Length of message contents in bytes, including self.

    1. Identifies the message as a simple query.

    2. Length of message contents in bytes, including self.

    3. The query string itself.

    1. Identifies the message type. ReadyForQuery is sent whenever the backend is ready for a new query cycle.

    2. Length of message contents in bytes, including self.

    3. Current backend transaction status indicator. Possible values are 'I' if idle (not in a transaction block); 'T' if in a transaction block; or 'E' if in a failed transaction block (queries will be rejected until the block is ended).
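    Because ReadyForQuery has a fixed five-byte body, decoding it reduces to a single byte lookup; a sketch (names illustrative):

    ```python
    import struct

    STATUS = {b"I": "idle", b"T": "in transaction", b"E": "failed transaction"}

    def parse_ready_for_query(msg: bytes) -> str:
        """Decode a complete ReadyForQuery ('Z') message: the type byte,
        an Int32 length (always 5), then the one-byte status indicator."""
        assert msg[:1] == b"Z" and struct.unpack("!I", msg[1:5])[0] == 5
        return STATUS[msg[5:6]]

    print(parse_ready_for_query(b"Z\x00\x00\x00\x05I"))   # idle
    ```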

    1. Identifies the message as a row description.

    2. Length of message contents in bytes, including self.

    3. Specifies the number of fields in a row (can be zero).

    Then, for each field, there is the following:

    1. The field name.

    2. If the field can be identified as a column of a specific table, the object ID of the table; otherwise zero.

    3. If the field can be identified as a column of a specific table, the attribute number of the column; otherwise zero.

    4. The object ID of the field’s data type.

    5. The data type size (see pg_type.typlen). Note that negative values denote variable-width types.

    6. The type modifier (see pg_attribute.atttypmod). The meaning of the modifier is type-specific.

    7. The format code being used for the field. Currently will be zero (text) or one (binary). In a RowDescription returned from the statement variant of Describe, the format code is not yet known and will always be zero.
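    The seven per-field items above occupy a NUL-terminated string followed by 18 fixed bytes, so a RowDescription body can be walked with one struct format; a sketch (function name illustrative, signed types chosen because typlen and typmod can be negative):

    ```python
    import struct

    def parse_row_description(body: bytes):
        """Decode a RowDescription body: Int16 field count, then per field
        a NUL-terminated name, Int32 table OID, Int16 column number,
        Int32 type OID, Int16 type size, Int32 type modifier, Int16 format
        code. '!IHIhiH' covers the 18 fixed bytes after the name."""
        (nfields,) = struct.unpack_from("!H", body)
        offset, fields = 2, []
        for _ in range(nfields):
            end = body.index(b"\x00", offset)
            name = body[offset:end].decode()
            offset = end + 1
            table_oid, attnum, type_oid, typlen, typmod, fmt = \
                struct.unpack_from("!IHIhiH", body, offset)
            offset += 18
            fields.append((name, table_oid, attnum, type_oid, typlen, typmod, fmt))
        return fields
    ```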

    1. Identifies the message as an initial SASL response. Note that this is also used for GSSAPI, SSPI and password response messages. The exact message type is deduced from the context.

    2. Length of message contents in bytes, including self.

    3. Name of the SASL authentication mechanism that the client selected.

    4. Length of SASL mechanism specific "Initial Client Response" that follows, or -1 if there is no Initial Response.

    5. SASL mechanism specific "Initial Response".

    1. Identifies the message as a SASL response. Note that this is also used for GSSAPI, SSPI and password response messages. The exact message type can be deduced from the context.

    2. Length of message contents in bytes, including self.

    3. SASL mechanism specific message data.

    1. Length of message contents in bytes, including self.

    2. The SSL request code. The value is chosen to contain 1234 in the most significant 16 bits, and 5679 in the least significant 16 bits. (To avoid confusion, this code must not be the same as any protocol version number.)
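    The special request codes (SSLRequest and GSSENCRequest) are easy to derive from the "1234 in the high 16 bits" rule; a sketch of the full eight-byte packet (constant and function names illustrative):

    ```python
    import struct

    # Special request codes: "major version" 1234 with distinct minors,
    # guaranteed never to collide with a real protocol version number.
    SSL_REQUEST_CODE = (1234 << 16) | 5679      # 80877103
    GSSENC_REQUEST_CODE = (1234 << 16) | 5680   # 80877104

    def ssl_request() -> bytes:
        """An SSLRequest packet: Int32 length (8) then the request code.
        Like the startup packet, it carries no message-type byte."""
        return struct.pack("!II", 8, SSL_REQUEST_CODE)

    print(ssl_request().hex())   # 0000000804d2162f
    ```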

    1. Length of message contents in bytes, including self.

    2. The protocol version number. The most significant 16 bits are the major version number (3 for the protocol described here). The least significant 16 bits are the minor version number (0 for the protocol described here).

    The protocol version number is followed by one or more pairs of parameter name and value strings. A zero byte is required as a terminator after the last name/value pair. Parameters can appear in any order. user is required, others are optional. Each parameter is specified as:

    1. The parameter name. Currently recognized names are:

      1. The database user name to connect as. Required; there is no default.

      2. The database to connect to. Defaults to the user name.

      3. Command-line arguments for the backend. (This is deprecated in favor of setting individual run-time parameters.) Spaces within this string are considered to separate arguments, unless escaped with a backslash (\); write \\ to represent a literal backslash.

      4. Used to connect in streaming replication mode, where a small set of replication commands can be issued instead of SQL statements. Value can be true, false, or database, and the default is false. See protocol-replication for details.

      In addition to the above, other parameters may be listed. Parameter names beginning with _pq_. are reserved for use as protocol extensions, while others are treated as run-time parameters to be set at backend start time. Such settings will be applied during backend start (after parsing the command-line arguments if any) and will act as session defaults.

    2. The parameter value.
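    Putting the startup-packet rules together — no type byte, version word 3.0, name/value pairs, final terminator — gives a sketch like the following (function name illustrative):

    ```python
    import struct

    def startup_message(**params) -> bytes:
        """Build a StartupMessage: Int32 total length (including itself),
        Int32 version number (major 3 in the high 16 bits, minor 0 in the
        low), then NUL-terminated name/value string pairs ended by one
        extra zero byte. There is no message-type byte."""
        body = struct.pack("!I", 3 << 16)        # protocol 3.0 = 196608
        for name, value in params.items():
            body += name.encode() + b"\x00" + value.encode() + b"\x00"
        body += b"\x00"                          # terminator after last pair
        return struct.pack("!I", 4 + len(body)) + body

    msg = startup_message(user="alice", database="mydb")
    ```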

    1. Identifies the message as a Sync command.

    2. Length of message contents in bytes, including self.

    1. Identifies the message as a termination.

    2. Length of message contents in bytes, including self.

9.4.8. Error and Notice Message Fields

This section describes the fields that can appear in ErrorResponse and NoticeResponse messages. Each field type has a single-byte identification token. Note that any given field type should appear at most once per message.

  1. Severity: the field contents are ERROR, FATAL, or PANIC (in an error message), or WARNING, NOTICE, DEBUG, INFO, or LOG (in a notice message), or a localized translation of one of these. Always present.

  2. Severity: the field contents are ERROR, FATAL, or PANIC (in an error message), or WARNING, NOTICE, DEBUG, INFO, or LOG (in a notice message). This is identical to the S field except that the contents are never localized. This is present only in messages generated by PostgreSQL versions 9.6 and later.

  3. Code: the SQLSTATE code for the error (see errcodes-appendix). Not localizable. Always present.

  4. Message: the primary human-readable error message. This should be accurate but terse (typically one line). Always present.

  5. Detail: an optional secondary error message carrying more detail about the problem. Might run to multiple lines.

  6. Hint: an optional suggestion what to do about the problem. This is intended to differ from Detail in that it offers advice (potentially inappropriate) rather than hard facts. Might run to multiple lines.

  7. Position: the field value is a decimal ASCII integer, indicating an error cursor position as an index into the original query string. The first character has index 1, and positions are measured in characters not bytes.

  8. Internal position: this is defined the same as the P field, but it is used when the cursor position refers to an internally generated command rather than the one submitted by the client. The q field will always appear when this field appears.

  9. Internal query: the text of a failed internally-generated command. This could be, for example, an SQL query issued by a PL/pgSQL function.

  10. Where: an indication of the context in which the error occurred. Presently this includes a call stack traceback of active procedural language functions and internally-generated queries. The trace is one entry per line, most recent first.

  11. Schema name: if the error was associated with a specific database object, the name of the schema containing that object, if any.

  12. Table name: if the error was associated with a specific table, the name of the table. (Refer to the schema name field for the name of the table’s schema.)

  13. Column name: if the error was associated with a specific table column, the name of the column. (Refer to the schema and table name fields to identify the table.)

  14. Data type name: if the error was associated with a specific data type, the name of the data type. (Refer to the schema name field for the name of the data type’s schema.)

  15. Constraint name: if the error was associated with a specific constraint, the name of the constraint. Refer to fields listed above for the associated table or domain. (For this purpose, indexes are treated as constraints, even if they weren’t created with constraint syntax.)

  16. File: the file name of the source-code location where the error was reported.

  17. Line: the line number of the source-code location where the error was reported.

  18. Routine: the name of the source-code routine reporting the error.

Note

The fields for schema name, table name, column name, data type name, and constraint name are supplied only for a limited number of error types; see errcodes-appendix. Frontends should not assume that the presence of any of these fields guarantees the presence of another field. Core error sources observe the interrelationships noted above, but user-defined functions may use these fields in other ways. In the same vein, clients should not assume that these fields denote contemporary objects in the current database.

The client is responsible for formatting displayed information to meet its needs; in particular it should break long lines as needed. Newline characters appearing in the error message fields should be treated as paragraph breaks, not line breaks.

9.4.9. Logical Replication Message Formats

This section describes the detailed format of each logical replication message. These messages are either returned by the replication slot SQL interface or are sent by a walsender. In the case of a walsender, they are encapsulated inside replication protocol WAL messages as described in protocol-replication, and generally obey the same message flow as physical replication.

    1. Identifies the message as a begin message.

    2. The final LSN of the transaction.

    3. Commit timestamp of the transaction. The value is in number of microseconds since PostgreSQL epoch (2000-01-01).

    4. Xid of the transaction.
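    The fixed-width fields above decode with one struct call; the sketch below (names illustrative) also converts the microseconds-since-2000 timestamp into a datetime:

    ```python
    import struct
    from datetime import datetime, timedelta, timezone

    PG_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)

    def parse_begin(body: bytes):
        """Decode a Begin message body (after the 'B' type byte):
        Int64 final LSN, Int64 commit timestamp in microseconds since
        2000-01-01, Int32 xid."""
        final_lsn, commit_ts, xid = struct.unpack("!QQI", body[:20])
        return final_lsn, PG_EPOCH + timedelta(microseconds=commit_ts), xid
    ```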

    1. Identifies the message as a logical decoding message.

    2. Xid of the transaction (only present for streamed transactions). This field is available since protocol version 2.

    3. Flags; Either 0 for no flags or 1 if the logical decoding message is transactional.

    4. The LSN of the logical decoding message.

    5. The prefix of the logical decoding message.

    6. Length of the content.

    7. The content of the logical decoding message.

    1. Identifies the message as a commit message.

    2. Flags; currently unused.

    3. The LSN of the commit.

    4. The end LSN of the transaction.

    5. Commit timestamp of the transaction. The value is in number of microseconds since PostgreSQL epoch (2000-01-01).

    1. Identifies the message as an origin message.

    2. The LSN of the commit on the origin server.

    3. Name of the origin.

    Note that there can be multiple Origin messages inside a single transaction.

    1. Identifies the message as a relation message.

    2. Xid of the transaction (only present for streamed transactions). This field is available since protocol version 2.

    3. OID of the relation.

    4. Namespace (empty string for pg_catalog).

    5. Relation name.

    6. Replica identity setting for the relation (same as relreplident in pg_class).

    7. Number of columns.

    Next, the following message part appears for each column included in the publication (except generated columns):

    1. Flags for the column. Currently can be either 0 for no flags or 1 which marks the column as part of the key.

    2. Name of the column.

    3. OID of the column’s data type.

    4. Type modifier of the column (atttypmod).

    1. Identifies the message as a type message.

    2. Xid of the transaction (only present for streamed transactions). This field is available since protocol version 2.

    3. OID of the data type.

    4. Namespace (empty string for pg_catalog).

    5. Name of the data type.

    1. Identifies the message as an insert message.

    2. Xid of the transaction (only present for streamed transactions). This field is available since protocol version 2.

    3. OID of the relation corresponding to the ID in the relation message.

    4. Identifies the following TupleData message as a new tuple.

    5. TupleData message part representing the contents of new tuple.

    1. Identifies the message as an update message.

    2. Xid of the transaction (only present for streamed transactions). This field is available since protocol version 2.

    3. OID of the relation corresponding to the ID in the relation message.

    4. Identifies the following TupleData submessage as a key. This field is optional and is only present if the update changed data in any of the column(s) that are part of the REPLICA IDENTITY index.

    5. Identifies the following TupleData submessage as an old tuple. This field is optional and is only present if the table in which the update happened has REPLICA IDENTITY set to FULL.

    6. TupleData message part representing the contents of the old tuple or primary key. Only present if the previous 'O' or 'K' part is present.

    7. Identifies the following TupleData message as a new tuple.

    8. TupleData message part representing the contents of a new tuple.

    The Update message may contain either a 'K' message part or an 'O' message part or neither of them, but never both of them.

    1. Identifies the message as a delete message.

    2. Xid of the transaction (only present for streamed transactions). This field is available since protocol version 2.

    3. OID of the relation corresponding to the ID in the relation message.

    4. Identifies the following TupleData submessage as a key. This field is present if the table in which the delete has happened uses an index as REPLICA IDENTITY.

    5. Identifies the following TupleData message as an old tuple. This field is present if the table in which the delete happened has REPLICA IDENTITY set to FULL.

    6. TupleData message part representing the contents of the old tuple or primary key, depending on the previous field.

    The Delete message may contain either a 'K' message part or an 'O' message part, but never both of them.

    1. Identifies the message as a truncate message.

    2. Xid of the transaction (only present for streamed transactions). This field is available since protocol version 2.

    3. Number of relations.

    4. Option bits for TRUNCATE: 1 for CASCADE, 2 for RESTART IDENTITY.

    5. OID of the relation corresponding to the ID in the relation message. This field is repeated for each relation.

The following messages (Stream Start, Stream Stop, Stream Commit, and Stream Abort) are available since protocol version 2.

    1. Identifies the message as a stream start message.

    2. Xid of the transaction.

    3. A value of 1 indicates this is the first stream segment for this XID, 0 for any other stream segment.

    1. Identifies the message as a stream stop message.

    1. Identifies the message as a stream commit message.

    2. Xid of the transaction.

    3. Flags; currently unused.

    4. The LSN of the commit.

    5. The end LSN of the transaction.

    6. Commit timestamp of the transaction. The value is in number of microseconds since PostgreSQL epoch (2000-01-01).

    1. Identifies the message as a stream abort message.

    2. Xid of the transaction.

    3. Xid of the subtransaction (will be the same as the xid of the transaction for top-level transactions).

The following messages (Begin Prepare, Prepare, Commit Prepared, Rollback Prepared, Stream Prepare) are available since protocol version 3.

    1. Identifies the message as the beginning of a prepared transaction message.

    2. The LSN of the prepare.

    3. The end LSN of the prepared transaction.

    4. Prepare timestamp of the transaction. The value is in number of microseconds since PostgreSQL epoch (2000-01-01).

    5. Xid of the transaction.

    6. The user defined GID of the prepared transaction.

    1. Identifies the message as a prepared transaction message.

    2. Flags; currently unused.

    3. The LSN of the prepare.

    4. The end LSN of the prepared transaction.

    5. Prepare timestamp of the transaction. The value is in number of microseconds since PostgreSQL epoch (2000-01-01).

    6. Xid of the transaction.

    7. The user defined GID of the prepared transaction.

    1. Identifies the message as the commit of a prepared transaction message.

    2. Flags; currently unused.

    3. The LSN of the commit of the prepared transaction.

    4. The end LSN of the commit of the prepared transaction.

    5. Commit timestamp of the transaction. The value is in number of microseconds since PostgreSQL epoch (2000-01-01).

    6. Xid of the transaction.

    7. The user defined GID of the prepared transaction.

    1. Identifies the message as the rollback of a prepared transaction message.

    2. Flags; currently unused.

    3. The end LSN of the prepared transaction.

    4. The end LSN of the rollback of the prepared transaction.

    5. Prepare timestamp of the transaction. The value is in number of microseconds since PostgreSQL epoch (2000-01-01).

    6. Rollback timestamp of the transaction. The value is in number of microseconds since PostgreSQL epoch (2000-01-01).

    7. Xid of the transaction.

    8. The user defined GID of the prepared transaction.

    1. Identifies the message as a stream prepared transaction message.

    2. Flags; currently unused.

    3. The LSN of the prepare.

    4. The end LSN of the prepared transaction.

    5. Prepare timestamp of the transaction. The value is in number of microseconds since PostgreSQL epoch (2000-01-01).

    6. Xid of the transaction.

    7. The user defined GID of the prepared transaction.

The following message parts are shared by the above messages.

    1. Number of columns.

    Next, one of the following submessages appears for each column (except generated columns):

    1. Identifies the data as a NULL value.

    Or

    1. Identifies an unchanged TOASTed value (the actual value is not sent).

    Or

    1. Identifies the data as a text-formatted value.

    Or

    1. Identifies the data as a binary-formatted value.

    2. Length of the column value.

    3. The value of the column, in text or binary format as specified in the preceding format byte. n is the above length.
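    The four column submessages above can be distinguished by their leading kind byte; a sketch of a TupleData decoder (function name illustrative):

    ```python
    import struct

    def parse_tuple_data(body: bytes, offset: int = 0):
        """Decode a TupleData part: Int16 column count, then per column a
        kind byte: 'n' (NULL), 'u' (unchanged TOASTed value, no data
        sent), or 't'/'b' (text/binary value preceded by an Int32 length).
        Returns the columns and the offset just past the last one."""
        (ncols,) = struct.unpack_from("!H", body, offset)
        offset += 2
        columns = []
        for _ in range(ncols):
            kind = body[offset:offset + 1]
            offset += 1
            if kind in (b"n", b"u"):
                columns.append((kind, None))
            else:                                # b"t" or b"b"
                (length,) = struct.unpack_from("!I", body, offset)
                offset += 4
                columns.append((kind, body[offset:offset + length]))
                offset += length
        return columns, offset

    body = struct.pack("!H", 2) + b"t" + struct.pack("!I", 2) + b"42" + b"n"
    print(parse_tuple_data(body)[0])   # [(b't', b'42'), (b'n', None)]
    ```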

9.4.10. Summary of Changes since Protocol 2.0

This section provides a quick checklist of changes, for the benefit of developers trying to update existing client libraries to protocol 3.0.

The initial startup packet uses a flexible list-of-strings format instead of a fixed format. Notice that session default values for run-time parameters can now be specified directly in the startup packet. (Actually, you could do that before using the options field, but given the limited width of options and the lack of any way to quote whitespace in the values, it wasn’t a very safe technique.)

All messages now have a length count immediately following the message type byte (except for startup packets, which have no type byte). Also note that PasswordMessage now has a type byte.

ErrorResponse and NoticeResponse ('E' and 'N') messages now contain multiple fields, from which the client code can assemble an error message of the desired level of verbosity. Note that individual fields will typically not end with a newline, whereas the single string sent in the older protocol always did.

The ReadyForQuery ('Z') message includes a transaction status indicator.

The distinction between BinaryRow and DataRow message types is gone; the single DataRow message type serves for returning data in all formats. Note that the layout of DataRow has changed to make it easier to parse. Also, the representation of binary values has changed: it is no longer directly tied to the server’s internal representation.

There is a new extended query sub-protocol, which adds the frontend message types Parse, Bind, Execute, Describe, Close, Flush, and Sync, and the backend message types ParseComplete, BindComplete, PortalSuspended, ParameterDescription, NoData, and CloseComplete. Existing clients do not have to concern themselves with this sub-protocol, but making use of it might allow improvements in performance or functionality.

COPY data is now encapsulated into CopyData and CopyDone messages. There is a well-defined way to recover from errors during COPY. The special \. last line is not needed anymore, and is not sent during COPY OUT. (It is still recognized as a terminator during COPY IN, but its use is deprecated and will eventually be removed.) Binary COPY is supported. The CopyInResponse and CopyOutResponse messages include fields indicating the number of columns and the format of each column.

The layout of FunctionCall and FunctionCallResponse messages has changed. FunctionCall can now support passing NULL arguments to functions. It also can handle passing parameters and retrieving results in either text or binary format. There is no longer any reason to consider FunctionCall a potential security hole, since it does not offer direct access to internal server data representations.

The backend sends ParameterStatus ('S') messages during connection startup for all parameters it considers interesting to the client library. Subsequently, a ParameterStatus message is sent whenever the active value changes for any of these parameters.

The RowDescription ('T') message carries new table OID and column number fields for each column of the described row. It also shows the format code for each column.

The CursorResponse ('P') message is no longer generated by the backend.

The NotificationResponse ('A') message has an additional string field, which can carry a payload string passed from the NOTIFY event sender.

The EmptyQueryResponse ('I') message used to include an empty string parameter; this has been removed.