F.11. Citus Optimized Row Columnar (ORC)

F.11.1. About Citus Columnar

Version: 12.1.5

Project page: www.citusdata.com

GitHub: https://github.com/citusdata/citus

Copyright © Citus Data, Inc.

F.11.2. Introduction

In traditional analytical (OLAP) systems, data size and volume tend to increase over time, potentially leading to various performance challenges. Columnar storage offers considerable advantages, especially in scenarios where database management systems (DBMSs) are employed for analytical tasks that predominantly involve batch data loading.

Columnar storage allows for substantial table compression, greatly reducing resource usage while maintaining high performance. This is achievable without resorting to distributed MPP (Massively Parallel Processing) DBMSs. The inherent structure of ORC in Columnar storage facilitates efficient disk reads and offers superior data compression, enhancing overall performance.

Note

While the Columnar storage approach doesn't entirely supplant distributed MPP (Massively Parallel Processing) DBMSs in analytics, it can eliminate the need for them in certain situations. The decision to transition to an MPP DBMS depends on specific data volumes and workload demands. Although this threshold varies, it is generally in the realm of tens of terabytes.

F.11.3. Columnar Installation

The Columnar storage method is packaged as an extension. To add Columnar to your local Tantor SE database, connect with psql and run:

CREATE EXTENSION pg_columnar;

Now you can create and use Columnar tables in your database.

F.11.4. Columnar basic principles

The standard heap storage method in Tantor SE works well for OLTP workloads:

  • Support for UPDATE/DELETE operations

  • Efficient single-tuple lookups

Columnar tables are best suited for analytic or data warehouse workloads, where the following benefits are useful compared to the heap method:

  • Compression

  • Reads only the columns a query needs

  • Efficient VACUUM

F.11.5. Usage of the Table Access Method Functionality

Advantages of Columnar over cstore_fdw:

  • Columnar storage is built upon the Table Access Method API, so most operations work on columnar tables just as they do on traditional row-oriented (heap) tables.

  • Supports Write Ahead Log (WAL);

  • Supports ROLLBACK transactions;

  • Supports physical replication;

  • Supports recovery, including point-in-time recovery (PITR);

  • Supports pg_dump and pg_upgrade without the need for special actions or additional steps;

  • Supports most of the functionality that works with regular heap tables (rows).

Restrictions

  • Append-only (UPDATE/DELETE operations are not supported);

  • Does not free up space (for example, rolled-back transactions may still take up disk space);

  • Bitmap index scans are not supported;

  • TID scans are not supported;

  • Sample scans are not supported;

  • TOAST storage method is not supported (large values are stored internally);

  • ON CONFLICT is not supported (other than DO NOTHING without specifying a target);

  • Row-level locks are not supported (SELECT ... FOR SHARE, SELECT ... FOR UPDATE);

  • The serializable isolation level is not supported;

  • Only Tantor SE versions 14 and above are supported;

  • Foreign keys, unique constraints, and exclusion constraints are not supported;

  • Logical decoding is not supported;

  • Parallel scans inside a node are not supported;

  • AFTER ... FOR EACH ROW triggers are not supported;

  • UNLOGGED columnar tables are not supported.

Over time, the restrictions listed above will be gradually lifted.

F.11.6. Columnar Usage

Creating a columnar table, specifying the USING columnar option:

CREATE TABLE my_columnar_table
(
    id INT,
    i1 INT,
    i2 INT8,
    n NUMERIC,
    t TEXT
) USING columnar;

Insert the data into the table and read it as usual (subject to the restrictions listed above).

To view internal statistics for a table, use VACUUM VERBOSE. Note that VACUUM (without FULL) is much faster on a columnar table because it only scans the metadata and not the actual data.
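For example, for the table created above (the statistics output for a populated table is shown in Section F.11.7):

VACUUM VERBOSE my_columnar_table;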

Set table options using ALTER TABLE:

ALTER TABLE my_columnar_table SET(
  columnar.compression = 'none',
  columnar.stripe_row_limit = 10000
);

The following options are available for tables:

  • columnar.compression: ['none'|'pglz'|'zstd'|'lz4'|'lz4hc'] - sets the compression type for newly inserted data. Existing data will not be recompressed or decompressed. The default value is 'zstd' (if support was enabled at compile time).

  • columnar.compression_level: <integer> - sets the compression level. Valid values are 1 to 19. If the compression method does not support the selected level, the closest level will be selected instead. The default value is 3.

  • columnar.stripe_row_limit: <integer> - the maximum number of rows in a stripe for newly inserted data. Existing data stripes will not be modified and may contain more rows than this maximum value. Valid values are 1000 to 100000000. The default value is 150000.

  • columnar.chunk_group_row_limit: <integer> - maximum number of rows in a chunk for newly inserted data. Existing data chunks will not be modified and may have more rows than this maximum value. Valid values are 1000 to 100000000. The default value is 10000.
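For example, a sketch that raises the compression level for new data in my_columnar_table (the level that pays off will depend on your data):

ALTER TABLE my_columnar_table SET(
  columnar.compression = 'zstd',
  columnar.compression_level = 10
);

Only data inserted after this command uses the new settings; existing stripes are left unchanged.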

You can view the options of all tables using:

SELECT * FROM columnar.options;

Or a specific table:

SELECT * FROM columnar.options
WHERE relation = 'my_columnar_table'::regclass;

You can reset one or more table options to their defaults (that is, to the current values of the corresponding options set via SET) using columnar.alter_columnar_table_reset:

SELECT columnar.alter_columnar_table_reset(
  'my_columnar_table',
  chunk_group_row_limit => true
);

Set general columnar options using SET:

SET columnar.compression TO 'none';
SET columnar.enable_qual_pushdown TO false;
SET columnar.compression TO default;

The following session-level options are available:

  • columnar.enable_custom_scan: <boolean> - enables the use of a custom scan to push projections and quals into the storage layer. The default value is true.

  • columnar.enable_qual_pushdown: <boolean> - enables qual pushdown into columnar. This has no effect unless columnar.enable_custom_scan is true. The default value is true.

  • columnar.qual_pushdown_correlation_threshold: <real> - correlation threshold to attempt to push a qual referencing the given column. A value of 0 means attempt to push down all quals, even if the column is uncorrelated. Valid values are 0.0 to 1.0. The default value is 0.9.

  • columnar.max_custom_scan_paths: <integer> - maximum number of custom scan paths to generate for a columnar table when planning. Valid values are 1 to 1024. The default value is 64.

  • columnar.planner_debug_level - message level for columnar planning information in order of increasing informativeness:

    • 'log'

    • 'warning'

    • 'notice'

    • 'info'

    • 'debug'

    • 'debug1'

    • 'debug2'

    • 'debug3'

    • 'debug4'

    • 'debug5'

    The default value is debug3.

  • You can also change the table options columnar.compression, columnar.compression_level, columnar.stripe_row_limit, and columnar.chunk_group_row_limit with SET, but they will only affect newly created tables, not newly created stripes in existing tables. A sketch of this behavior follows the list.
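A minimal sketch of the session-level behavior (assuming, as described above, that a newly created table picks up the current session values; t_new is illustrative):

SET columnar.stripe_row_limit TO 50000;
CREATE TABLE t_new(i INT8) USING columnar;
SELECT * FROM columnar.options WHERE relation = 't_new'::regclass;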

F.11.6.1. Converting between heap and columnar

Note

Make sure you understand any additional features that may be in use on a table before converting it (such as row-level security, storage options, constraints, inheritance, etc.), and make sure they are replicated in the new table or partition accordingly. LIKE, used below, is a shorthand that works only in simple cases.

CREATE TABLE my_table(i INT8 DEFAULT '7');
INSERT INTO my_table VALUES(1);
-- convert to columnar
SELECT columnar.alter_table_set_access_method('my_table', 'columnar');
-- back to row
SELECT columnar.alter_table_set_access_method('my_table', 'heap');
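The manual alternative that the note above refers to is a plain copy; a sketch for simple cases:

CREATE TABLE my_table_columnar(LIKE my_table INCLUDING DEFAULTS) USING columnar;
INSERT INTO my_table_columnar SELECT * FROM my_table;

Unlike columnar.alter_table_set_access_method, this approach does not carry over indexes, constraints, or other table features automatically.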

F.11.6.2. Default table type

The default table access method is heap. You can change it for the session using default_table_access_method:

SET default_table_access_method TO 'columnar';

After this, any table you create with CREATE TABLE and no USING clause will be columnar.
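A quick way to verify which access method a new table ended up with (the events table is illustrative):

CREATE TABLE events(id INT8, payload TEXT); -- no USING clause
SELECT amname FROM pg_am
WHERE oid = (SELECT relam FROM pg_class WHERE relname = 'events');
-- expected: columnar, given the SET above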

F.11.6.3. Partitioning

Columnar tables can be used as partitions, and a partitioned table can contain any combination of heap and columnar partitions.

CREATE TABLE parent(ts timestamptz, i int, n numeric, s text)
  PARTITION BY RANGE (ts);

-- columnar partition
CREATE TABLE p0 PARTITION OF parent
  FOR VALUES FROM ('2020-01-01') TO ('2020-02-01')
  USING COLUMNAR;
-- columnar partition
CREATE TABLE p1 PARTITION OF parent
  FOR VALUES FROM ('2020-02-01') TO ('2020-03-01')
  USING COLUMNAR;
-- row partition
CREATE TABLE p2 PARTITION OF parent
  FOR VALUES FROM ('2020-03-01') TO ('2020-04-01');

INSERT INTO parent VALUES ('2020-01-15', 10, 100, 'one thousand'); -- columnar
INSERT INTO parent VALUES ('2020-02-15', 20, 200, 'two thousand'); -- columnar
INSERT INTO parent VALUES ('2020-03-15', 30, 300, 'three thousand'); -- row

When performing operations on a partitioned table that combines heap and columnar partitions, note the following behaviors of operations that are supported on heap but not on columnar partitions (for example, UPDATE, DELETE, row locks, etc.):

  • If the operation targets a specific heap partition (for example, UPDATE p2 SET i = i + 1), it will succeed; if it targets a specific columnar partition (e.g., UPDATE p1 SET i = i + 1), it will fail.

  • If the operation targets the partitioned table and has a WHERE clause that excludes all columnar partitions (e.g., UPDATE parent SET i = i + 1 WHERE ts = '2020-03-15'), it will succeed.

  • If the operation targets the partitioned table but does not exclude all columnar partitions, it will fail, even if the rows actually being updated are only in heap partitions (e.g., UPDATE parent SET i = i + 1 WHERE n = 300).

Note that Columnar supports btree and hash indexes (and the constraints that require them), but does not support gist, gin, spgist, and brin indexes. Consequently, if a partitioned table has any columnar partitions, an index of an unsupported type cannot be created on the partitioned (parent) table; in such a case, create the index on the individual heap partitions instead. Likewise for constraints that require indexes, for example:

CREATE INDEX p2_ts_idx ON p2 (ts);
CREATE UNIQUE INDEX p2_i_unique ON p2 (i);
ALTER TABLE p2 ADD UNIQUE (n);

F.11.7. Performance Microbenchmark

Important

This microbenchmark is not intended to represent any real workload. The amount of compression, and therefore performance, will depend on the specific workload. The benchmark uses a synthetic workload designed to highlight the benefits of the columnar storage method.

Install plpython3u before running the test:

CREATE EXTENSION plpython3u;

Schema:

CREATE TABLE perf_row(
    id INT8,
    ts TIMESTAMPTZ,
    customer_id INT8,
    vendor_id INT8,
    name TEXT,
    description TEXT,
    value NUMERIC,
    quantity INT4
);

CREATE TABLE perf_columnar(LIKE perf_row) USING COLUMNAR;

Function for data generation:

CREATE OR REPLACE FUNCTION random_words(n INT4) RETURNS TEXT LANGUAGE plpython3u AS $$
import random
t = ''
words = ['zero','one','two','three','four','five','six','seven','eight','nine','ten']
for i in range(0,n):
  if (i != 0):
    t += ' '
  r = random.randint(0,len(words)-1)
  t += words[r]
return t
$$;
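A quick check of the generator (the output is random):

SELECT random_words(5);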

Insert data using the random_words function:

INSERT INTO perf_row
   SELECT
    g, -- id
    '2020-01-01'::timestamptz + ('1 minute'::interval * g), -- ts
    (random() * 1000000)::INT4, -- customer_id
    (random() * 100)::INT4, -- vendor_id
    random_words(7), -- name
    random_words(100), -- description
    (random() * 100000)::INT4/100.0, -- value
    (random() * 100)::INT4 -- quantity
   FROM generate_series(1,75000000) g;

INSERT INTO perf_columnar SELECT * FROM perf_row;

Check the compression ratio:

=> SELECT pg_total_relation_size('perf_row')::numeric/pg_total_relation_size('perf_columnar')
AS compression_ratio;

 compression_ratio
--------------------
 5.3958044063457513
(1 row)

The overall compression ratio of a columnar table compared to the same data stored in heap storage is 5.4X.

=> VACUUM VERBOSE perf_columnar;
INFO:  statistics for "perf_columnar":
storage id: 10000000000
total file size: 8761368576, total data size: 8734266196
compression rate: 5.01x
total row count: 75000000, stripe count: 500, average rows per stripe: 150000
chunk count: 60000, containing data for dropped columns: 0, zstd compressed: 60000

VACUUM VERBOSE reports a lower compression ratio because it only averages the compression ratios of the individual chunks and does not account for the metadata savings of the columnar format.

System characteristics:

  • Azure VM: Standard D2s v3 (2 vcpus, 8 GiB memory)

  • Linux (ubuntu 18.04)

  • Data Drive: Standard HDD (512GB, 500 IOPS Max, 60 MB/s Max)

  • PostgreSQL 13 (--with-llvm, --with-python)

  • shared_buffers = 128MB

  • max_parallel_workers_per_gather = 0

  • jit = on

Note

Because the benchmark was run on a system with enough physical memory to hold a large portion of the table, the benefits of columnar I/O on query runtime would not be fully realized unless the data size were significantly increased.

Workload queries:

-- OFFSET 1000 so that no rows are returned, and we collect only timings

SELECT vendor_id, SUM(quantity) FROM perf_row GROUP BY vendor_id OFFSET 1000;
SELECT vendor_id, SUM(quantity) FROM perf_row GROUP BY vendor_id OFFSET 1000;
SELECT vendor_id, SUM(quantity) FROM perf_row GROUP BY vendor_id OFFSET 1000;
SELECT vendor_id, SUM(quantity) FROM perf_columnar GROUP BY vendor_id OFFSET 1000;
SELECT vendor_id, SUM(quantity) FROM perf_columnar GROUP BY vendor_id OFFSET 1000;
SELECT vendor_id, SUM(quantity) FROM perf_columnar GROUP BY vendor_id OFFSET 1000;

Timing (median of three runs):

  • row: 436s

  • columnar: 16s

  • speedup: 27X

F.11.8. Common recommendations

F.11.8.1. Choosing Stripe Size and Chunk Size

Determining the optimal stripe and chunk sizes for columnar tables in Tantor SE depends on various factors, such as the nature of the data, query patterns, and hardware characteristics. Here are some recommendations:

  • Stripe Size: The optimal stripe size may depend on the typical size of your queries. If queries usually return a large number of rows, a larger stripe size might be more efficient, as it reduces the number of input/output operations. On the other hand, if queries often return a small subset of data, a smaller stripe size might be preferable.

  • Chunk Size: This determines how much data will be compressed at a time. A smaller chunk size might lead to a higher compression ratio, but can increase the overhead of compression. A larger chunk size can reduce overhead, but potentially decrease the compression ratio.

  • Testing and Tuning: It is important to conduct testing with real data and queries to determine the optimal settings for your specific situation. You can start with the default recommended settings, and then experiment with different stripe and chunk sizes to see how it affects query performance and compression ratios.

  • Hardware Properties: It's also worth considering the characteristics of your hardware, such as disk and CPU throughput, as this can influence which stripe and chunk sizes are most effective.

Ultimately, the optimal stripe and chunk sizes depend on the unique characteristics of your environment, data, and query patterns.
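A sketch of such an experiment, comparing two layouts of the same data (source_table and the two table names are placeholders; Section F.11.9 walks through a complete version):

CREATE TABLE t_big_stripes(LIKE source_table) USING columnar;
ALTER TABLE t_big_stripes SET(
  columnar.stripe_row_limit = 150000,
  columnar.chunk_group_row_limit = 10000
);
INSERT INTO t_big_stripes SELECT * FROM source_table;

CREATE TABLE t_small_stripes(LIKE source_table) USING columnar;
ALTER TABLE t_small_stripes SET(
  columnar.stripe_row_limit = 10000,
  columnar.chunk_group_row_limit = 1000
);
INSERT INTO t_small_stripes SELECT * FROM source_table;

-- compare on-disk size, then the plans of representative queries
SELECT pg_size_pretty(pg_total_relation_size('t_big_stripes')),
       pg_size_pretty(pg_total_relation_size('t_small_stripes'));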

F.11.8.2. Compression Algorithms

The choice of compression algorithm for columnar tables in Tantor SE depends on several factors, including the nature of the data, performance requirements, and hardware specifications. Here are some recommendations that may help you make a decision:

  • none: This type applies no compression to the data. It can be useful if your data is already compressed, or if you have very high performance requirements and sufficient disk space.

  • lz4: provides fast data compression and decompression. This can be useful if you have high performance requirements, but still want to save some disk space.

  • zstd: offers a higher compression ratio compared to LZ4, but requires more CPU time for compressing and decompressing data. This algorithm can be beneficial if you have limited disk space and are willing to spend a bit more CPU time on data compression.

It is important to note that choosing a compression algorithm is a trade-off between performance (compression and decompression speed) and disk space. Also, the efficiency of each compression algorithm can greatly depend on the nature of your data. Therefore, it is recommended to conduct some tests with your real data and queries to determine the most suitable compression algorithm for your situation.
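A sketch for measuring this trade-off on your own data (source_table is a placeholder):

CREATE TABLE t_lz4(LIKE source_table) USING columnar;
ALTER TABLE t_lz4 SET(columnar.compression = 'lz4');
INSERT INTO t_lz4 SELECT * FROM source_table;

CREATE TABLE t_zstd(LIKE source_table) USING columnar;
ALTER TABLE t_zstd SET(columnar.compression = 'zstd');
INSERT INTO t_zstd SELECT * FROM source_table;

SELECT pg_size_pretty(pg_total_relation_size('t_lz4')) AS lz4_size,
       pg_size_pretty(pg_total_relation_size('t_zstd')) AS zstd_size;

Timing the INSERT statements (for example, with \timing in psql) gauges the CPU side of the trade-off.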

F.11.9. Working with Time Series Data

F.11.9.1. Creating Tables

Time series data is characterized by the sequential recording of values in increasing time order. For such data, physical ordering is important.

Create the necessary extensions:

CREATE EXTENSION pg_columnar;
CREATE EXTENSION plpython3u;

Create a test table:

CREATE TABLE perf_columnar(
    id INT8,
    ts TIMESTAMPTZ,
    customer_id INT8,
    vendor_id INT8,
    name TEXT,
    description TEXT,
    value NUMERIC,
    quantity INT4
) USING columnar;

ALTER TABLE public.perf_columnar SET(
    columnar.compression = 'lz4',
    columnar.stripe_row_limit = 100000,
    columnar.chunk_group_row_limit = 10000
);

Create a function to generate random text:

CREATE OR REPLACE FUNCTION random_words(n INT4) RETURNS TEXT LANGUAGE plpython3u AS $$
import random
t = ''
words = ['zero','one','two','three','four','five','six','seven','eight','nine','ten']
for i in range(0, n):
  if (i != 0):
    t += ' '
  r = random.randint(0, len(words) - 1)
  t += words[r]
return t
$$;

Generate test data:

INSERT INTO perf_columnar
   SELECT
    g, -- id
    '2023-06-01'::timestamptz + ('1 minute'::interval * g), -- ts
    (random() * 1000000)::INT4, -- customer_id
    (random() * 100)::INT4, -- vendor_id
    random_words(7), -- name
    random_words(100), -- description
    (random() * 100000)::INT4/100.0, -- value
    (random() * 100)::INT4 -- quantity
   FROM generate_series(1,7500000) g;

As seen from the query above, the data was inserted sorted by ts. Gather statistics using VACUUM ANALYZE:

VACUUM ANALYZE perf_columnar;

Create a copy of the table with different parameters:

CREATE TABLE perf_columnar2(LIKE perf_columnar) USING COLUMNAR;

ALTER TABLE public.perf_columnar2 SET(
    columnar.compression = 'zstd',
    columnar.stripe_row_limit = 10000,
    columnar.chunk_group_row_limit = 1000
);

INSERT INTO perf_columnar2 SELECT * FROM perf_columnar;

VACUUM ANALYZE perf_columnar2;

Check the size of the resulting tables:

test_db=# \dt+
                                        List of relations
 Schema |      Name      | Type  |  Owner   | Persistence | Access method |  Size   | Description 
--------+----------------+-------+----------+-------------+---------------+---------+-------------
 public | perf_columnar  | table | postgres | permanent   | columnar      | 1886 MB | 
 public | perf_columnar2 | table | postgres | permanent   | columnar      | 850 MB  | 
(2 rows)

As can be seen, the table perf_columnar is twice as large as perf_columnar2.

Check the options of the tables:

test_db=# select * from columnar.options;

    relation    | chunk_group_row_limit | stripe_row_limit | compression | compression_level 
----------------+-----------------------+------------------+-------------+-------------------
 perf_columnar  |                 10000 |           100000 | lz4         |                 3
 perf_columnar2 |                  1000 |            10000 | zstd        |                 3
(2 rows)

F.11.9.2. Comparing Query Performance

Execute a typical query to read a set of records from the perf_columnar table for a specific time interval:

explain (analyze, verbose, buffers) select ts from perf_columnar
where ts < '2023-06-01 10:00:00'::timestamp with time zone and
      ts > '2023-06-01 10:00:05'::timestamp with time zone;
                                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Custom Scan (ColumnarScan) on public.perf_columnar  (cost=0.00..401.12 rows=4 width=8) (actual time=18.838..49.001 rows=4 loops=1)
   Output: ts
   Filter: ((perf_columnar.ts < '2023-06-01 10:00:00+03'::timestamp with time zone) AND (perf_columnar.ts > '2023-06-01 10:00:05+03'::timestamp with time zone))
   Rows Removed by Filter: 9996
   Columnar Projected Columns: ts
   Columnar Chunk Group Filters: ((ts < '2023-06-01 10:00:00+03'::timestamp with time zone) AND (ts > '2023-06-01 10:00:05+03'::timestamp with time zone))
   Columnar Chunk Groups Removed by Filter: 749     <-----------
   Buffers: shared hit=3833 read=264                <-----------
 Query Identifier: 1607994334608619710
 Planning:
   Buffers: shared hit=52 read=7
 Planning Time: 12.789 ms
 Execution Time: 49.188 ms
(13 rows)

Execute a typical query to read a set of records from the perf_columnar2 table for a specific time interval:

explain (analyze, verbose, buffers) select ts from perf_columnar2
where ts < '2020-01-01 10:00:00'::timestamp with time zone and
      ts > '2020-01-01 10:00:05'::timestamp with time zone;
                                                                            QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Custom Scan (ColumnarScan) on public.perf_columnar2  (cost=0.00..17.95 rows=52 width=8) (actual time=5.175..58.532 rows=49 loops=1)
   Output: ts
   Filter: ((perf_columnar2.ts < '2020-01-01 10:00:00+03'::timestamp with time zone) AND (perf_columnar2.ts > '2020-01-01 10:00:05+03'::timestamp with time zone))
   Rows Removed by Filter: 951
   Columnar Projected Columns: ts
   Columnar Chunk Group Filters: ((ts < '2020-01-01 10:00:00+03'::timestamp with time zone) AND (ts > '2020-01-01 10:00:05+03'::timestamp with time zone))
   Columnar Chunk Groups Removed by Filter: 7499    <-----------
   Buffers: shared hit=40824 read=1                 <-----------
 Query Identifier: -801549076851693482
 Planning:
   Buffers: shared hit=155
 Planning Time: 1.086 ms
 Execution Time: 58.717 ms
(13 rows)

Note Columnar Chunk Groups Removed by Filter: 749 in the first case and 7499 in the second. However, the total number of buffers accessed by the second query is much higher.

Terminology for understanding the behavior of these queries:

  • Stripe: rows are grouped into stripes (by default, up to 150,000 rows each); the stripe is the unit in which data is written to and read from a columnar table.

  • Chunk Group: stripes are divided into chunk groups of 10,000 rows (by default).

  • Chunk: Each chunk group consists of one chunk for each column. Chunk is the unit of compression, and min/max values are tracked for each chunk, enabling chunk group filtering (Chunk Group Filtering).

  • Chunk Group Filtering: when a query's WHERE clause cannot match any tuples in a chunk, and this can be determined from the chunk's min/max values, chunk group filtering skips the whole chunk group without unpacking it.

As can be seen, 749/7499 chunk groups were filtered, meaning that 7,490,000/7,499,000 rows were skipped without the need to extract or unpack the data. Only one chunk group per table (10,000 rows in the first, 1,000 in the second) needed to be extracted and unpacked, so the queries took only milliseconds.

But as seen from the query plans, the first query accessed about 32 MB of buffers (4,097 buffers of 8 KB), while the second accessed roughly ten times more, about 319 MB (40,825 buffers).
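The per-chunk min/max metadata that drives this filtering can be inspected directly. A sketch, assuming this build exposes the columnar.chunk catalog view alongside columnar.options (the exact column set may vary between versions):

SELECT * FROM columnar.chunk LIMIT 5;

Each row describes one chunk, including its minimum and maximum values, which is what the Chunk Group Filters in the plans above are checked against.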

F.11.9.3. Using Indexes

Columnar supports btree and hash indexes (and constraints that require them), but does not support index types like gist, gin, spgist, and brin.

Create indexes on the tables perf_columnar and perf_columnar2:

create index test on perf_columnar2 (ts);
create index test_h on perf_columnar (ts);

Check the size of the indexes:

test_db=# \di+
                                            List of relations

 Schema |  Name  | Type  |  Owner   |     Table      | Persistence | Access method |  Size  | Description 
--------+--------+-------+----------+----------------+-------------+---------------+--------+-------------
 public | test   | index | postgres | perf_columnar2 | permanent   | btree         | 161 MB | 
 public | test_h | index | postgres | perf_columnar  | permanent   | btree         | 161 MB | 
(2 rows)

The sizes of the indexes are the same.

Disable the forced use of Custom Scan:

SET columnar.enable_custom_scan TO false;

Re-execute the typical query to read a set of records from the perf_columnar table for a specific time interval:

explain (analyze, verbose, buffers) select ts from perf_columnar
where ts < '2023-06-01 10:00:00'::timestamp with time zone and
      ts > '2023-06-01 10:00:05'::timestamp with time zone;
                                                                             QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_h on public.perf_columnar  (cost=0.43..3217.48 rows=4 width=8) (actual time=402.144..402.204 rows=4 loops=1)
   Output: ts
   Index Cond: ((perf_columnar.ts < '2023-06-01 10:00:00+03'::timestamp with time zone) AND (perf_columnar.ts > '2023-06-01 10:00:05+03'::timestamp with time zone))
   Buffers: shared hit=181 read=3232      <-----------
 Query Identifier: 1607994334608619710
 Planning:
   Buffers: shared hit=20 read=5
 Planning Time: 16.278 ms
 Execution Time: 402.386 ms
(9 rows)

Nearly the same number of buffers were scanned as when using Custom Scan.

Re-execute the typical query to read a set of records from the perf_columnar2 table for a specific time interval:

explain (analyze, verbose, buffers) select ts from perf_columnar2
where ts < '2020-01-01 10:00:00'::timestamp with time zone and
      ts > '2020-01-01 10:00:05'::timestamp with time zone;
                                                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using test on public.perf_columnar2  (cost=0.43..153.05 rows=52 width=8) (actual time=14.620..14.821 rows=49 loops=1)
   Output: ts
   Index Cond: ((perf_columnar2.ts < '2020-01-01 10:00:00+03'::timestamp with time zone) AND (perf_columnar2.ts > '2020-01-01 10:00:05+03'::timestamp with time zone))
   Buffers: shared hit=372 read=145     <-----------
 Query Identifier: -801549076851693482
 Planning:
   Buffers: shared hit=97
 Planning Time: 0.813 ms
 Execution Time: 14.978 ms
(13 rows)

For the second table, far fewer buffers were scanned, and the query is executed in the most optimal manner.

F.11.9.4. Conclusions

The following conclusions can be drawn:

  • The most effective data compression method is zstd;

  • The sizes of stripes and chunks affect the number of buffers scanned when Chunk Group Filtering is used;

  • When reading a small amount of data, using an index may prove to be more efficient;

  • To test this hypothesis, you must force the use of indexes with the command SET columnar.enable_custom_scan TO false;

  • Writing time-series data in increasing timestamp order can significantly reduce the size of indexes and the volume of chunks that must be unpacked. It is therefore recommended to sort data before inserting it into the database, or to use clustering (CLUSTER), as sketched below.
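A sketch of the recommended loading pattern (the staging table is illustrative); CLUSTER operates on the heap staging table, so the ordering is applied before the data reaches the columnar table:

-- load unordered data into a heap staging table first
CREATE TABLE staging(LIKE perf_columnar);
-- ... load data into staging ...

-- option 1: insert in timestamp order
INSERT INTO perf_columnar SELECT * FROM staging ORDER BY ts;

-- option 2: physically reorder the staging table by ts, then copy
CREATE INDEX staging_ts_idx ON staging(ts);
CLUSTER staging USING staging_ts_idx;
INSERT INTO perf_columnar SELECT * FROM staging;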