F.11. Citus Optimized Row Columnar (ORC)
F.11.1. About Citus Columnar
Version: 12.1.5
Project page www.citusdata.com
Copyright © Citus Data, Inc.
F.11.2. Introduction
In traditional analytical (OLAP) systems, data size and volume tend to increase over time, potentially leading to various performance challenges. Columnar storage offers considerable advantages, especially in scenarios where database management systems (DBMSs) are employed for analytical tasks that predominantly involve batch data loading.
Columnar storage allows for substantial table compression, greatly reducing resource usage while maintaining high performance. This is achievable without resorting to distributed MPP (Massive Parallel Processing) DBMS systems. The inherent structure of ORC in Columnar storage facilitates efficient disk reads and offers superior data compression, enhancing overall performance.
Note
While the Columnar storage approach doesn't entirely supplant distributed MPP (Massive Parallel Processing) DBMSs in analytics, it can eliminate the need for them in certain situations. The decision to transition to an MPP DBMS depends on specific data volumes and workload demands. Although this threshold varies, it is generally in the realm of tens of terabytes.
F.11.3. Columnar Installation
The Columnar storage method is provided as an extension. To add Columnar to your local Tantor SE database, connect with psql and run:
CREATE EXTENSION pg_columnar;
Now you can create and use Columnar tables in your database.
F.11.4. Columnar basic principles
The standard Tantor SE heap storage method works well for OLTP workloads:
Support for UPDATE/DELETE operations
Efficient single-tuple lookups
Columnar tables are best suited for analytic or data warehouse workloads, where the following benefits will be useful compared to the heap method:
Compression
Doesn't read unnecessary columns
Efficient VACUUM.
F.11.5. Usage of the Table Access Method Functionality
Columnar storage is built upon the Table Access Method API. This API enables most operations on columnar tables to work the same way as on traditional row-oriented (heap) tables.

Advantages of Columnar over cstore_fdw:
Supports Write-Ahead Logging (WAL);
Supports ROLLBACK of transactions;
Supports physical replication;
Supports recovery, including point-in-time recovery (PITR);
Supports pg_dump and pg_upgrade without the need for special actions or additional steps;
Supports most of the functionality that works with regular heap (row-oriented) tables.
Restrictions:
Append-only (UPDATE/DELETE operations are not supported);
Does not free up space (for example, rolled-back transactions may still take up disk space);
Bitmap index scans are not supported;
Tid scans are not supported;
Sample scans are not supported;
The TOAST storage method is not supported (large values are stored internally);
ON CONFLICT is not supported (other than DO NOTHING without specifying a target);
Row-level locks are not supported (SELECT ... FOR SHARE, SELECT ... FOR UPDATE);
The serializable isolation level is not supported;
Only Tantor SE versions 14+ are supported;
Foreign keys, unique constraints, and exclusion constraints are not supported;
Logical decoding is not supported;
Parallel scans within a node are not supported;
AFTER ... FOR EACH ROW triggers are not supported;
UNLOGGED columnar tables are not supported.
Over time, the restrictions listed above will be gradually lifted.
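For example, the append-only restriction manifests as follows. This is a minimal sketch, assuming the extension is installed; the table name demo_columnar is made up for illustration:

```sql
-- Columnar tables accept INSERT/COPY, but reject UPDATE and DELETE.
CREATE TABLE demo_columnar (id INT, payload TEXT) USING columnar;
INSERT INTO demo_columnar VALUES (1, 'a');            -- works
-- Both of the following statements fail with an error,
-- because columnar tables are append-only:
UPDATE demo_columnar SET payload = 'b' WHERE id = 1;
DELETE FROM demo_columnar WHERE id = 1;
```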
F.11.6. Columnar Usage
Create a columnar table by specifying the USING columnar option:

CREATE TABLE my_columnar_table (
    id INT,
    i1 INT,
    i2 INT8,
    n NUMERIC,
    t TEXT
) USING columnar;
Insert the data into the table and read it as usual (subject to the restrictions listed above).
To view internal statistics for a table, use VACUUM VERBOSE. Note that VACUUM (without FULL) is much faster on a columnar table because it only scans the metadata, not the actual data.

Set table options using ALTER TABLE:

ALTER TABLE my_columnar_table SET (
    columnar.compression = 'none',
    columnar.stripe_row_limit = 10000
);
The following options are available for tables:

columnar.compression: ['none'|'pglz'|'zstd'|'lz4'|'lz4hc'] - sets the compression type for newly inserted data. Existing data will not be recompressed or decompressed. The default value is 'zstd' (if support was specified when compiling).
columnar.compression_level: <integer> - sets the compression level. Valid values are 1 to 19. If the compression method does not support the selected level, the closest level is selected instead. The default value is 3.
columnar.stripe_row_limit: <integer> - the maximum number of rows in a stripe for newly inserted data. Existing stripes will not be modified and may contain more rows than this maximum. Valid values are 1000 to 100000000. The default value is 150000.
columnar.chunk_group_row_limit: <integer> - the maximum number of rows in a chunk for newly inserted data. Existing chunks will not be modified and may contain more rows than this maximum. Valid values are 1000 to 100000000. The default value is 10000.
You can view the options of all tables using:
SELECT * from columnar.options;
Or a specific table:
SELECT * FROM columnar.options WHERE relation = 'my_columnar_table'::regclass;
You can reset one or more table options to their defaults (or to the current values set with SET) using columnar.alter_columnar_table_reset:

SELECT columnar.alter_columnar_table_reset(
    'my_columnar_table',
    chunk_group_row_limit => true
);
Set general columnar options using SET:

SET columnar.compression TO 'none';
SET columnar.enable_qual_pushdown TO false;
SET columnar.compression TO default;
columnar.enable_custom_scan: <boolean> - enables the use of a custom scan to push projections and quals into the storage layer. The default value is true.
columnar.enable_qual_pushdown: <boolean> - enables qual pushdown into columnar. This has no effect unless columnar.enable_custom_scan is true. The default value is true.
columnar.qual_pushdown_correlation_threshold: <real> - the correlation threshold for attempting to push down a qual referencing a given column. A value of 0 means attempt to push down all quals, even if the column is uncorrelated. Valid values are 0.0 to 1.0. The default value is 0.9.
columnar.max_custom_scan_paths: <integer> - the maximum number of custom scan paths to generate for a columnar table when planning. Valid values are 1 to 1024. The default value is 64.
columnar.planner_debug_level: message level for columnar planning information, in order of increasing informativeness: 'log', 'warning', 'notice', 'info', 'debug', 'debug1', 'debug2', 'debug3', 'debug4', 'debug5'. The default value is 'debug3'.
You can also change the table options columnar.compression, columnar.compression_level, columnar.stripe_row_limit, and columnar.chunk_group_row_limit this way, but they will only affect newly created tables, not newly created stripes in existing tables.
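The effect of such session-level settings can be checked through columnar.options. A sketch, assuming the defaults described above; the table name fresh_table is made up for illustration:

```sql
-- Session-level settings apply to tables created afterwards.
SET columnar.compression TO 'lz4';
SET columnar.stripe_row_limit TO 50000;
CREATE TABLE fresh_table (id INT, payload TEXT) USING columnar;
-- fresh_table should show compression = 'lz4', stripe_row_limit = 50000:
SELECT * FROM columnar.options WHERE relation = 'fresh_table'::regclass;
```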
F.11.6.1. Converting between heap and columnar
Note
Make sure you understand any additional features that may be in use on a table before converting it (such as row-level security, storage options, constraints, inheritance, etc.), and make sure they are replicated in the new table or partition accordingly. LIKE, used below, is a shorthand that only works in simple cases.
CREATE TABLE my_table(i INT8 DEFAULT '7');
INSERT INTO my_table VALUES(1);
-- convert to columnar
SELECT columnar.alter_table_set_access_method('my_table', 'columnar');
-- back to row (heap)
SELECT columnar.alter_table_set_access_method('my_table', 'heap');
F.11.6.2. Default table type
The default table type is heap. You can change this using default_table_access_method:

SET default_table_access_method TO 'columnar';

After this, any table you create with CREATE TABLE without an explicit USING clause will be columnar.
F.11.6.3. Partitioning
Columnar tables can be used as partitions; and a partitioned table can consist of any combination of heap and columnar sections.
CREATE TABLE parent(ts timestamptz, i int, n numeric, s text)
    PARTITION BY RANGE (ts);

-- columnar partition
CREATE TABLE p0 PARTITION OF parent
    FOR VALUES FROM ('2020-01-01') TO ('2020-02-01')
    USING COLUMNAR;
-- columnar partition
CREATE TABLE p1 PARTITION OF parent
    FOR VALUES FROM ('2020-02-01') TO ('2020-03-01')
    USING COLUMNAR;
-- row (heap) partition
CREATE TABLE p2 PARTITION OF parent
    FOR VALUES FROM ('2020-03-01') TO ('2020-04-01');

INSERT INTO parent VALUES ('2020-01-15', 10, 100, 'one thousand');   -- columnar
INSERT INTO parent VALUES ('2020-02-15', 20, 200, 'two thousand');   -- columnar
INSERT INTO parent VALUES ('2020-03-15', 30, 300, 'three thousand'); -- row
When performing operations on a partitioned table that mixes heap and columnar partitions, note the following behaviors of operations that are supported on heap but not on columnar partitions (for example, UPDATE, DELETE, row locks, etc.):

If the operation targets a specific heap partition directly (for example, UPDATE p2 SET i = i + 1), it will succeed; if it targets a columnar partition (e.g. UPDATE p1 SET i = i + 1), it will fail.
If the operation targets the partitioned table and has a WHERE clause that excludes all columnar partitions (e.g. UPDATE parent SET i = i + 1 WHERE ts = '2020-03-15'), it will succeed.
If the operation targets the partitioned table but does not exclude all columnar partitions, it will fail, even if the data actually being updated resides only in heap partitions (e.g. UPDATE parent SET i = i + 1 WHERE n = 300).
Note that Columnar supports btree and hash indexes (and the constraints that require them), but does not support gist, gin, spgist, and brin indexes. For this reason, if some partitions are columnar and the index type is not supported, it is not possible to create the index directly on the partitioned (parent) table. In such a case, create the index on the individual heap partitions instead. The same applies to constraints that require indexes, for example:
CREATE INDEX p2_ts_idx ON p2 (ts);
CREATE UNIQUE INDEX p2_i_unique ON p2 (i);
ALTER TABLE p2 ADD UNIQUE (n);
F.11.7. Performance Microbenchmark
Important
This microbenchmark is not intended to represent any real workloads. The amount of compression, and therefore performance, will depend on the specific workload. This benchmark demonstrates a synthetic workload focused on the columnar storage method that shows its benefits.
Install plpython3u before running the test:
CREATE EXTENSION plpython3u;
Schema:

CREATE TABLE perf_row(
    id INT8,
    ts TIMESTAMPTZ,
    customer_id INT8,
    vendor_id INT8,
    name TEXT,
    description TEXT,
    value NUMERIC,
    quantity INT4
);
CREATE TABLE perf_columnar(LIKE perf_row) USING COLUMNAR;
Function for data generation:

CREATE OR REPLACE FUNCTION random_words(n INT4) RETURNS TEXT LANGUAGE plpython3u AS $$
import random
t = ''
words = ['zero','one','two','three','four','five','six','seven','eight','nine','ten']
for i in range(0,n):
    if (i != 0):
        t += ' '
    r = random.randint(0,len(words)-1)
    t += words[r]
return t
$$;
Insert data using the random_words function:

INSERT INTO perf_row
    SELECT
        g, -- id
        '2020-01-01'::timestamptz + ('1 minute'::interval * g), -- ts
        (random() * 1000000)::INT4, -- customer_id
        (random() * 100)::INT4, -- vendor_id
        random_words(7), -- name
        random_words(100), -- description
        (random() * 100000)::INT4/100.0, -- value
        (random() * 100)::INT4 -- quantity
    FROM generate_series(1,75000000) g;

INSERT INTO perf_columnar SELECT * FROM perf_row;
Check the compression ratio:

=> SELECT pg_total_relation_size('perf_row')::numeric/pg_total_relation_size('perf_columnar') AS compression_ratio;
 compression_ratio
--------------------
 5.3958044063457513
(1 row)
The overall compression ratio of a columnar table compared to the same data stored in heap storage is 5.4X.
=> VACUUM VERBOSE perf_columnar;
INFO:  statistics for "perf_columnar":
storage id: 10000000000
total file size: 8761368576, total data size: 8734266196
compression rate: 5.01x
total row count: 75000000, stripe count: 500, average rows per stripe: 150000
chunk count: 60000, containing data for dropped columns: 0, zstd compressed: 60000
VACUUM VERBOSE reports a lower compression ratio because it only averages the compression ratio of the individual chunks and does not account for the metadata savings of the columnar format.
System characteristics:
Azure VM: Standard D2s v3 (2 vCPUs, 8 GiB memory)
Linux (Ubuntu 18.04)
Data drive: Standard HDD (512 GB, 500 IOPS max, 60 MB/s max)
PostgreSQL 13 (--with-llvm, --with-python)
shared_buffers = 128MB
max_parallel_workers_per_gather = 0
jit = on
Note
Because this was run on a system with enough physical memory to hold a large portion of the table, the benefits of column I/O would not be fully realized by the query runtime unless the data size was significantly increased.
Workload queries:

-- OFFSET 1000 so that no rows are returned, and we collect only timings
SELECT vendor_id, SUM(quantity) FROM perf_row GROUP BY vendor_id OFFSET 1000;
SELECT vendor_id, SUM(quantity) FROM perf_row GROUP BY vendor_id OFFSET 1000;
SELECT vendor_id, SUM(quantity) FROM perf_row GROUP BY vendor_id OFFSET 1000;
SELECT vendor_id, SUM(quantity) FROM perf_columnar GROUP BY vendor_id OFFSET 1000;
SELECT vendor_id, SUM(quantity) FROM perf_columnar GROUP BY vendor_id OFFSET 1000;
SELECT vendor_id, SUM(quantity) FROM perf_columnar GROUP BY vendor_id OFFSET 1000;
Timing (median of three runs):
row: 436s
columnar: 16s
speedup: 27X
F.11.8. Common recommendations
F.11.8.1. Choosing Stripe Size and Chunk Size
Determining the optimal stripe and chunk sizes for columnar tables in Tantor SE depends on various factors, such as the nature of the data, query patterns, and hardware characteristics. Here are some recommendations:
Stripe Size: The optimal stripe size may depend on the typical size of your queries. If queries usually return a large number of rows, a larger stripe size might be more efficient, as it reduces the number of input/output operations. On the other hand, if queries often return a small subset of data, a smaller stripe size might be preferable.
Chunk Size: This determines how much data will be compressed at a time. A smaller chunk size might lead to a higher compression ratio, but can increase the overhead of compression. A larger chunk size can reduce overhead, but potentially decrease the compression ratio.
Testing and Tuning: It is important to conduct testing with real data and queries to determine the optimal settings for your specific situation. You can start with the default recommended settings, and then experiment with different stripe and chunk sizes to see how it affects query performance and compression ratios.
Hardware Properties: It's also worth considering the characteristics of your hardware, such as disk and CPU throughput, as this can influence which stripe and chunk sizes are most effective.
Ultimately, the optimal stripe and chunk sizes depend on the unique characteristics of your environment, data, and query patterns.
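Such testing can be sketched as follows, reusing my_columnar_table from the earlier examples (the copy-table name and the option values are illustrative assumptions, not recommendations):

```sql
-- Load the same data under a second stripe/chunk configuration
-- and compare the resulting on-disk sizes.
CREATE TABLE t_small_stripes (LIKE my_columnar_table) USING columnar;
ALTER TABLE t_small_stripes SET (
    columnar.stripe_row_limit = 10000,
    columnar.chunk_group_row_limit = 1000
);
INSERT INTO t_small_stripes SELECT * FROM my_columnar_table;
SELECT relname, pg_size_pretty(pg_total_relation_size(oid))
FROM pg_class
WHERE relname IN ('my_columnar_table', 't_small_stripes');
```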
F.11.8.2. Compression Algorithms
The choice of compression algorithm for columnar tables in Tantor SE depends on several factors, including the nature of the data, performance requirements, and hardware specifications. Here are some recommendations that may help you make a decision:

none: applies no compression to the data. It can be useful if your data is already compressed, or if you have very high performance requirements and sufficient disk space.
lz4: provides fast data compression and decompression. This can be useful if you have high performance requirements but still want to save some disk space.
zstd: offers a higher compression ratio compared to lz4, but requires more CPU time for compressing and decompressing data. This algorithm can be beneficial if you have limited disk space and are willing to spend a bit more CPU time on data compression.
It is important to note that choosing a compression algorithm is a trade-off between performance (compression and decompression speed) and disk space. Also, the efficiency of each compression algorithm can greatly depend on the nature of your data. Therefore, it is recommended to conduct some tests with your real data and queries to determine the most suitable compression algorithm for your situation.
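One low-risk way to compare algorithms on your own data is to change the compression type (which only affects newly inserted data) and inspect the result with VACUUM VERBOSE. A sketch, assuming my_columnar_table from the earlier examples:

```sql
-- Only stripes written after this point use zstd; existing data is untouched.
ALTER TABLE my_columnar_table SET (columnar.compression = 'zstd');
-- The INFO output includes a "compression rate:" line and a per-chunk
-- summary of which compression method was applied.
VACUUM VERBOSE my_columnar_table;
```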
F.11.9. Working with Time Series Data
F.11.9.1. Creating Tables
Time Series data is characterized by the sequential recording of values in increasing time order. For such data, orderliness is important.
Create the necessary extensions:
create extension pg_columnar;
create extension plpython3u;
Create a test table:
CREATE TABLE perf_columnar(
    id INT8,
    ts TIMESTAMPTZ,
    customer_id INT8,
    vendor_id INT8,
    name TEXT,
    description TEXT,
    value NUMERIC,
    quantity INT4
) USING columnar;

ALTER TABLE public.perf_columnar SET(
    columnar.compression = 'lz4',
    columnar.stripe_row_limit = 100000,
    columnar.chunk_group_row_limit = 10000
);
Create a function to generate random text:
CREATE OR REPLACE FUNCTION random_words(n INT4) RETURNS TEXT LANGUAGE plpython3u AS $$
import random
t = ''
words = ['zero','one','two','three','four','five','six','seven','eight','nine','ten']
for i in range(0, n):
    if (i != 0):
        t += ' '
    r = random.randint(0, len(words) - 1)
    t += words[r]
return t
$$;
Generate test data:
INSERT INTO perf_columnar
    SELECT
        g, -- id
        '2023-06-01'::timestamptz + ('1 minute'::interval * g), -- ts
        (random() * 1000000)::INT4, -- customer_id
        (random() * 100)::INT4, -- vendor_id
        random_words(7), -- name
        random_words(100), -- description
        (random() * 100000)::INT4/100.0, -- value
        (random() * 100)::INT4 -- quantity
    FROM generate_series(1,7500000) g;
As seen from the query above, we have inserted data sorted by ts.
Gather statistics using VACUUM ANALYZE:

VACUUM ANALYZE perf_columnar;
Create a copy of the table with different parameters:
CREATE TABLE perf_columnar2(LIKE perf_columnar) USING COLUMNAR;

ALTER TABLE public.perf_columnar2 SET(
    columnar.compression = 'zstd',
    columnar.stripe_row_limit = 10000,
    columnar.chunk_group_row_limit = 1000
);

INSERT INTO perf_columnar2 SELECT * FROM perf_columnar;
VACUUM ANALYZE perf_columnar2;
Check the size of the resulting tables:
test_db=# \dt+
                                          List of relations
 Schema |      Name      | Type  |  Owner   | Persistence | Access method |  Size   | Description
--------+----------------+-------+----------+-------------+---------------+---------+-------------
 public | perf_columnar  | table | postgres | permanent   | columnar      | 1886 MB |
 public | perf_columnar2 | table | postgres | permanent   | columnar      | 850 MB  |
(2 rows)
As can be seen, the table perf_columnar is more than twice as large as perf_columnar2.
Check the options of the tables:
test_db=# select * from columnar.options;
    relation    | chunk_group_row_limit | stripe_row_limit | compression | compression_level
----------------+-----------------------+------------------+-------------+-------------------
 perf_columnar  |                 10000 |           100000 | lz4         |                 3
 perf_columnar2 |                  1000 |            10000 | zstd        |                 3
(2 rows)
F.11.9.2. Comparing Query Performance
Execute a typical query that reads a set of records from the perf_columnar table for a specific time interval:
explain (analyze, verbose, buffers) select ts from perf_columnar
    where ts < '2023-06-01 10:00:00'::timestamp with time zone
      and ts > '2023-06-01 10:00:05'::timestamp with time zone;
                                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Custom Scan (ColumnarScan) on public.perf_columnar  (cost=0.00..401.12 rows=4 width=8) (actual time=18.838..49.001 rows=4 loops=1)
   Output: ts
   Filter: ((perf_columnar.ts < '2023-06-01 10:00:00+03'::timestamp with time zone) AND (perf_columnar.ts > '2023-06-01 10:00:05+03'::timestamp with time zone))
   Rows Removed by Filter: 9996
   Columnar Projected Columns: ts
   Columnar Chunk Group Filters: ((ts < '2023-06-01 10:00:00+03'::timestamp with time zone) AND (ts > '2023-06-01 10:00:05+03'::timestamp with time zone))
   Columnar Chunk Groups Removed by Filter: 749   <-----------
   Buffers: shared hit=3833 read=264              <-----------
 Query Identifier: 1607994334608619710
 Planning:
   Buffers: shared hit=52 read=7
 Planning Time: 12.789 ms
 Execution Time: 49.188 ms
(13 rows)
Execute the same typical query against the perf_columnar2 table:
explain (analyze, verbose, buffers) select ts from perf_columnar2
    where ts < '2020-01-01 10:00:00'::timestamp with time zone
      and ts > '2020-01-01 10:00:05'::timestamp with time zone;
                                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Custom Scan (ColumnarScan) on public.perf_columnar2  (cost=0.00..17.95 rows=52 width=8) (actual time=5.175..58.532 rows=49 loops=1)
   Output: ts
   Filter: ((perf_columnar2.ts < '2020-01-01 10:00:00+03'::timestamp with time zone) AND (perf_columnar2.ts > '2020-01-01 10:00:05+03'::timestamp with time zone))
   Rows Removed by Filter: 951
   Columnar Projected Columns: ts
   Columnar Chunk Group Filters: ((ts < '2020-01-01 10:00:00+03'::timestamp with time zone) AND (ts > '2020-01-01 10:00:05+03'::timestamp with time zone))
   Columnar Chunk Groups Removed by Filter: 7499   <-----------
   Buffers: shared hit=40824 read=1                <-----------
 Query Identifier: -801549076851693482
 Planning:
   Buffers: shared hit=155
 Planning Time: 1.086 ms
 Execution Time: 58.717 ms
(13 rows)
Note Columnar Chunk Groups Removed by Filter: 749 in the first case and 7499 in the second. However, the number of buffers read in the second query is much higher.
Terminology for understanding the behavior of these queries:

Stripe: newly inserted rows are grouped into stripes (100,000 rows per stripe for perf_columnar here; 150,000 by default). A stripe is the unit in which data is written to a columnar table.
Chunk Group: stripes are divided into chunk groups of 10,000 rows (by default).
Chunk: each chunk group consists of one chunk per column. The chunk is the unit of compression, and min/max values are tracked for each chunk, enabling chunk group filtering (Chunk Group Filtering).
Chunk Group Filtering: when a query's WHERE clause cannot match any tuples in a chunk, and this can be determined from the chunk's min/max values, chunk group filtering skips the whole chunk group without unpacking it.
As can be seen, 749 and 7499 chunk groups were filtered in the first and second cases respectively, meaning that 7,490,000 and 7,499,000 rows were skipped without the need to extract or unpack the data. Only one chunk group (10,000 and 1,000 rows, respectively) needed to be extracted and unpacked, so each query took only milliseconds.

But as seen from the buffer counts in the query plans, the first query touched about 32 MB, while the second touched roughly ten times more, about 319 MB.
F.11.9.3. Using Indexes
Columnar supports btree and hash indexes (and the constraints that require them), but does not support index types such as gist, gin, spgist, and brin.
Create indexes on the tables perf_columnar and perf_columnar2:

create index test_h on perf_columnar (ts);
create index test on perf_columnar2 (ts);
Check the size of the indexes:
test_db=# \di+
                                           List of relations
 Schema |  Name  | Type  |  Owner   |     Table      | Persistence | Access method |  Size  | Description
--------+--------+-------+----------+----------------+-------------+---------------+--------+-------------
 public | test   | index | postgres | perf_columnar2 | permanent   | btree         | 161 MB |
 public | test_h | index | postgres | perf_columnar  | permanent   | btree         | 161 MB |
(2 rows)
The sizes of the indexes are the same.
Disable the forced use of Custom Scan
:
SET columnar.enable_custom_scan TO false;
Re-execute the typical query against the perf_columnar table:
explain (analyze, verbose, buffers) select ts from perf_columnar
    where ts < '2023-06-01 10:00:00'::timestamp with time zone
      and ts > '2023-06-01 10:00:05'::timestamp with time zone;
                                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_h on public.perf_columnar  (cost=0.43..3217.48 rows=4 width=8) (actual time=402.144..402.204 rows=4 loops=1)
   Output: ts
   Index Cond: ((perf_columnar.ts < '2023-06-01 10:00:00+03'::timestamp with time zone) AND (perf_columnar.ts > '2023-06-01 10:00:05+03'::timestamp with time zone))
   Buffers: shared hit=181 read=3232   <-----------
 Query Identifier: 1607994334608619710
 Planning:
   Buffers: shared hit=20 read=5
 Planning Time: 16.278 ms
 Execution Time: 402.386 ms
(9 rows)
Nearly the same number of buffers were scanned as with the Custom Scan.
Re-execute the typical query against the perf_columnar2 table:
explain (analyze, verbose, buffers) select ts from perf_columnar2
    where ts < '2020-01-01 10:00:00'::timestamp with time zone
      and ts > '2020-01-01 10:00:05'::timestamp with time zone;
                                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using test on public.perf_columnar2  (cost=0.43..153.05 rows=52 width=8) (actual time=14.620..14.821 rows=49 loops=1)
   Output: ts
   Index Cond: ((perf_columnar2.ts < '2020-01-01 10:00:00+03'::timestamp with time zone) AND (perf_columnar2.ts > '2020-01-01 10:00:05+03'::timestamp with time zone))
   Buffers: shared hit=372 read=145   <-----------
 Query Identifier: -801549076851693482
 Planning:
   Buffers: shared hit=97
 Planning Time: 0.813 ms
 Execution Time: 14.978 ms
(13 rows)
For the second table, far fewer buffers were scanned, and the query is executed in the most optimal manner.
F.11.9.4. Conclusions
The following conclusions can be drawn:

The most effective data compression method is zstd;
The stripe and chunk sizes affect the number of buffers scanned when Chunk Group Filtering is used;
When reading a small amount of data, using an index may prove more efficient;
To test this hypothesis, force the use of indexes with the command SET columnar.enable_custom_scan TO false;
Sequential recording of time-series data can significantly reduce the size of indexes and the volume of unpacked chunks (Chunk). Therefore, it is recommended to sort data before inserting it into the database, or to use clustering (CLUSTER).
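Sorting before insertion can be sketched as follows; the staging table name is made up for illustration:

```sql
-- Load unordered rows into a heap staging table first, then append them
-- to the columnar table in ts order, so that chunk min/max ranges stay
-- narrow and Chunk Group Filtering remains effective.
CREATE TABLE staging (LIKE perf_columnar);
-- ... bulk-load staging here ...
INSERT INTO perf_columnar SELECT * FROM staging ORDER BY ts;
TRUNCATE staging;
```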