12.8. Testing and Debugging Text Search

12.8.1. Configuration Testing
12.8.2. Parser Testing
12.8.3. Dictionary Testing

The behavior of a custom text search configuration can easily become confusing. The functions described in this section are useful for testing text search objects. You can test a complete configuration, or test parsers and dictionaries separately.

12.8.1. Configuration Testing

The function ts_debug allows easy testing of a text search configuration.

ts_debug([ config regconfig, ] document text,
         OUT alias text,
         OUT description text,
         OUT token text,
         OUT dictionaries regdictionary[],
         OUT dictionary regdictionary,
         OUT lexemes text[])
         returns setof record

ts_debug displays information about every token of document as produced by the parser and processed by the configured dictionaries. It uses the configuration specified by config, or default_text_search_config if that argument is omitted.

ts_debug returns one row for each token identified in the text by the parser. The columns returned are

  • alias text — short name of the token type

  • description text — description of the token type

  • token text — text of the token

  • dictionaries regdictionary[] — the dictionaries selected by the configuration for this token type

  • dictionary regdictionary — the dictionary that recognized the token, or NULL if none did

  • lexemes text[] — the lexeme(s) produced by the dictionary that recognized the token, or NULL if none did; an empty array ({}) means it was recognized as a stop word

Here is a simple example:

SELECT * FROM ts_debug('english', 'a fat  cat sat on a mat - it ate a fat rats');
   alias   |   description   | token |  dictionaries  |  dictionary  | lexemes
-----------+-----------------+-------+----------------+--------------+---------
 asciiword | Word, all ASCII | a     | {english_stem} | english_stem | {}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | fat   | {english_stem} | english_stem | {fat}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | cat   | {english_stem} | english_stem | {cat}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | sat   | {english_stem} | english_stem | {sat}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | on    | {english_stem} | english_stem | {}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | a     | {english_stem} | english_stem | {}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | mat   | {english_stem} | english_stem | {mat}
 blank     | Space symbols   |       | {}             |              |
 blank     | Space symbols   | -     | {}             |              |
 asciiword | Word, all ASCII | it    | {english_stem} | english_stem | {}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | ate   | {english_stem} | english_stem | {ate}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | a     | {english_stem} | english_stem | {}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | fat   | {english_stem} | english_stem | {fat}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | rats  | {english_stem} | english_stem | {rat}
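
As noted above, the config argument may be omitted, in which case ts_debug falls back to default_text_search_config. A minimal sketch (abbreviated output, assuming an english default):

SET default_text_search_config = 'english';

SELECT alias, token, lexemes FROM ts_debug('fat rats');
   alias   | token | lexemes
-----------+-------+---------
 asciiword | fat   | {fat}
 blank     |       |
 asciiword | rats  | {rat}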

For a more extensive demonstration, we first create a public.english configuration and Ispell dictionary for the English language:

CREATE TEXT SEARCH CONFIGURATION public.english ( COPY = pg_catalog.english );

CREATE TEXT SEARCH DICTIONARY english_ispell (
    TEMPLATE = ispell,
    DictFile = english,
    AffFile = english,
    StopWords = english
);

ALTER TEXT SEARCH CONFIGURATION public.english
   ALTER MAPPING FOR asciiword WITH english_ispell, english_stem;

SELECT * FROM ts_debug('public.english', 'The Brightest supernovaes');
   alias   |   description   |    token    |         dictionaries          |   dictionary   |   lexemes
-----------+-----------------+-------------+-------------------------------+----------------+-------------
 asciiword | Word, all ASCII | The         | {english_ispell,english_stem} | english_ispell | {}
 blank     | Space symbols   |             | {}                            |                |
 asciiword | Word, all ASCII | Brightest   | {english_ispell,english_stem} | english_ispell | {bright}
 blank     | Space symbols   |             | {}                            |                |
 asciiword | Word, all ASCII | supernovaes | {english_ispell,english_stem} | english_stem   | {supernova}

In this example, the word Brightest was recognized by the parser as an ASCII word (alias asciiword). For this token type the dictionary list is english_ispell and english_stem. The word was recognized by english_ispell, which reduced it to the noun bright. The word supernovaes is unknown to the english_ispell dictionary so it was passed to the next dictionary, and, fortunately, was recognized (in fact, english_stem is a Snowball dictionary which recognizes everything; that is why it was placed at the end of the dictionary list).

The word The was recognized by the english_ispell dictionary as a stop word (Section 12.6.1) and will not be indexed. The spaces are discarded too, since the configuration provides no dictionaries at all for them.
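
To confirm this directly, you can call ts_lexize (described in Section 12.8.3 below) against the english_ispell dictionary created above; an empty array marks a stop word:

SELECT ts_lexize('english_ispell', 'the');
 ts_lexize
-----------
 {}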

You can reduce the width of the output by explicitly specifying which columns you want to see:

SELECT alias, token, dictionary, lexemes
FROM ts_debug('public.english', 'The Brightest supernovaes');
   alias   |    token    |   dictionary   |   lexemes
-----------+-------------+----------------+-------------
 asciiword | The         | english_ispell | {}
 blank     |             |                |
 asciiword | Brightest   | english_ispell | {bright}
 blank     |             |                |
 asciiword | supernovaes | english_stem   | {supernova}

12.8.2. Parser Testing

The following functions allow direct testing of a text search parser.

ts_parse(parser_name text, document text,
         OUT tokid integer, OUT token text) returns setof record
ts_parse(parser_oid oid, document text,
         OUT tokid integer, OUT token text) returns setof record

ts_parse parses the given document and returns a series of records, one for each token produced by parsing. Each record includes a tokid showing the assigned token type and a token which is the text of the token. For example:

SELECT * FROM ts_parse('default', '123 - a number');
 tokid | token
-------+--------
    22 | 123
    12 |
    12 | -
     1 | a
    12 |
     1 | number
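
The second form takes the parser's OID instead of its name. Rather than hard-coding the OID, you can look it up in the pg_ts_parser catalog; this sketch returns the same rows as above:

SELECT * FROM ts_parse(
    (SELECT oid FROM pg_ts_parser WHERE prsname = 'default'),
    '123 - a number');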

ts_token_type(parser_name text, OUT tokid integer,
              OUT alias text, OUT description text) returns setof record
ts_token_type(parser_oid oid, OUT tokid integer,
              OUT alias text, OUT description text) returns setof record

ts_token_type returns a table which describes each type of token the specified parser can recognize. For each token type, the table gives the integer tokid that the parser uses to label a token of that type, the alias that names the token type in configuration commands, and a short description. For example:

SELECT * FROM ts_token_type('default');
 tokid |      alias      |               description
-------+-----------------+------------------------------------------
     1 | asciiword       | Word, all ASCII
     2 | word            | Word, all letters
     3 | numword         | Word, letters and digits
     4 | email           | Email address
     5 | url             | URL
     6 | host            | Host
     7 | sfloat          | Scientific notation
     8 | version         | Version number
     9 | hword_numpart   | Hyphenated word part, letters and digits
    10 | hword_part      | Hyphenated word part, all letters
    11 | hword_asciipart | Hyphenated word part, all ASCII
    12 | blank           | Space symbols
    13 | tag             | XML tag
    14 | protocol        | Protocol head
    15 | numhword        | Hyphenated word, letters and digits
    16 | asciihword      | Hyphenated word, all ASCII
    17 | hword           | Hyphenated word, all letters
    18 | url_path        | URL path
    19 | file            | File or path name
    20 | float           | Decimal notation
    21 | int             | Signed integer
    22 | uint            | Unsigned integer
    23 | entity          | XML entity
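
ts_token_type pairs naturally with ts_parse: joining the two functions on tokid translates the numeric token types into readable aliases. A small sketch (in practice you may want an ORDER BY to fix the row order):

SELECT t.alias, p.token
FROM ts_parse('default', '123 - a number') AS p
JOIN ts_token_type('default') AS t ON t.tokid = p.tokid;
   alias   | token
-----------+--------
 uint      | 123
 blank     |
 blank     | -
 asciiword | a
 blank     |
 asciiword | number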

12.8.3. Dictionary Testing

The ts_lexize function facilitates dictionary testing.

ts_lexize(dict regdictionary, token text) returns text[]

ts_lexize returns an array of lexemes if the input token is known to the dictionary, or an empty array if the token is known to the dictionary but it is a stop word, or NULL if it is an unknown word.

Examples:

SELECT ts_lexize('english_stem', 'stars');
 ts_lexize
-----------
 {star}

SELECT ts_lexize('english_stem', 'a');
 ts_lexize
-----------
 {}
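
The NULL case needs a dictionary that does not recognize every input. For instance, with the english_ispell dictionary created in Section 12.8.1 (psql displays NULL as an empty field, so an IS NULL test is clearer):

SELECT ts_lexize('english_ispell', 'supernovaes') IS NULL;
 ?column?
----------
 t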

Note

The ts_lexize function expects a single token, not text. Here is a case where this can be confusing:

SELECT ts_lexize('thesaurus_astro', 'supernovae stars') is null;
 ?column?
----------
 t

The thesaurus dictionary thesaurus_astro does know the phrase supernovae stars, but ts_lexize fails since it does not parse the input text but treats it as a single token. Use plainto_tsquery or to_tsvector to test thesaurus dictionaries, for example:

SELECT plainto_tsquery('supernovae stars');
 plainto_tsquery
-----------------
 'sn'
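
to_tsvector behaves similarly, parsing the document first so that the thesaurus sees the whole phrase; assuming the same thesaurus_astro setup, the phrase again collapses to its substitute lexeme:

SELECT to_tsvector('supernovae stars');
 to_tsvector
-------------
 'sn':1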

