
Tantor Special Edition 1C 15.12 documentation


12.8. Testing and Debugging Text Search

12.8.1. Configuration Testing
12.8.2. Parser Testing
12.8.3. Dictionary Testing

The behavior of a custom text search configuration can easily become confusing. The functions described in this section are useful for testing text search objects. You can test a complete configuration, or test parsers and dictionaries separately.

12.8.1. Configuration Testing

The function ts_debug allows easy testing of a text search configuration.

ts_debug([ config regconfig, ] document text,
         OUT alias text,
         OUT description text,
         OUT token text,
         OUT dictionaries regdictionary[],
         OUT dictionary regdictionary,
         OUT lexemes text[])
         returns setof record

ts_debug displays information about every token of document as produced by the parser and processed by the configured dictionaries. It uses the configuration specified by config, or default_text_search_config if that argument is omitted.

ts_debug returns one row for each token identified in the text by the parser. The columns returned are

  • alias text — short name of the token type

  • description text — description of the token type

  • token text — text of the token

  • dictionaries regdictionary[] — the dictionaries selected by the configuration for this token type

  • dictionary regdictionary — the dictionary that recognized the token, or NULL if none did

  • lexemes text[] — the lexeme(s) produced by the dictionary that recognized the token, or NULL if none did; an empty array ({}) means it was recognized as a stop word

Here is a simple example:

SELECT * FROM ts_debug('english', 'a fat  cat sat on a mat - it ate a fat rats');
   alias   |   description   | token |  dictionaries  |  dictionary  | lexemes
-----------+-----------------+-------+----------------+--------------+---------
 asciiword | Word, all ASCII | a     | {english_stem} | english_stem | {}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | fat   | {english_stem} | english_stem | {fat}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | cat   | {english_stem} | english_stem | {cat}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | sat   | {english_stem} | english_stem | {sat}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | on    | {english_stem} | english_stem | {}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | a     | {english_stem} | english_stem | {}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | mat   | {english_stem} | english_stem | {mat}
 blank     | Space symbols   |       | {}             |              |
 blank     | Space symbols   | -     | {}             |              |
 asciiword | Word, all ASCII | it    | {english_stem} | english_stem | {}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | ate   | {english_stem} | english_stem | {ate}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | a     | {english_stem} | english_stem | {}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | fat   | {english_stem} | english_stem | {fat}
 blank     | Space symbols   |       | {}             |              |
 asciiword | Word, all ASCII | rats  | {english_stem} | english_stem | {rat}

For a more extensive demonstration, we first create a public.english configuration and Ispell dictionary for the English language:

CREATE TEXT SEARCH CONFIGURATION public.english ( COPY = pg_catalog.english );

CREATE TEXT SEARCH DICTIONARY english_ispell (
    TEMPLATE = ispell,
    DictFile = english,
    AffFile = english,
    StopWords = english
);

ALTER TEXT SEARCH CONFIGURATION public.english
   ALTER MAPPING FOR asciiword WITH english_ispell, english_stem;
SELECT * FROM ts_debug('public.english', 'The Brightest supernovaes');
   alias   |   description   |    token    |         dictionaries          |   dictionary   |   lexemes
-----------+-----------------+-------------+-------------------------------+----------------+-------------
 asciiword | Word, all ASCII | The         | {english_ispell,english_stem} | english_ispell | {}
 blank     | Space symbols   |             | {}                            |                |
 asciiword | Word, all ASCII | Brightest   | {english_ispell,english_stem} | english_ispell | {bright}
 blank     | Space symbols   |             | {}                            |                |
 asciiword | Word, all ASCII | supernovaes | {english_ispell,english_stem} | english_stem   | {supernova}

In this example, the word Brightest was recognized by the parser as an ASCII word (alias asciiword). For this token type the dictionary list is english_ispell and english_stem. The word was recognized by english_ispell, which reduced it to the noun bright. The word supernovaes is unknown to the english_ispell dictionary so it was passed to the next dictionary, and, fortunately, was recognized (in fact, english_stem is a Snowball dictionary which recognizes everything; that is why it was placed at the end of the dictionary list).

The word The was recognized by the english_ispell dictionary as a stop word (Section 12.6.1) and will not be indexed. The spaces are discarded too, since the configuration provides no dictionaries at all for them.
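The fall-through behavior can be checked directly with ts_lexize (described in Section 12.8.3 below). A sketch, assuming the english_ispell dictionary created above: an Ispell dictionary returns NULL for a word it does not know, which is exactly what sends supernovaes on to the next dictionary in the mapping list.

```sql
-- english_ispell does not know the word, so it returns NULL
-- and the token falls through to english_stem:
SELECT ts_lexize('english_ispell', 'supernovaes');  -- NULL
SELECT ts_lexize('english_stem', 'supernovaes');    -- {supernova}
```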

You can reduce the width of the output by explicitly specifying which columns you want to see:

SELECT alias, token, dictionary, lexemes
FROM ts_debug('public.english', 'The Brightest supernovaes');
   alias   |    token    |   dictionary   |   lexemes
-----------+-------------+----------------+-------------
 asciiword | The         | english_ispell | {}
 blank     |             |                |
 asciiword | Brightest   | english_ispell | {bright}
 blank     |             |                |
 asciiword | supernovaes | english_stem   | {supernova}

12.8.2. Parser Testing

The following functions allow direct testing of a text search parser.

ts_parse(parser_name text, document text,
         OUT tokid integer, OUT token text) returns setof record
ts_parse(parser_oid oid, document text,
         OUT tokid integer, OUT token text) returns setof record

ts_parse parses the given document and returns a series of records, one for each token produced by parsing. Each record includes a tokid showing the assigned token type and a token which is the text of the token. For example:

SELECT * FROM ts_parse('default', '123 - a number');
 tokid | token
-------+--------
    22 | 123
    12 |
    12 | -
     1 | a
    12 |
     1 | number

ts_token_type(parser_name text, OUT tokid integer,
              OUT alias text, OUT description text) returns setof record
ts_token_type(parser_oid oid, OUT tokid integer,
              OUT alias text, OUT description text) returns setof record

ts_token_type returns a table which describes each type of token the specified parser can recognize. For each token type, the table gives the integer tokid that the parser uses to label a token of that type, the alias that names the token type in configuration commands, and a short description. For example:

SELECT * FROM ts_token_type('default');
 tokid |      alias      |               description
-------+-----------------+------------------------------------------
     1 | asciiword       | Word, all ASCII
     2 | word            | Word, all letters
     3 | numword         | Word, letters and digits
     4 | email           | Email address
     5 | url             | URL
     6 | host            | Host
     7 | sfloat          | Scientific notation
     8 | version         | Version number
     9 | hword_numpart   | Hyphenated word part, letters and digits
    10 | hword_part      | Hyphenated word part, all letters
    11 | hword_asciipart | Hyphenated word part, all ASCII
    12 | blank           | Space symbols
    13 | tag             | XML tag
    14 | protocol        | Protocol head
    15 | numhword        | Hyphenated word, letters and digits
    16 | asciihword      | Hyphenated word, all ASCII
    17 | hword           | Hyphenated word, all letters
    18 | url_path        | URL path
    19 | file            | File or path name
    20 | float           | Decimal notation
    21 | int             | Signed integer
    22 | uint            | Unsigned integer
    23 | entity          | XML entity

12.8.3. Dictionary Testing

The ts_lexize function facilitates dictionary testing.

ts_lexize(dict regdictionary, token text) returns text[]

ts_lexize returns an array of lexemes if the input token is known to the dictionary, or an empty array if the token is known to the dictionary but it is a stop word, or NULL if it is an unknown word.

Examples:

SELECT ts_lexize('english_stem', 'stars');
 ts_lexize
-----------
 {star}

SELECT ts_lexize('english_stem', 'a');
 ts_lexize
-----------
 {}
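The third outcome, NULL for an unknown word, cannot be demonstrated with english_stem, because a Snowball dictionary accepts every word. An Ispell dictionary, such as the english_ispell dictionary created in Section 12.8.1, does return NULL for words outside its word list; the word below is only assumed, for illustration, to be absent from that list:

```sql
-- Assumes english_ispell from Section 12.8.1 exists and that
-- 'qwerty' is not in its word files (an assumption for illustration).
SELECT ts_lexize('english_ispell', 'qwerty') IS NULL;
```

This query should return t if the word is indeed unknown to the dictionary.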

Note

The ts_lexize function expects a single token, not text. Here is a case where this can be confusing:

SELECT ts_lexize('thesaurus_astro', 'supernovae stars') is null;
 ?column?
----------
 t

The thesaurus dictionary thesaurus_astro does know the phrase supernovae stars, but ts_lexize fails since it does not parse the input text but treats it as a single token. Use plainto_tsquery or to_tsvector to test thesaurus dictionaries, for example:

SELECT plainto_tsquery('supernovae stars');
 plainto_tsquery
-----------------
 'sn'

