Search and Indexing
pg_bigm
Full-text search using bigrams
The pg_bigm
extension provides fast full-text search functionality using 2-gram (bigram) matching in PostgreSQL databases. It’s particularly useful when you need to perform similarity searches or fuzzy string matching on large text data.
Your Nile database arrives with the pg_bigm
extension already enabled.
Understanding pg_bigm
A bigram is a pair of consecutive characters in a string. For example, the word “hello” contains the following bigrams: “he”, “el”, “ll”, “lo”. pg_bigm creates an index of these bigrams, enabling fast similarity searches and partial matching queries.
Key Features
- Fast Full-Text Search: Efficient searching using bigram matching
- Similarity Calculation: Built-in functions to measure string similarity
- Partial Matching: Find strings containing specific patterns
- Language Agnostic: Works well with any language, including non-Latin scripts
- GIN Index Support: Fast search performance using GIN indexes
Usage Examples
Creating a Table with Text Search
Creating a GIN Index
pg_bigm
supports full-text search indexes:
- gin must be used as an index method. GiST is not available for pg_bigm.
- gin_bigm_ops must be used as an operator class.
Inserting Sample Data
Search Examples
Simple partial matching:
Using similarity search:
Finding similar strings:
Functions and Operators
Main Functions
similarity(text, text)
: Returns similarity between two strings (0.0 to 1.0)show_bigm(text)
: Shows all bigrams in a stringbigm_similarity_threshold
: Sets threshold for%
operator (default: 0.3)
Operators
LIKE
: Standard pattern matching%
: Similarity search operator=~
: Regular expression match with bigram index support
Configuration Parameters
pg_bigm.similarity_threshold
: Default similarity threshold (0.0 to 1.0)pg_bigm.enable_recheck
: Whether to recheck similarity in search resultspg_bigm.gin_key_limit
: Maximum number of bigrams for GIN index
Use Cases
- Fuzzy text search
- Spell-check functionality
- Similar content matching
- Auto-complete suggestions
- Typo-tolerant search
- Multi-language text search
Performance Considerations
- GIN indexes can be large, plan storage accordingly
- Index creation might be slow for large tables
- Index only necessary columns
- Monitor index size and search performance
- Adjust similarity threshold to balance precision and recall
- Consider using
pg_bigm.enable_recheck
for better accuracy
Limitations
- Indexes can be larger compared to traditional B-tree indexes
- Not suitable for exact matching (use standard indexes instead)
- May require more memory during search operations
- Performance depends on similarity threshold and data size
Additional Resources
Was this page helpful?