pg_trgm

The pg_trgm extension provides trigram matching capabilities for fast text similarity search and fuzzy string matching in PostgreSQL. A trigram is a group of three consecutive characters taken from a string. This extension is particularly useful for implementing features like fuzzy search, spell checking, and finding similar strings. Your Nile database arrives with the pg_trgm extension already enabled.

Understanding Trigrams

A trigram is created by taking three consecutive characters from a text string. For example, the word “hello” contains these trigrams: ” h” (with two spaces), “he”, “hel”, “ell”, “llo”, “lo ” (with two spaces). The spaces at the beginning and end are important for matching word boundaries.

Similarity between strings is calculated by counting how many trigrams they share in common. For example, “hello” and “helo” share most of their trigrams (like ” h”, “hel”, “lo ”), resulting in a high similarity score. This makes trigrams excellent for fuzzy matching and finding similar strings even when they contain typos.

Usage Examples

Let’s explore how to use pg_trgm with a practical example using a products database.

Creating a Table with Text Search

CREATE TABLE products (
  tenant_id uuid,
  id integer,
  name text,
  description text,
  PRIMARY KEY (tenant_id, id)
);

-- Create a GiST index for faster similarity searches
CREATE INDEX trgm_idx_products_name ON products USING gist (name gist_trgm_ops);
-- Or create a GIN index (faster searches, slower updates, more space)
CREATE INDEX trgm_gin_idx_products_name ON products USING gin (name gin_trgm_ops);

Inserting Sample Data

-- Create a tenant first
INSERT INTO tenants (id, name) VALUES
  ('d1c06023-3421-4fbb-9dd1-c96e42d2fd02', 'Tenant 1');

INSERT INTO products (tenant_id, id, name, description) VALUES
  ('d1c06023-3421-4fbb-9dd1-c96e42d2fd02', 1, 'iPhone 13 Pro', 'Latest Apple smartphone'),
  ('d1c06023-3421-4fbb-9dd1-c96e42d2fd02', 2, 'iPhne 13', 'Budget Apple smartphone'),
  ('d1c06023-3421-4fbb-9dd1-c96e42d2fd02', 3, 'Samsung Galaxy S21', 'Android flagship phone');

Basic Similarity Queries

-- Find similarity between two strings (returns a number between 0 and 1)
SELECT similarity('iPhone', 'iPhne');

-- Find all products with names similar to 'iPhone' (overall similarity)
SELECT name, similarity(name, 'iPhone') AS sim
FROM products
WHERE name % 'iPhone'  -- Uses similarity threshold
ORDER BY sim DESC;

-- Find products that contain something similar to 'phone' (substring matching)
SELECT name, similarity(name, 'iPhone') AS sim
FROM products
WHERE name %> 'phone';  -- Better for substring/pattern matching

Show Trigrams

-- Show trigrams in a string
SELECT show_trgm('iPhone');

Setting Similarity Threshold

-- Set the similarity threshold (default is 0.3)
SET pg_trgm.similarity_threshold = 0.3;

-- Query using the new threshold
SELECT name
FROM products
WHERE name % 'iPhone';

Word Similarity

-- Find word similarity (better for matching whole words)
SELECT name, word_similarity('iPhone', name) as sim
FROM products
ORDER BY sim DESC;

-- Using strict word similarity
SELECT name, strict_word_similarity('iPhone', name) as sim
FROM products
ORDER BY sim DESC;

Functions and Operators

Main Functions

similarity(text, text): Returns similarity between two strings (0 to 1)
show_trgm(text): Shows trigrams in a string
word_similarity(text, text): Returns word-based similarity
strict_word_similarity(text, text): Returns strict word-based similarity
show_limit(): Shows current similarity threshold

Operators

%: Returns true if strings are similar (uses similarity threshold)
<%: Returns true if first string is less similar than second
%>: Returns true if first string is more similar than second
<->: Returns distance between strings (1 - similarity)
<<->: Returns word-based distance
<->>: Returns strict word-based distance

Index Types

pg_trgm supports two types of indexes:

GiST Index

CREATE INDEX trgm_gist_idx ON table_name USING gist (column_name gist_trgm_ops);

Balanced performance between search and update
Smaller index size
Good for dynamic data

GIN Index

CREATE INDEX trgm_gin_idx ON table_name USING gin (column_name gin_trgm_ops);

Faster searches
Slower updates
Larger index size
Better for static data

Best Practices

Index Selection:
- Use GIN for mostly-read data
- Use GiST for frequently updated data
- Consider creating indexes only on frequently searched columns
Threshold Tuning:
- Lower threshold (e.g., 0.2) for more matches
- Higher threshold (e.g., 0.5) for stricter matching
- Test with your data to find the optimal value
Performance Optimization:
- Use word_similarity() for whole word matching
- Create indexes on specific columns rather than all text columns
- Monitor index size and rebuild when necessary

Common Use Cases

Fuzzy search functionality
Spell-check suggestions
Auto-complete features
Finding similar product names
Matching addresses with typos
Search with tolerance for misspellings

Get Started

Tenant Virtualization

Postgres Concepts

Extensions

Understanding Trigrams

Usage Examples

Creating a Table with Text Search

Inserting Sample Data

Basic Similarity Queries

Show Trigrams

Setting Similarity Threshold

Word Similarity

Functions and Operators

Main Functions

Operators

Index Types

GiST Index

GIN Index

Best Practices

Common Use Cases

Performance Considerations

Limitations

Additional Resources

Get Started

Tenant Virtualization

Postgres Concepts

Extensions

​Understanding Trigrams

​Usage Examples

​Creating a Table with Text Search

​Inserting Sample Data

​Basic Similarity Queries

​Show Trigrams

​Setting Similarity Threshold

​Word Similarity

​Functions and Operators

​Main Functions

​Operators

​Index Types

​GiST Index

​GIN Index

​Best Practices

​Common Use Cases

​Performance Considerations

​Limitations

​Additional Resources

Understanding Trigrams

Usage Examples

Creating a Table with Text Search

Inserting Sample Data

Basic Similarity Queries

Show Trigrams

Setting Similarity Threshold

Word Similarity

Functions and Operators

Main Functions

Operators

Index Types

GiST Index

GIN Index

Best Practices

Common Use Cases

Performance Considerations

Limitations

Additional Resources