The fuzzystrmatch extension in PostgreSQL provides functions for fuzzy string matching. It is useful for approximate string comparisons, spell-checking, and searching similar words in a database. Your Nile database arrives with fuzzystrmatch extension already enabled.

Available Functions

The fuzzystrmatch extension provides several functions for different types of string matching algorithms:

Soundex

The soundex() function returns a four-character Soundex code based on how a word sounds:

SELECT soundex('example');

Difference

The difference() function compares two Soundex codes and returns a similarity score from 0 to 4 (higher means more similar):

SELECT difference('example', 'exampel');

Levenshtein Distance

The levenshtein() function computes the edit distance (number of single-character edits required to transform one string into another):

SELECT levenshtein('example', 'exampel');

Levenshtein Distance with Custom Costs

You can specify different costs for insertions, deletions, and substitutions:

SELECT levenshtein('example', 'exampel', 1, 2, 1);

Metaphone

The metaphone() function returns a phonetic representation of a word, useful for English-language fuzzy searches:

SELECT metaphone('example', 10);

Example: Finding Similar Names

If you have a table with names and want to find names similar to a given input:

CREATE TABLE contacts (
    tenant_id UUID,
    name TEXT,
    email EMAILADDR,
    PRIMARY KEY (tenant_id, email)
);

-- Create a tenant first
INSERT INTO tenants (id, name) VALUES 
    ('11111111-1111-1111-1111-111111111111', 'Example Corp');

INSERT INTO contacts (tenant_id, name, email) VALUES
    ('11111111-1111-1111-1111-111111111111', 'John', 'john@example.com'),
    ('11111111-1111-1111-1111-111111111111', 'Jon', 'jon@example.com'),
    ('11111111-1111-1111-1111-111111111111', 'Johnny', 'johnny@example.com'),
    ('11111111-1111-1111-1111-111111111111', 'Jonathan', 'jonathan@example.com');
    

SELECT name FROM contacts WHERE difference(name, 'Jon') > 2;

Use Cases

  • Finding similar names in a customer database.
  • Detecting typos in text input.
  • Enhancing search functionality with approximate string matching.

Limitations

  • Soundex is optimized for English and may not work well for other languages.
  • Levenshtein distance can be computationally expensive for large datasets.
  • Phonetic matching may not always align perfectly with intended pronunciations.

Conclusion

The fuzzystrmatch extension provides multiple methods for fuzzy string matching, making it a valuable tool for approximate searches and typo detection in PostgreSQL databases.

For more details, refer to the PostgreSQL documentation.

Was this page helpful?