RDKit is an open-source cheminformatics and machine learning toolkit. Its PostgreSQL extension lets you store, search, and analyze chemical structures directly in the database. Your Nile database arrives with the RDKit extension already enabled.

Overview

The RDKit PostgreSQL extension adds support for:

  • Chemical structure storage and retrieval
  • Substructure and similarity searching
  • Chemical structure manipulation
  • Molecular descriptor calculation
  • Chemical reaction handling

RDKit provides several custom data types:

  • mol - Represents a molecule (constructed from SMILES notation)
  • qmol - Represents a molecule containing query information (constructed from SMARTS notation)
  • sfp - Represents a sparse vector fingerprint
  • bfp - Represents a bit vector fingerprint
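
As a minimal sketch of how these types can appear in a schema, the statements below define a table with a mol column and a bfp column and fill both from SMILES strings. The table name compounds and its column names are hypothetical and chosen only for illustration; mol_from_smiles and morganbv_fp are the same functions used elsewhere on this page.

```sql
-- Hypothetical table for illustration; names are assumptions, not part of the extension.
CREATE TABLE compounds (
    id       integer PRIMARY KEY,
    name     text NOT NULL,
    smiles   text NOT NULL,
    molecule mol,   -- parsed structure
    fp       bfp    -- bit vector fingerprint derived from the structure
);

INSERT INTO compounds (id, name, smiles) VALUES
    (1, 'ethanol',    'CCO'),
    (2, 'ethylamine', 'CCN'),
    (3, 'phenol',     'Oc1ccccc1');

-- mol_from_smiles takes a cstring, so a text column needs an explicit cast.
UPDATE compounds SET molecule = mol_from_smiles(smiles::cstring);

-- Compute Morgan fingerprints only for rows whose SMILES parsed successfully.
UPDATE compounds SET fp = morganbv_fp(molecule) WHERE molecule IS NOT NULL;
```

Keeping the original SMILES text next to the parsed mol value makes it easier to spot rows that failed to parse, since those rows end up with a NULL molecule.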

Basic Operations

  1. Creating Molecules
-- Create a molecule from SMILES notation
SELECT mol_from_smiles('CCO') AS ethanol;

-- Create a query molecule from a SMARTS pattern
SELECT qmol_from_smarts('[OH]') AS hydroxyl;
  2. Molecular Properties
-- Calculate molecular weight
SELECT mol_amw(mol_from_smiles('CCO')) AS molecular_weight;

-- Count atoms
SELECT mol_numatoms(mol_from_smiles('CCO')) AS atom_count;
  3. Substructure Searching
-- Check if a molecule contains a substructure
SELECT mol_from_smiles('CCO') @> qmol_from_smarts('[OH]') AS has_hydroxyl;
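
The operations above each work on a single literal molecule. The following sketch, which assumes nothing beyond the functions already shown plus mol_numheavyatoms, combines them over a small inline list of compounds: it parses each SMILES string, keeps the molecules that contain a hydroxyl group, and reports two properties for each match. The names candidates and parsed are illustrative only.

```sql
-- Inline sample data, so no table is required.
WITH candidates(name, smiles) AS (
    VALUES ('ethanol',    'CCO'),
           ('ethylamine', 'CCN'),
           ('phenol',     'Oc1ccccc1')
),
parsed AS (
    -- Cast text to cstring before handing it to mol_from_smiles.
    SELECT name, mol_from_smiles(smiles::cstring) AS molecule
    FROM candidates
)
SELECT name,
       mol_amw(molecule)           AS molecular_weight,
       mol_numheavyatoms(molecule) AS heavy_atom_count
FROM parsed
WHERE molecule IS NOT NULL                      -- skip SMILES that failed to parse
  AND molecule @> qmol_from_smarts('[OH]');     -- keep molecules with a hydroxyl group
```

The same WHERE clause works unchanged against a table column of type mol.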

Similarity Searching

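RDKit compares molecules by the similarity of their fingerprints. The extension provides the Tanimoto (tanimoto_sml) and Dice (dice_sml) coefficients, among others; the example below uses Tanimoto on Morgan fingerprints: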

-- Calculate Tanimoto similarity between molecules
SELECT tanimoto_sml(
    morganbv_fp(mol_from_smiles('CCO')),
    morganbv_fp(mol_from_smiles('CCN'))
) AS similarity;
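
For searching rather than comparing one pair, a common pattern is to filter with the % operator, which keeps fingerprint pairs whose Tanimoto similarity meets the rdkit.tanimoto_threshold setting. The sketch below assumes you are allowed to change that setting in your session; the threshold value of 0.3 and the names candidates, fps, and query are arbitrary choices for illustration.

```sql
-- Session-level cutoff used by the % operator (assumed to be user-settable).
SET rdkit.tanimoto_threshold = 0.3;

WITH candidates(name, smiles) AS (
    VALUES ('ethylamine', 'CCN'),
           ('propanol',   'CCCO'),
           ('benzene',    'c1ccccc1')
),
fps AS (
    SELECT name, morganbv_fp(mol_from_smiles(smiles::cstring)) AS fp
    FROM candidates
),
query AS (
    -- Fingerprint of the molecule we are searching with (ethanol).
    SELECT morganbv_fp(mol_from_smiles('CCO')) AS fp
)
SELECT fps.name,
       tanimoto_sml(fps.fp, query.fp) AS tanimoto,
       dice_sml(fps.fp, query.fp)     AS dice
FROM fps, query
WHERE fps.fp % query.fp            -- Tanimoto similarity at or above the threshold
ORDER BY tanimoto DESC;
```

Reporting Tanimoto and Dice side by side shows how the two coefficients rank the same candidates.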

Use Cases

RDKit is particularly useful for:

  • Drug discovery and development
  • Chemical database management
  • Structure-activity relationship analysis
  • Chemical similarity searching
  • Reaction prediction and analysis

Performance Optimization

For better performance when working with large chemical databases:

  1. Create indexes on chemical structure columns:
CREATE INDEX idx_molecule_substructure ON your_table USING gist(molecule);
  2. Use appropriate fingerprint types for your specific use case:
  • Morgan fingerprints for general similarity searching
  • MACCS keys for substructure screening
  • Topological fingerprints for specific pattern matching
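
The index in step 1 speeds up substructure queries on the mol column. For similarity searches, one possible approach is to store a precomputed fingerprint and index that column as well. The sketch below reuses the your_table and molecule placeholders from step 1; the column name morgan_fp and the index name are hypothetical.

```sql
-- Store the fingerprint once instead of recomputing it for every query.
ALTER TABLE your_table ADD COLUMN morgan_fp bfp;
UPDATE your_table SET morgan_fp = morganbv_fp(molecule);

-- GiST index on the fingerprint column to support the % similarity operator.
CREATE INDEX idx_molecule_similarity ON your_table USING gist(morgan_fp);

-- Similarity search against ethanol, filtered by rdkit.tanimoto_threshold.
SELECT molecule,
       tanimoto_sml(morgan_fp, morganbv_fp(mol_from_smiles('CCO'))) AS similarity
FROM your_table
WHERE morgan_fp % morganbv_fp(mol_from_smiles('CCO'))
ORDER BY similarity DESC;
```

To try a different fingerprint type, swap morganbv_fp for maccs_fp (MACCS keys) or rdkit_fp (RDKit topological fingerprint) in both the UPDATE and the query, so that stored and query fingerprints are of the same kind.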

Additional Resources
