RDKit
RDKit extension for chemical structure handling in PostgreSQL
RDKit is a powerful open-source cheminformatics and machine learning toolkit that provides PostgreSQL with the ability to handle and analyze chemical structures within the database. Your Nile database arrives with the RDKit extension already enabled.
Overview
The RDKit PostgreSQL extension adds support for:
- Chemical structure storage and retrieval
- Substructure and similarity searching
- Chemical structure manipulation
- Molecular descriptor calculation
- Chemical reaction handling
RDKit provides several custom data types:
- mol: Represents a molecule (constructed from SMILES notation)
- qmol: Represents a molecule containing query information (constructed from SMARTS notation)
- sfp: Represents a sparse vector fingerprint
- bfp: Represents a bit vector fingerprint
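As a rough illustration of these four types, the following sketch builds one value of each from string literals. The example molecules and the Morgan radius of 2 are arbitrary choices made for this illustration, not values taken from the Nile documentation.

```sql
-- mol: parse a SMILES string (ethanol) into a molecule
SELECT 'CCO'::mol AS ethanol;

-- qmol: parse a SMARTS pattern (a hydroxyl group) into a query molecule
SELECT '[OX2H]'::qmol AS hydroxyl_query;

-- sfp: sparse (count-based) Morgan fingerprint, radius 2 (assumed radius)
SELECT morgan_fp('CCO'::mol, 2) AS sparse_fp;

-- bfp: bit vector Morgan fingerprint, radius 2 (assumed radius)
SELECT morganbv_fp('CCO'::mol, 2) AS bit_fp;
```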
Basic Operations
- Creating Molecules
- Molecular Properties
- Substructure Searching
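The three operations listed above might look like the following minimal sketch. The `compounds` table, its column names, and the sample molecules are hypothetical and exist only for this example; the functions and the `@>` operator come from the RDKit PostgreSQL cartridge.

```sql
-- Creating molecules: a hypothetical table with a mol column
CREATE TABLE IF NOT EXISTS compounds (
    id        integer PRIMARY KEY,
    name      text NOT NULL,
    structure mol NOT NULL
);

-- SMILES literals are cast to mol on insert
INSERT INTO compounds (id, name, structure) VALUES
    (1, 'ethanol', 'CCO'::mol),
    (2, 'benzene', 'c1ccccc1'::mol),
    (3, 'phenol',  'Oc1ccccc1'::mol),
    (4, 'aspirin', 'CC(=O)Oc1ccccc1C(=O)O'::mol);

-- Molecular properties: a few descriptors computed per row
SELECT name,
       mol_amw(structure)           AS molecular_weight,
       mol_logp(structure)          AS logp,
       mol_tpsa(structure)          AS tpsa,
       mol_hbd(structure)           AS h_bond_donors,
       mol_hba(structure)           AS h_bond_acceptors,
       mol_numheavyatoms(structure) AS heavy_atoms
FROM compounds;

-- Substructure searching: rows whose structure contains a benzene ring
SELECT name
FROM compounds
WHERE structure @> 'c1ccccc1'::mol;

-- Substructure searching with a SMARTS query (qmol): rows with a hydroxyl group
SELECT name
FROM compounds
WHERE structure @> '[OX2H]'::qmol;
```

If your SMILES strings live in a `text` column rather than as literals, `mol_from_smiles(smiles::cstring)` can be used to convert them, and `is_valid_smiles` can filter out strings that fail to parse.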
Similarity Searching
RDKit supports several similarity metrics for comparing fingerprints, including Tanimoto and Dice similarity.
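A minimal sketch of computing these metrics follows, assuming Morgan bit vector fingerprints with radius 2 and a threshold of 0.5; both values, and the candidate molecules, are illustrative choices rather than recommendations.

```sql
-- Pairwise similarity between ethanol and propanol
SELECT tanimoto_sml(morganbv_fp('CCO'::mol, 2), morganbv_fp('CCCO'::mol, 2)) AS tanimoto,
       dice_sml(morganbv_fp('CCO'::mol, 2), morganbv_fp('CCCO'::mol, 2))     AS dice;

-- Threshold search: the % operator keeps pairs whose Tanimoto similarity
-- reaches rdkit.tanimoto_threshold (0.5 here is an assumed value)
SET rdkit.tanimoto_threshold = 0.5;

WITH candidates (name, fp) AS (
    VALUES ('propanol', morganbv_fp('CCCO'::mol, 2)),
           ('butanol',  morganbv_fp('CCCCO'::mol, 2)),
           ('benzene',  morganbv_fp('c1ccccc1'::mol, 2))
)
SELECT name,
       tanimoto_sml(fp, morganbv_fp('CCO'::mol, 2)) AS similarity
FROM candidates
WHERE fp % morganbv_fp('CCO'::mol, 2)
ORDER BY similarity DESC;
```

The `#` operator works the same way for Dice similarity, using `rdkit.dice_threshold`.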
Use Cases
RDKit is particularly useful for:
- Drug discovery and development
- Chemical database management
- Structure-activity relationship analysis
- Chemical similarity searching
- Reaction prediction and analysis
Performance Optimization
For better performance when working with large chemical databases:
- Create GiST indexes on chemical structure and fingerprint columns so that substructure and similarity searches can use them.
- Use appropriate fingerprint types for your specific use case:
  - Morgan fingerprints for general similarity searching
  - MACCS keys for substructure screening
  - Topological fingerprints for specific pattern matching
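The recommendations above could be applied roughly as follows. The `compound_index_demo` table, its columns, and the index names are hypothetical, and storing a precomputed Morgan fingerprint alongside the structure is one possible layout among several.

```sql
-- Hypothetical table holding a structure and a precomputed fingerprint
CREATE TABLE IF NOT EXISTS compound_index_demo (
    id         integer PRIMARY KEY,
    structure  mol NOT NULL,
    morgan_bfp bfp
);

-- GiST index on the mol column, used by substructure operators such as @>
CREATE INDEX IF NOT EXISTS compound_index_demo_structure_idx
    ON compound_index_demo USING gist (structure);

-- GiST index on the fingerprint column, used by similarity operators such as %
CREATE INDEX IF NOT EXISTS compound_index_demo_morgan_idx
    ON compound_index_demo USING gist (morgan_bfp);

-- Fill in the fingerprint column (radius 2 is an assumed choice)
UPDATE compound_index_demo
SET morgan_bfp = morganbv_fp(structure, 2)
WHERE morgan_bfp IS NULL;

-- The fingerprint families named above, computed for a single molecule
SELECT morganbv_fp('Oc1ccccc1'::mol, 2) AS morgan,      -- Morgan (circular)
       maccs_fp('Oc1ccccc1'::mol)       AS maccs,       -- MACCS keys
       rdkit_fp('Oc1ccccc1'::mol)       AS topological; -- RDKit topological
```

Precomputing fingerprints trades extra storage for faster similarity queries, since the fingerprint does not need to be regenerated for every row at query time.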