# RDKit

> RDKit extension for chemical structure handling in PostgreSQL

RDKit is an open-source cheminformatics and machine learning toolkit. Its PostgreSQL extension
lets you store, search, and analyze chemical structures directly in the database.
Your Nile database arrives with the RDKit extension already enabled.

## Overview

The RDKit PostgreSQL extension adds support for:

* Chemical structure storage and retrieval
* Substructure and similarity searching
* Chemical structure manipulation
* Molecular descriptor calculation
* Chemical reaction handling

RDKit provides several custom data types:

* `mol` - Represents a molecule (constructed from SMILES notation)
* `qmol` - Represents a molecule containing query information (constructed from SMARTS notation)
* `sfp` - Represents a sparse vector fingerprint
* `bfp` - Represents a bit vector fingerprint
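
As a rough sketch of how these types can appear in a schema, the example below stores structures in a `mol` column and a precomputed Morgan fingerprint in a `bfp` column. The `compounds` table and its column names are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table: one row per compound, with the structure stored as a mol
-- and an optional precomputed bit vector fingerprint stored as a bfp.
CREATE TABLE IF NOT EXISTS compounds (
    id        bigint PRIMARY KEY,
    name      text NOT NULL,
    structure mol  NOT NULL,
    mfp2      bfp
);

-- Build mol values from SMILES strings at insert time.
INSERT INTO compounds (id, name, structure) VALUES
    (1, 'ethanol', mol_from_smiles('CCO')),
    (2, 'aspirin', mol_from_smiles('CC(=O)Oc1ccccc1C(=O)O'));

-- Fill the fingerprint column from the stored structures.
UPDATE compounds SET mfp2 = morganbv_fp(structure);
```

Precomputing the fingerprint trades some storage for not recalculating it on every similarity query.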

## Basic Operations

1. **Creating Molecules**

```sql
-- Create a molecule from SMILES notation
SELECT mol_from_smiles('CCO') AS ethanol;

-- Create a query molecule from a SMARTS pattern
SELECT qmol_from_smarts('[OH]') AS hydroxyl;
```
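
When SMILES strings come from user input or an external file, it can help to check them before building molecules. The sketch below uses the cartridge functions `is_valid_smiles` and `mol_to_smiles`; the example strings are arbitrary.

```sql
-- Check whether a SMILES string can be parsed into a valid molecule.
-- 'C1CC' has an unclosed ring, so it is expected to be rejected.
SELECT is_valid_smiles('CCO')  AS ethanol_is_valid,
       is_valid_smiles('C1CC') AS unclosed_ring_is_valid;

-- Convert a stored molecule back to a SMILES string.
SELECT mol_to_smiles(mol_from_smiles('OCC')) AS smiles_out;
```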

2. **Molecular Properties**

```sql
-- Calculate molecular weight
SELECT mol_amw(mol_from_smiles('CCO')) AS molecular_weight;

-- Count all atoms (use mol_numheavyatoms for heavy atoms only)
SELECT mol_numatoms(mol_from_smiles('CCO')) AS atom_count;
```
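
Several descriptors can be computed in a single query. The following sketch uses an inline list of two example molecules instead of a real table, so the `compounds` name and its columns are illustrative only.

```sql
-- Inline example data; replace with your own table in practice.
WITH compounds(name, structure) AS (
    VALUES
        ('ethanol', mol_from_smiles('CCO')),
        ('aspirin', mol_from_smiles('CC(=O)Oc1ccccc1C(=O)O'))
)
SELECT name,
       mol_amw(structure)           AS molecular_weight,
       mol_logp(structure)          AS logp,
       mol_tpsa(structure)          AS tpsa,
       mol_hba(structure)           AS h_bond_acceptors,
       mol_hbd(structure)           AS h_bond_donors,
       mol_numheavyatoms(structure) AS heavy_atom_count
FROM compounds;
```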

3. **Substructure Searching**

```sql
-- Check if a molecule contains a substructure
SELECT mol_from_smiles('CCO') @> qmol_from_smarts('[OH]') AS has_hydroxyl;
```
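
The same operator works as a filter over many rows. This is a minimal sketch with inline example data; the `compounds` name and columns are hypothetical.

```sql
-- Inline example data; replace with your own table in practice.
WITH compounds(name, structure) AS (
    VALUES
        ('ethanol', mol_from_smiles('CCO')),
        ('phenol',  mol_from_smiles('Oc1ccccc1')),
        ('benzene', mol_from_smiles('c1ccccc1'))
)
SELECT name,
       -- SMARTS query: does the molecule contain a hydroxyl group?
       structure @> qmol_from_smarts('[OH]')    AS has_hydroxyl,
       -- Molecule as query: does the molecule contain a benzene ring?
       structure @> mol_from_smiles('c1ccccc1') AS has_benzene_ring
FROM compounds;
```

In a `WHERE` clause the same expressions select only the matching rows, for example `WHERE structure @> qmol_from_smarts('[OH]')`.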

### Similarity Searching

RDKit supports several fingerprint similarity metrics, including Tanimoto (`tanimoto_sml`) and Dice (`dice_sml`):

```sql
-- Calculate Tanimoto similarity between molecules
SELECT tanimoto_sml(
    morganbv_fp(mol_from_smiles('CCO')),
    morganbv_fp(mol_from_smiles('CCN'))
) AS similarity;
```
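
To filter rows by similarity rather than compute a single score, the cartridge provides the `%` operator, which compares Tanimoto similarity against the `rdkit.tanimoto_threshold` setting. The sketch below assumes that setting can be changed for the current session and uses inline example data with hypothetical names.

```sql
-- Rows match the % operator when Tanimoto similarity meets this threshold.
SET rdkit.tanimoto_threshold = 0.5;

WITH compounds(name, structure) AS (
    VALUES
        ('ethanol',    mol_from_smiles('CCO')),
        ('ethylamine', mol_from_smiles('CCN')),
        ('benzene',    mol_from_smiles('c1ccccc1'))
)
SELECT name,
       tanimoto_sml(morganbv_fp(structure),
                    morganbv_fp(mol_from_smiles('CCO'))) AS tanimoto,
       dice_sml(morganbv_fp(structure),
                morganbv_fp(mol_from_smiles('CCO')))     AS dice
FROM compounds
WHERE morganbv_fp(structure) % morganbv_fp(mol_from_smiles('CCO'))
ORDER BY tanimoto DESC;
```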

## Use Cases

RDKit is particularly useful for:

* Drug discovery and development
* Chemical database management
* Structure-activity relationship analysis
* Chemical similarity searching
* Reaction prediction and analysis

## Performance Optimization

For better performance when working with large chemical databases:

1. Create GiST indexes on chemical structure columns to speed up substructure queries:

```sql
CREATE INDEX idx_molecule_substructure ON your_table USING gist(molecule);
```
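
Fingerprint columns can be indexed in the same way to speed up similarity searches. The sketch below assumes a hypothetical `compound_fingerprints` table with a precomputed `bfp` column named `mfp2`; adapt the names to your schema.

```sql
-- Hypothetical table holding one precomputed Morgan fingerprint per compound.
CREATE TABLE IF NOT EXISTS compound_fingerprints (
    compound_id bigint PRIMARY KEY,
    mfp2        bfp NOT NULL
);

-- GiST index on the fingerprint column.
CREATE INDEX IF NOT EXISTS idx_compound_fingerprints_mfp2
    ON compound_fingerprints USING gist(mfp2);

-- Return up to 10 compounds above the similarity threshold,
-- ordered by Tanimoto distance to the query molecule (closest first).
SET rdkit.tanimoto_threshold = 0.5;

SELECT compound_id,
       tanimoto_sml(mfp2, morganbv_fp(mol_from_smiles('CCO'))) AS similarity
FROM compound_fingerprints
WHERE mfp2 % morganbv_fp(mol_from_smiles('CCO'))
ORDER BY mfp2 <%> morganbv_fp(mol_from_smiles('CCO'))
LIMIT 10;
```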

2. Use appropriate fingerprint types for your specific use case:

* Morgan fingerprints for general similarity searching
* MACCS keys for substructure screening
* Topological fingerprints for specific pattern matching
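
To get a feel for how the fingerprint choice changes the result, the following sketch scores the same pair of molecules with several cartridge fingerprint functions. The pair (aspirin and salicylic acid) is an arbitrary example, and the column aliases are illustrative.

```sql
SELECT
    -- Morgan (circular) fingerprint with radius 2
    tanimoto_sml(morganbv_fp(a, 2), morganbv_fp(b, 2))  AS morgan_r2,
    -- MACCS structural keys
    tanimoto_sml(maccs_fp(a), maccs_fp(b))              AS maccs,
    -- RDKit topological (path-based) fingerprint
    tanimoto_sml(rdkit_fp(a), rdkit_fp(b))              AS rdkit_topological,
    -- Atom-pair fingerprint
    tanimoto_sml(atompairbv_fp(a), atompairbv_fp(b))    AS atom_pair
FROM (
    SELECT mol_from_smiles('CC(=O)Oc1ccccc1C(=O)O') AS a,  -- aspirin
           mol_from_smiles('O=C(O)c1ccccc1O')       AS b   -- salicylic acid
) AS pair;
```

Different fingerprints encode different structural features, so the scores are not directly comparable across columns; pick one type and threshold per workflow.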

## Additional Resources

* [RDKit Documentation](https://www.rdkit.org/docs/)
* [RDKit PostgreSQL Cartridge](https://www.rdkit.org/docs/Cartridge.html)
* [Chemistry Development Kit](https://cdk.github.io/)