2024-10-15
Gwen Shapira

LlamaIndex + Nile: Build Multi-Tenant RAG Applications with Ease

We are excited to announce that LlamaIndex now integrates with Nile - making it easy to build multi-tenant RAG applications, by using Nile to isolate documents and embeddings for each tenant.

Multi-tenant RAG applications are increasingly popular, since they provide security and privacy while using large language models.

However, managing the underlying Postgres database is not straightforward. DB-per-tenant is expensive and complex to manage, while shared-DB has security and privacy concerns, and also limits the scalability and performance of the RAG application. Nile re-engineered Postgres to deliver the best of all worlds - the isolation of DB-per-tenant, at the cost, efficiency and developer experience of a shared-DB.

Storing millions of vectors in a shared-DB can be slow and can require significant resources to index and query. But if you store 1000 tenants in Nile's virtual tenant databases, each with 1000 vectors, this becomes quite manageable, especially since you can place larger tenants on their own compute, while smaller tenants efficiently share compute resources and auto-scale as needed.
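To make that arithmetic concrete, here is a rough Python sketch that counts how many vectors a brute-force similarity scan would touch in each layout. The tenant and vector counts come from the example above; the function names are purely illustrative and are not part of Nile or LlamaIndex, and a real deployment would typically use a vector index rather than a full scan.

```python
TENANTS = 1000
VECTORS_PER_TENANT = 1000


def vectors_scanned_shared(tenants: int, per_tenant: int) -> int:
    """A full scan of one shared table touches every tenant's vectors."""
    return tenants * per_tenant


def vectors_scanned_single_tenant(per_tenant: int) -> int:
    """A scan confined to one tenant touches only that tenant's vectors."""
    return per_tenant


shared = vectors_scanned_shared(TENANTS, VECTORS_PER_TENANT)
isolated = vectors_scanned_single_tenant(VECTORS_PER_TENANT)

print(f"shared table:  {shared:,} vectors per query")
print(f"single tenant: {isolated:,} vectors per query")
print(f"reduction:     {shared // isolated}x")
```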

LlamaIndex is a powerful tool for building RAG applications, and its integration with Nile makes it easy to build multi-tenant RAG applications with just a few lines of code.

To start using Nile with LlamaIndex, you just need to install LlamaIndex's integration package for the Nile vector store:

pip install llama-index llama-index-vector-stores-nile

Initialize the Nile vector store:

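from llama_index.vector_stores.nile import NileVectorStore

# Replace the connection string with the one for your own Nile database
vector_store = NileVectorStore(
    service_url="postgresql://user:password@us-west-2.db.thenile.dev:5432/niledb",
    table_name="test_table",
    tenant_aware=True,
    num_dimensions=1536,
)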

And you are ready to build your multi-tenant RAG application.

It takes just a few steps to load your data and start chatting with it:

  1. Load documents using SimpleDirectoryReader, or one of the many other data connectors.
  2. Enrich the documents and add tenant_id to the metadata.
  3. Create a NileVectorStore instance and use it to create an index.
  4. Create a QueryEngine for each tenant and use it to chat with the documents for that tenant alone.

How it works

When you use NileVectorStore as the vector store for LlamaIndex and configure it with tenant_aware=True, it will automatically extract the tenant_id from the metadata and store the documents and embeddings in a virtual tenant database for that tenant, isolated from other tenants. Documents without a tenant_id will be rejected with an exception.
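As a mental model of that routing, here is a small in-memory sketch. It is not how NileVectorStore is implemented (Nile does this inside Postgres); ToyTenantStore and its methods are hypothetical names used only to illustrate how a tenant_id in the metadata decides where a document lands and what a tenant-scoped lookup can see.

```python
from collections import defaultdict


class ToyTenantStore:
    """In-memory stand-in for a tenant-aware store (illustration only)."""

    def __init__(self):
        # tenant_id -> list of document texts for that tenant
        self._partitions = defaultdict(list)

    def add(self, text: str, metadata: dict) -> None:
        # Route the document to its tenant's partition using the metadata.
        self._partitions[metadata["tenant_id"]].append(text)

    def query(self, tenant_id: str) -> list:
        # Only the requested tenant's partition is ever read.
        return list(self._partitions.get(tenant_id, []))


store = ToyTenantStore()
store.add("Nexiv call transcript", {"tenant_id": "tenant-a"})
store.add("ModaMart call transcript", {"tenant_id": "tenant-b"})

print(store.query("tenant-a"))
print(store.query("tenant-b"))
```

Each lookup returns only the documents that were stored under the same tenant_id, which is the property the real integration provides at the database level.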

In this snippet, we load and store documents from two different tenants:

from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex

reader = SimpleDirectoryReader(input_files=["nexiv-solutions__0_transcript.txt"])
documents_nexiv = reader.load_data()

reader = SimpleDirectoryReader(input_files=["modamart__0_transcript.txt"])
documents_modamart = reader.load_data()

tenant_id_nexiv = str(vector_store.create_tenant("nexiv-solutions"))
tenant_id_modamart = str(vector_store.create_tenant("modamart"))

# Add the tenant id to the metadata of every document
for doc in documents_nexiv:
    doc.metadata["tenant_id"] = tenant_id_nexiv

for doc in documents_modamart:
    doc.metadata["tenant_id"] = tenant_id_modamart

# store data and embeddings
index = VectorStoreIndex.from_documents(
    documents_nexiv + documents_modamart,
    storage_context=StorageContext.from_defaults(vector_store=vector_store),
)
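With more than a couple of tenants, the tagging loops can be folded into one helper. The sketch below is an assumption about how you might structure this, not part of the integration: tag_documents is a hypothetical function, and SimpleNamespace objects stand in for LlamaIndex documents so the example runs without any files or a database. It relies only on each document exposing a metadata dict.

```python
from types import SimpleNamespace


def tag_documents(docs_by_tenant: dict) -> list:
    """Set metadata["tenant_id"] on every document and return one flat list."""
    tagged = []
    for tenant_id, docs in docs_by_tenant.items():
        for doc in docs:
            doc.metadata["tenant_id"] = str(tenant_id)
            tagged.append(doc)
    return tagged


# Stand-in documents; real code would pass the lists returned by load_data().
docs = tag_documents({
    "tenant-a": [SimpleNamespace(text="a1", metadata={})],
    "tenant-b": [
        SimpleNamespace(text="b1", metadata={}),
        SimpleNamespace(text="b2", metadata={}),
    ],
})

print([d.metadata["tenant_id"] for d in docs])
```

The flat list it returns can be passed to VectorStoreIndex.from_documents in place of the concatenated lists.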

When it's time to query the documents, you can use the index built on the NileVectorStore instance to create a QueryEngine for each tenant. The QueryEngine will use the tenant_id to query the virtual tenant database for that tenant and respond with an answer that is based only on that tenant's documents. The snippet below will respond with different customer pain points, depending on the tenant for whom we ask.

# Pass the ID of the tenant to query, e.g. tenant_id_nexiv or tenant_id_modamart
query_engine = index.as_query_engine(
    similarity_top_k=3,
    vector_store_kwargs={
        "tenant_id": tenant_id_nexiv,
    },
)

print(query_engine.query("What were the customer pain points?"))

This way you have full isolation and privacy for each tenant, while using the same vector store and index for all tenants.
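If your application serves many tenants, it can be handy to wrap the call above in a small factory. This is only a sketch: query_engine_for is a hypothetical helper, and RecordingIndex is a fake index that echoes back the arguments it receives so the example runs without a database. With a real index, the helper would return an actual query engine.

```python
def query_engine_for(index, tenant_id, top_k: int = 3):
    """Build a query engine scoped to one tenant, using the same arguments as above."""
    return index.as_query_engine(
        similarity_top_k=top_k,
        vector_store_kwargs={"tenant_id": str(tenant_id)},
    )


class RecordingIndex:
    """Fake index that returns the keyword arguments it was called with."""

    def as_query_engine(self, **kwargs):
        return kwargs


engine_args = query_engine_for(RecordingIndex(), "tenant-a")
print(engine_args)
```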

Other vector store functionality, such as filtering on additional metadata fields and deleting or updating documents, is also supported.

You can also use NileVectorStore with tenant_aware=False, in which case the stored documents will be shared and accessible across all tenants. This is useful if you have a shared dataset that you want to use for RAG applications.

Next steps

Nile's LlamaIndex integration documentation will walk you through a small quickstart example. Or if you prefer, you can get started with LlamaIndex's iPython/Jupyter Notebook.

Or you can jump right in to a full-stack application. Check out the TaskGenius example to see a full application that uses LlamaIndex and Nile with a local Llama model to build a multi-tenant RAG application.