LlamaIndex

LlamaIndex is a framework for building context-augmented generative AI applications with LLMs. It provides a wide range of functionality, including data connectors, index building, query engines, agents, workflows, and observability, making it easy to build powerful RAG applications.

Using LlamaIndex and Nile together

LlamaIndex can be used with Nile to build RAG (Retrieval Augmented Generation) architectures. You'll use LlamaIndex to simplify and orchestrate the different steps in your RAG workflows, and Nile to store and query data and embeddings.

In this example, we'll show how to chat with a sales transcript in just a few lines of code, using LlamaIndex's high-level interface and its integration with Nile and OpenAI.

We'll walk you through the setup steps and then explain the code line by line. The entire Python script is available here, as is an iPython/Jupyter notebook version.

Setting up Nile

Start by signing up for Nile. Once you've signed up for Nile, you'll be prompted to create your first database. Go ahead and do so. You'll be redirected to the "Query Editor" page of your new database. You can see the built-in tenants table on the left-hand side.

From there, click on "Home" (top icon on the left menu), click on "generate credentials", and copy the resulting connection string. You'll need it in a moment.

Setting up LlamaIndex

LlamaIndex is a Python library, so you'll need to set up a Python environment with the necessary dependencies. We recommend using venv to create a virtual environment. This step is optional, but it will help you manage your dependencies and avoid conflicts.

python3 -m venv llama-env
source llama-env/bin/activate

Once you've activated your virtual environment, you can install the necessary dependencies - LlamaIndex and the Nile Vector Store:

pip install llama-index llama-index-vector-stores-nile

Setting up the data

In this example, we'll chat with the sales transcripts of two different companies. Download the transcripts to the ./data directory:

mkdir -p data/
wget "https://raw.githubusercontent.com/niledatabase/niledatabase/main/examples/ai/sales_insight/data/transcripts/nexiv-solutions__0_transcript.txt" -O "data/nexiv-solutions__0_transcript.txt"
wget "https://raw.githubusercontent.com/niledatabase/niledatabase/main/examples/ai/sales_insight/data/transcripts/modamart__0_transcript.txt" -O "data/modamart__0_transcript.txt"

Setting up the OpenAI API key

This quickstart uses OpenAI's API to generate embeddings, so grab your OpenAI API key and set it as an environment variable:

export OPENAI_API_KEY="your-openai-api-key"

Quickstart

Open a file named nile_llamaindex_quickstart.py and start by importing the necessary dependencies (or follow along with the script mentioned above):

import logging

logging.basicConfig(level=logging.INFO)

from llama_index.core import SimpleDirectoryReader, StorageContext
from llama_index.core import VectorStoreIndex
from llama_index.core.vector_stores import (
    MetadataFilter,
    MetadataFilters,
    FilterOperator,
)
from llama_index.vector_stores.nile import NileVectorStore, IndexType

Setting up the NileVectorStore

Next, create a NileVectorStore instance, using the connection string you copied earlier as the service_url:

vector_store = NileVectorStore(
    service_url="postgresql://user:password@us-west-2.db.thenile.dev:5432/niledb",
    table_name="test_table",
    tenant_aware=True,
    num_dimensions=1536,  # OpenAI's default embedding model produces 1536-dimensional vectors
)

Note that in addition to the usual parameters like URL and dimensions, we also set tenant_aware=True. This is because we want to isolate the documents for each tenant in our vector store.

🔥 NileVectorStore supports both tenant-aware vector stores, which isolate the documents of each tenant, and regular stores, which are typically used for shared data that all tenants can access. Below, we'll demonstrate the tenant-aware vector store.
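
For comparison, a shared store is created the same way, just with tenant_aware=False. Here's a minimal sketch (the table name is only an example):

# A regular (shared) vector store: documents are visible to all tenants
shared_vector_store = NileVectorStore(
    service_url="postgresql://user:password@us-west-2.db.thenile.dev:5432/niledb",
    table_name="shared_table",  # example table name for shared data
    tenant_aware=False,
    num_dimensions=1536,
)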

Loading and transforming the data

With all this in place, we'll load the data for the sales transcripts. We'll use LlamaIndex's SimpleDirectoryReader to load the documents. Because we want to update the documents with the tenant metadata after loading, we'll use a separate reader for each tenant.

reader = SimpleDirectoryReader(input_files=["nexiv-solutions__0_transcript.txt"])
documents_nexiv = reader.load_data()

reader = SimpleDirectoryReader(input_files=["modamart__0_transcript.txt"])
documents_modamart = reader.load_data()

We are going to create two Nile tenants and add the tenant ID of each to its documents' metadata. We are also adding some additional metadata, like a custom document ID and a category. This metadata can be used for filtering documents during the retrieval process. Of course, in your own application, you could also load documents for existing tenants and add any metadata you find useful.

tenant_id_nexiv = str(vector_store.create_tenant("nexiv-solutions"))
tenant_id_modamart = str(vector_store.create_tenant("modamart"))

# Add the tenant id to the metadata
for i, doc in enumerate(documents_nexiv, start=1):
    doc.metadata["tenant_id"] = tenant_id_nexiv
    doc.metadata["category"] = "IT"  # We will use this to apply additional filters in a later example
    doc.id_ = f"nexiv_doc_id_{i}"  # Setting a custom ID is optional but can be useful

for i, doc in enumerate(documents_modamart, start=1):
    doc.metadata["tenant_id"] = tenant_id_modamart
    doc.metadata["category"] = "Retail"
    doc.id_ = f"modamart_doc_id_{i}"

We are loading all documents into the same VectorStoreIndex. Since we created a tenant-aware NileVectorStore when we set things up, Nile will use the tenant_id field in the metadata to correctly isolate them. Loading documents without a tenant_id into a tenant-aware store will raise a ValueError.

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents_nexiv + documents_modamart,
    storage_context=storage_context,
    show_progress=True,
)

Chatting with the documents

Now that we have our vector embeddings stored in Nile, we can build a query engine for each tenant and chat with the documents:

nexiv_query_engine = index.as_query_engine(
    similarity_top_k=3,
    vector_store_kwargs={
        "tenant_id": str(tenant_id_nexiv),
    },
)

print(nexiv_query_engine.query("What were the customer pain points?"))

modamart_query_engine = index.as_query_engine(
    similarity_top_k=3,
    vector_store_kwargs={
        "tenant_id": str(tenant_id_modamart),
    },
)

print(modamart_query_engine.query("What were the customer pain points?"))

And run the script:

python nile_llamaindex_quickstart.py

Nexiv is an IT company and Modamart is a retail company. You can see that the query engine for each tenant returns an answer relevant to the tenant.

That's it! You've now built a (small) RAG application with LlamaIndex and Nile.

The Python script and iPython/Jupyter Notebook include all the code for this quickstart, as well as a few additional examples, such as how to use metadata filters to further restrict the search results and how to delete documents from the vector store.
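
As a taste, here's a minimal sketch of a filtered query. It uses the MetadataFilter imports from the top of the script and the category metadata we attached to the Nexiv documents earlier; the specific filter and variable names are illustrative:

# Only retrieve documents whose category metadata is "IT"
filters = MetadataFilters(
    filters=[MetadataFilter(key="category", operator=FilterOperator.EQ, value="IT")]
)
nexiv_query_engine_filtered = index.as_query_engine(
    similarity_top_k=3,
    filters=filters,
    vector_store_kwargs={"tenant_id": str(tenant_id_nexiv)},
)
print(nexiv_query_engine_filtered.query("What were the customer pain points?"))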

Full application

Ready to build something amazing? Check out our TaskGenius example.

The README includes step-by-step instructions on how to run the application locally.

Let's go over a few of the code highlights:

Using LlamaIndex with FastAPI

The example is a full-stack application with a FastAPI back-end and a React front-end.

When we initialize FastAPI, we create an instance of our AIUtils class, which is responsible for interfacing with the Nile vector store and the LLM (Ollama, in this example).

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Initialize AIUtils - it is a singleton, so we are just saving on initialization time
    AIUtils()
    yield
    # Cleanup
    ai_utils = None

app = FastAPI(lifespan=lifespan)

Then, in the handler for the POST /api/todos endpoint, we use the AIUtils instance to generate a time estimate for the todo item, and to store the embedding in the Nile vector store.

@app.post("/api/todos")
async def create_todo(todo: Todo, request: Request, session = Depends(get_tenant_session)):
    ai_utils = AIUtils() # get an instance of AIUtils
    todo.tenant_id = get_tenant_id()
    todo.id = str(uuid4())
    estimate = ai_utils.ai_estimate(todo.title, todo.tenant_id)
    todo.estimate = estimate
    ai_utils.store_embedding(todo)
    session.add(todo)
    session.commit()
    return todo

Using LlamaIndex with Ollama

If you look at the AIUtils class, you'll see that it is very similar to the simple quickstart example earlier, except that we use Ollama instead of OpenAI.

We initialize the vector store and the index in the __init__ method:

# Initialize settings and vector store once
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")
Settings.llm = Ollama(model="llama3.2", request_timeout=360.0)

self.vector_store = NileVectorStore(
    service_url=os.getenv("DATABASE_URL"),
    table_name="todos_embedding",
    tenant_aware=True,
    num_dimensions=768  # nomic-embed-text produces 768-dimensional embeddings
)
self.index = VectorStoreIndex.from_vector_store(self.vector_store)

Then to store the embedding in the Nile vector store, we do exactly what we did in the quickstart example - enrich the todo item with the tenant ID and insert it into the index:

document = Document(
    text=f"{todo.title} is estimated to take {todo.estimate} to complete",
    id_=str(todo.id),
    metadata={"tenant_id": str(todo.tenant_id)}
)
self.index.insert(document)  # insert into the tenant-aware vector store

To get an estimate, we create a query engine for the tenant and use it to query the index, just like we did in the quickstart example:

query_engine = self.index.as_query_engine(vector_store_kwargs={
    "tenant_id": str(tenant_id),
})

response = query_engine.query(
    f'you are an amazing project manager. I need to {text}. How long do you think this will take? '
    f'respond with just the estimate, no yapping.'
)

Using FastAPI with Nile for tenant isolation

If you look at the GET /api/todos handler, you'll see that we get all the todos for a tenant without needing to do any filtering. This is because the get_tenant_session function returns a session scoped to the tenant's database:

@app.get("/api/todos")
async def get_todos(session = Depends(get_tenant_session)):
    results = session.exec(select(Todo.id, Todo.tenant_id, Todo.title, Todo.estimate, Todo.complete)).all()
    return results

get_tenant_session is implemented in db.py. It creates a session and sets the nile.tenant_id and nile.user_id context on it:

def get_tenant_session():
    session = Session(bind=engine)
    try:
        tenant_id = get_tenant_id()
        user_id = get_user_id()
        session.execute(text(f"SET LOCAL nile.tenant_id='{tenant_id}';"))
        # This will raise an error if user_id doesn't exist or doesn't have access to the tenant DB.
        session.execute(text(f"SET LOCAL nile.user_id='{user_id}';"))
        yield session
    finally:
        session.close()

The tenant_id and user_id are set in the context by our custom FastAPI middleware. It extracts the tenant ID from the request headers and user token from the cookies.

You can see it in tenant_middleware.py.

class TenantAwareMiddleware:
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        # Extract the tenant ID from the request headers and store it in the
        # tenant_id context variable (defined elsewhere in tenant_middleware.py)
        headers = Headers(scope=scope)
        maybe_tenant_id = headers.get("X-Tenant-Id")
        maybe_set_context(maybe_tenant_id, tenant_id)
        # Extract the user ID from the access token cookie and store it in the
        # user_id context variable
        request = Request(scope)
        token = request.cookies.get("access_token")
        maybe_user_id = get_user_id_from_valid_token(token)
        maybe_set_context(maybe_user_id, user_id)
        await self.app(scope, receive, send)

Summary

This example shows how to use LlamaIndex with Nile to build a RAG application. It demonstrates how to store and query documents in a tenant-aware vector store, and how to use metadata filters to further restrict the search results. It also shows how to use FastAPI with Nile to build a full-stack application with tenant isolation.