LlamaIndex
LlamaIndex is a framework for building context-augmented generative AI applications with LLMs. It provides a wide range of functionality, including data connectors, index building, query engines, agents, workflows, and observability, making it easy to build powerful RAG applications.
Using LlamaIndex and Nile together
LlamaIndex can be used with Nile to build RAG (Retrieval Augmented Generation) architectures. You'll use LlamaIndex to simplify and orchestrate the different steps in your RAG workflows, and Nile to store and query data and embeddings.
In this example, we'll show how to chat with a sales transcript in just a few lines of code, using LlamaIndex's high-level interface and its integration with Nile and OpenAI.
We'll walk you through the setup steps and then explain the code line by line. The entire Python script is available here, or as an iPython/Jupyter notebook.
Setting Up Nile
Start by signing up for Nile. Once you've signed up, you'll be prompted to create your first database. Go ahead and do so. You'll be redirected to the "Query Editor" page of your new database, where you can see the built-in tenants table on the left-hand side.
From there, click on "Home" (the top icon on the left menu), click on "Generate credentials", and copy the resulting connection string. You will need it in a moment.
Setting Up LlamaIndex
LlamaIndex is a Python library, so you'll need to set up a Python environment with the necessary dependencies. We recommend using venv to create a virtual environment. This step is optional, but it will help you manage your dependencies and avoid conflicts.
python3 -m venv llama-env
source llama-env/bin/activate
Once you've activated your virtual environment, you can install the necessary dependencies - LlamaIndex and the Nile Vector Store:
pip install llama-index llama-index-vector-stores-nile
Setting up the data
In this example, we'll chat with the sales transcripts of two different companies. Download the transcripts to the ./data directory:
mkdir -p data/
wget "https://raw.githubusercontent.com/niledatabase/niledatabase/main/examples/ai/sales_insight/data/transcripts/nexiv-solutions__0_transcript.txt" -O "data/nexiv-solutions__0_transcript.txt"
wget "https://raw.githubusercontent.com/niledatabase/niledatabase/main/examples/ai/sales_insight/data/transcripts/modamart__0_transcript.txt" -O "data/modamart__0_transcript.txt"
Setting up the OpenAI API key
This quickstart uses OpenAI's API to generate embeddings and responses, so grab your OpenAI API key and set it as an environment variable:
export OPENAI_API_KEY="your-openai-api-key"
Quickstart
Open a file named nile_llamaindex_quickstart.py and start by importing the necessary dependencies (or follow along with the script mentioned above):
import logging
logging.basicConfig(level=logging.INFO)
from llama_index.core import SimpleDirectoryReader, StorageContext
from llama_index.core import VectorStoreIndex
from llama_index.core.vector_stores import (
    MetadataFilter,
    MetadataFilters,
    FilterOperator,
)
from llama_index.vector_stores.nile import NileVectorStore, IndexType
Setting up the NileVectorStore
Next, create a NileVectorStore instance, using the connection string you copied earlier as the service_url:
vector_store = NileVectorStore(
    service_url="postgresql://user:password@us-west-2.db.thenile.dev:5432/niledb",
    table_name="test_table",
    tenant_aware=True,
    num_dimensions=1536,
)
Note that in addition to the usual parameters like the URL and number of dimensions, we also set tenant_aware=True. This is because we want to isolate the documents for each tenant in our vector store.
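If you prefer not to hardcode credentials, you can read the connection string from an environment variable instead. A minimal sketch (the variable name NILEDB_SERVICE_URL is just an example; the full application later in this guide uses DATABASE_URL):
import os

# Same store as above, with the connection string taken from the environment
vector_store = NileVectorStore(
    service_url=os.environ["NILEDB_SERVICE_URL"],
    table_name="test_table",
    tenant_aware=True,
    num_dimensions=1536,
)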
🔥 NileVectorStore supports both tenant-aware vector stores, which isolate the documents of each tenant, and regular stores, which are typically used for shared data that all tenants can access. Below, we'll demonstrate the tenant-aware vector store.
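For completeness, here is a sketch of what a regular (shared) store would look like; we won't use it in this quickstart:
# A shared store: documents are not isolated per tenant
shared_store = NileVectorStore(
    service_url="postgresql://user:password@us-west-2.db.thenile.dev:5432/niledb",
    table_name="shared_table",
    tenant_aware=False,
    num_dimensions=1536,
)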
Loading and transforming the data
With all this in place, we'll load the data for the sales transcripts. We'll use LlamaIndex's SimpleDirectoryReader to load the documents. Because we want to update the documents with the tenant metadata after loading, we'll use a separate reader for each tenant.
reader = SimpleDirectoryReader(input_files=["nexiv-solutions__0_transcript.txt"])
documents_nexiv = reader.load_data()
reader = SimpleDirectoryReader(input_files=["modamart__0_transcript.txt"])
documents_modamart = reader.load_data()
We are going to create two Nile tenants and then add the tenant ID of each to the document metadata. We are also adding some additional metadata, like a custom document ID and a category. This metadata can be used for filtering documents during the retrieval process. Of course, in your own application, you could also load documents for existing tenants and add any metadata you find useful.
tenant_id_nexiv = str(vector_store.create_tenant("nexiv-solutions"))
tenant_id_modamart = str(vector_store.create_tenant("modamart"))
# Add the tenant id to the metadata
for i, doc in enumerate(documents_nexiv, start=1):
    doc.metadata["tenant_id"] = tenant_id_nexiv
    doc.metadata["category"] = "IT"  # We will use this to apply additional filters in a later example
    doc.id_ = f"nexiv_doc_id_{i}"  # Setting a custom ID is optional, but can be useful

for i, doc in enumerate(documents_modamart, start=1):
    doc.metadata["tenant_id"] = tenant_id_modamart
    doc.metadata["category"] = "Retail"
    doc.id_ = f"modamart_doc_id_{i}"
We are loading all documents into the same VectorStoreIndex. Since we created a tenant-aware NileVectorStore when we set things up, Nile will correctly use the tenant_id field in the metadata to isolate them. Loading documents without a tenant_id into a tenant-aware store will raise a ValueError.
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents_nexiv + documents_modamart,
    storage_context=storage_context,
    show_progress=True,
)
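If you try to insert a document without tenant_id metadata into this tenant-aware store, the insert is rejected. A minimal sketch of what that failure looks like (the exact error message may differ):
from llama_index.core import Document

doc_without_tenant = Document(text="This document has no tenant_id metadata")
try:
    index.insert(doc_without_tenant)
except ValueError as e:
    print(f"Insert was rejected, as expected: {e}")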
Chatting with the documents
Now that we have our vector embeddings stored in Nile, we can build a query engine for each tenant and chat with the documents:
nexiv_query_engine = index.as_query_engine(
    similarity_top_k=3,
    vector_store_kwargs={
        "tenant_id": str(tenant_id_nexiv),
    },
)
print(nexiv_query_engine.query("What were the customer pain points?"))

modamart_query_engine = index.as_query_engine(
    similarity_top_k=3,
    vector_store_kwargs={
        "tenant_id": str(tenant_id_modamart),
    },
)
print(modamart_query_engine.query("What were the customer pain points?"))
And run the script:
python nile_llamaindex_quickstart.py
Nexiv is an IT company and Modamart is a retail company. You can see that the query engine for each tenant returns an answer relevant to the tenant.
That's it! You've now built a (small) RAG application with LlamaIndex and Nile.
The Python script and iPython/Jupyter notebook include all the code for this quickstart, as well as a few additional examples, such as how to use metadata filters to further restrict the search results and how to delete documents from the vector store.
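For instance, here is a sketch of how the MetadataFilters we imported earlier could narrow the Nexiv results to documents tagged with the "IT" category we set during loading:
nexiv_query_engine_filtered = index.as_query_engine(
    similarity_top_k=3,
    vector_store_kwargs={"tenant_id": str(tenant_id_nexiv)},
    filters=MetadataFilters(
        filters=[MetadataFilter(key="category", operator=FilterOperator.EQ, value="IT")]
    ),
)
print(nexiv_query_engine_filtered.query("What were the customer pain points?"))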
Full application
Ready to build something amazing? Check out our TaskGenius example.
The README includes step-by-step instructions on how to run the application locally.
Let's go over a few of the code highlights:
Use of LlamaIndex with FastAPI
The example is a full-stack application with a FastAPI back-end and a React front-end.
When we initialize FastAPI, we create an instance of our AIUtils class, which is responsible for interfacing with the Nile vector store and the LLM (Ollama, in this example).
@asynccontextmanager
async def lifespan(app: FastAPI):
    # Initialize AIUtils - it is a singleton, so we are just saving on initialization time
    AIUtils()
    yield
    # Cleanup
    ai_utils = None
app = FastAPI(lifespan=lifespan)
Then, in the handler for the POST /api/todos endpoint, we use the AIUtils instance to generate a time estimate for the todo item and to store the embedding in the Nile vector store.
@app.post("/api/todos")
async def create_todo(todo: Todo, request: Request, session = Depends(get_tenant_session)):
ai_utils = AIUtils() # get an instance of AIUtils
todo.tenant_id = get_tenant_id();
todo.id = str(uuid4())
estimate = ai_utils.ai_estimate(todo.title, todo.tenant_id)
todo.estimate = estimate
ai_utils.store_embedding(todo)
session.add(todo)
session.commit()
return todo
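As a quick way to exercise this endpoint, you could use FastAPI's test client. This is just a sketch: it assumes the app above is importable, that the Todo model accepts a bare title, and that you supply a real tenant ID and a valid user token:
from fastapi.testclient import TestClient

client = TestClient(app)
client.cookies.set("access_token", "<a-valid-user-token>")  # used to resolve the user ID
response = client.post(
    "/api/todos",
    json={"title": "Prepare the quarterly sales report"},
    headers={"X-Tenant-Id": "<your-tenant-id>"},  # read by the tenant middleware
)
print(response.json())  # the stored todo, including the AI-generated estimate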
Using LlamaIndex with Ollama
If you look at the AIUtils class, you'll see that it is very similar to the simple quickstart example earlier, except that we use Ollama instead of OpenAI.
We initialize the vector store and the index in the __init__ method:
# Initialize settings and vector store once
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")
Settings.llm = Ollama(model="llama3.2", request_timeout=360.0)
self.vector_store = NileVectorStore(
    service_url=os.getenv("DATABASE_URL"),
    table_name="todos_embedding",
    tenant_aware=True,
    num_dimensions=768,
)
self.index = VectorStoreIndex.from_vector_store(self.vector_store)
Then to store the embedding in the Nile vector store, we do exactly what we did in the quickstart example - enrich the todo item with the tenant ID and insert it into the index:
document = Document(
    text=f"{todo.title} is estimated to take {todo.estimate} to complete",
    id_=str(todo.id),
    metadata={"tenant_id": str(todo.tenant_id)},
)
self.index.insert(document)  # insert into the tenant-aware index
To get an estimate, we create a query engine for the tenant and use it to query the index, just like we did in the quickstart example:
query_engine = self.index.as_query_engine(vector_store_kwargs={
    "tenant_id": str(tenant_id),
})
response = query_engine.query(
    f'you are an amazing project manager. I need to {text}. How long do you think this will take? '
    f'respond with just the estimate, no yapping.'
)
Using FastAPI with Nile for tenant isolation
If you look at the GET /api/todos handler, you'll see that we get all the todos for a tenant without needing to do any filtering. This is because the get_tenant_session function returns a session scoped to the tenant's database:
@app.get("/api/todos")
async def get_todos(session = Depends(get_tenant_session)):
results = session.exec(select(Todo.id, Todo.tenant_id,Todo.title, Todo.estimate, Todo.complete)).all()
return results
get_tenant_session is implemented in db.py, and it is a wrapper around the get_session function that sets the nile.tenant_id and nile.user_id context:
def get_tenant_session():
    session = Session(bind=engine)
    try:
        tenant_id = get_tenant_id()
        user_id = get_user_id()
        session.execute(text(f"SET LOCAL nile.tenant_id='{tenant_id}';"))
        # This will raise an error if user_id doesn't exist or doesn't have access to the tenant DB.
        session.execute(text(f"SET LOCAL nile.user_id='{user_id}';"))
        yield session
    finally:
        session.close()  # release the connection when the request is done
The tenant_id and user_id are set in the context by our custom FastAPI middleware, which extracts the tenant ID from the request headers and the user token from the cookies. You can see it in tenant_middleware.py.
class TenantAwareMiddleware:
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        headers = Headers(scope=scope)
        maybe_tenant_id = headers.get("X-Tenant-Id")
        maybe_set_context(maybe_tenant_id, tenant_id)
        request = Request(scope)
        token = request.cookies.get("access_token")
        maybe_user_id = get_user_id_from_valid_token(token)
        maybe_set_context(maybe_user_id, user_id)
        await self.app(scope, receive, send)
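The middleware references a few helpers defined elsewhere in the example (tenant_id, user_id, maybe_set_context). Here is a minimal sketch of how such per-request context could be implemented with contextvars; the names mirror the middleware, but this is an illustration, not the repo's exact code:
from contextvars import ContextVar

# Per-request context; ContextVar is task-local, so it is safe under asyncio
tenant_id: ContextVar[str | None] = ContextVar("tenant_id", default=None)
user_id: ContextVar[str | None] = ContextVar("user_id", default=None)

def maybe_set_context(value, context_var):
    # Only set the variable if the request actually carried a value
    if value is not None:
        context_var.set(value)

def get_tenant_id():
    return tenant_id.get()

def get_user_id():
    return user_id.get()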
Summary
This example shows how to use LlamaIndex with Nile to build a RAG application. It demonstrates how to store and query documents in a tenant-aware vector store, and how to use metadata filters to further restrict the search results. It also shows how to use FastAPI with Nile to build a full-stack application with tenant isolation.