KnowledgeAI - PDF search assistant for your organization

In this tutorial, you will learn how to use Nile's tenant virtualization, user management, and vector embedding features to build a SaaS application that lets users search the PDF documents in their organization. Users upload a PDF document and then ask questions about it. The application generates embeddings with OpenAI, stores them in Nile, and runs a similarity search with pg_vector to give GPT-3.5 the context it needs to answer each question. Embeddings are stored per tenant, and both data and workload are isolated to that tenant.
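
To make the last step of that flow concrete, here is a minimal sketch of how retrieved PDF chunks might be turned into context for the chat model. It is not the example app's actual code: the `buildPrompt` function, the `RetrievedChunk` and `ChatMessage` types, and the wording of the system message are all assumptions made for illustration. Only the `pageContent` and `location` field names come from the embeddings table defined later in this tutorial.

```typescript
// One chunk of PDF text returned by the similarity search.
// The field names mirror the "pageContent" and "location" columns of the
// file_embedding table; the type itself is hypothetical.
interface RetrievedChunk {
  pageContent: string;
  location: string;
}

// Minimal chat message shape (hypothetical, not tied to any SDK).
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Put the retrieved chunks into the system message as numbered context,
// then pass the user's question through unchanged.
function buildPrompt(question: string, chunks: RetrievedChunk[]): ChatMessage[] {
  const context = chunks
    .map((c, i) => `[${i + 1}] (${c.location}) ${c.pageContent}`)
    .join("\n");
  return [
    {
      role: "system",
      content:
        "Answer using only the context below. If the answer is not there, say so.\n\n" +
        context,
    },
    { role: "user", content: question },
  ];
}
```

Keeping the prompt assembly in a pure function like this makes it easy to test without calling OpenAI or the database.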

1. Create a database

  1. Sign up for an invite to Nile if you don't have one already.
  2. You should see a welcome message. Click on "Lets get started".
  3. Name your workspace and database, or accept the default auto-generated names.

2. Create a table with pg_vector extension

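Once you've created your database, you'll land in Nile's web-based SQL editor, which is a great place to create the tables this app needs. Let's start with the embeddings table. Note that it has a foreign key to the "file" table from the "Create metadata tables" step, so PostgreSQL will only accept this statement once "file" exists. If you run the statements one at a time, create "file" first.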

CREATE TABLE "file_embedding" (
  "id" UUID DEFAULT (gen_random_uuid()),
  "tenant_id" UUID NOT NULL,
  "file_id" UUID NOT NULL,
  "embedding_api_id"  UUID NOT NULL,
  "embedding" vector(1024),
  "createdAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  "updatedAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  "pageContent" TEXT,
  "location" TEXT,
  CONSTRAINT "file_embedding_pkey" PRIMARY KEY ("id", "tenant_id"),
  CONSTRAINT "file_embedding_file_id_fkey" FOREIGN KEY ("file_id", "tenant_id") REFERENCES "file" ("id", "tenant_id")
);

This is a tenant-aware table that stores the embeddings of the PDF documents. Each row belongs to a specific tenant, and the embedding column is of type vector(1024). The vector type is provided by the pg_vector extension for storing embeddings. Because the embeddings live in a tenant-aware table, Nile's built-in tenant isolation ensures that information about PDFs won't leak between tenants.
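
As a rough illustration of how this table could be queried, the sketch below builds a parameterized similarity search using pg_vector's cosine distance operator (`<=>`). The function names, the `SqlQuery` type, and the default limit are assumptions and not part of the example app. The explicit `"tenant_id"` filter is also an assumption made to keep the sketch plain SQL; Nile's own tenant context may make it unnecessary. The query embedding must have the same number of dimensions as the column (1024 here).

```typescript
// Text and values in the { text, values } shape that node-postgres accepts.
interface SqlQuery {
  text: string;
  values: unknown[];
}

// Serialize a numeric embedding into pg_vector's text format, e.g. "[0.1,0.2]".
function toVectorLiteral(embedding: number[]): string {
  if (embedding.some((x) => !Number.isFinite(x))) {
    throw new Error("embedding must contain only finite numbers");
  }
  return `[${embedding.join(",")}]`;
}

// Build a query returning the chunks of one file that are closest to the
// query embedding, smallest cosine distance first. Hypothetical helper.
function similarChunksQuery(
  tenantId: string,
  fileId: string,
  queryEmbedding: number[],
  limit = 5
): SqlQuery {
  return {
    text: `SELECT "pageContent", "location", "embedding" <=> $3::vector AS distance
FROM "file_embedding"
WHERE "tenant_id" = $1 AND "file_id" = $2
ORDER BY "embedding" <=> $3::vector
LIMIT $4`,
    values: [tenantId, fileId, toVectorLiteral(queryEmbedding), limit],
  };
}
```

Passing the embedding as a parameter and casting it with `::vector` avoids interpolating user-influenced values into the SQL string.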

3. Create metadata tables

We'll need a few more tables to store information about the PDF documents, the conversations with them, and the users. Go ahead and create these.

CREATE TABLE "file" (
  "id" UUID DEFAULT (gen_random_uuid()),
  "tenant_id" UUID NOT NULL,
  "url"      TEXT,
  "key"      TEXT,
  "createdAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  "updatedAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  "user_id" UUID NOT NULL,
  "user_picture" TEXT,
  "user_name" TEXT,
  "isIndex" BOOLEAN,
  "name" TEXT,
  "pageAmt" INTEGER,
  CONSTRAINT "file_pkey" PRIMARY KEY ("id", "tenant_id"),
  CONSTRAINT "unique_key_per_tenant" UNIQUE ("tenant_id", "key")
);

CREATE TABLE "message" (
  "id" UUID DEFAULT (gen_random_uuid()),
  "tenant_id" UUID NOT NULL,
  "text" TEXT,
  "isUserMessage" BOOLEAN,
  "createdAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  "updatedAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  "user_id" UUID NOT NULL,
  "user_picture" TEXT,
  "user_name" TEXT,
  "fileId" UUID,
  CONSTRAINT "message_pkey" PRIMARY KEY ("id", "tenant_id"),
  CONSTRAINT "message_fileId_fkey" FOREIGN KEY ("fileId", "tenant_id") REFERENCES "file" ("id", "tenant_id")
);

CREATE TABLE "user_subscription" (
  "id" UUID DEFAULT (gen_random_uuid()),
  "user_id" UUID NOT NULL,
  "tenant_id" UUID NOT NULL,
  "stripe_customer_id" TEXT,
  "stripe_subscription_id" TEXT,
  "stripe_price_id" TEXT,
  "stripe_current_period_end" TIMESTAMP,
  CONSTRAINT "subscription_pkey" PRIMARY KEY ("id", "tenant_id"),
  CONSTRAINT "user_subscription_user_id_fkey" FOREIGN KEY ("user_id", "tenant_id") REFERENCES users.tenant_users ("user_id", "tenant_id"),
  CONSTRAINT "unique_stripe_customer_id" UNIQUE ("stripe_customer_id", "tenant_id"),
  CONSTRAINT "unique_stripe_subscription_id" UNIQUE ("stripe_subscription_id", "tenant_id")
);

If all went well, you'll see the new tables in the panel on the left-hand side of the query editor. You can also see Nile's built-in tenants table next to them.
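
To show how application code might write to these tables, here is a small sketch that builds a parameterized INSERT for the "message" table. The `NewMessage` and `InsertStatement` types and the `insertMessageQuery` function are hypothetical; only the table and column names come from the schema above. Because the foreign key covers both "fileId" and "tenant_id", PostgreSQL rejects a message that points at a file belonging to a different tenant.

```typescript
// Statement text plus positional parameter values.
interface InsertStatement {
  text: string;
  values: unknown[];
}

// Fields the caller must supply; "id", "createdAt" and "updatedAt"
// are filled in by the column defaults in the schema.
interface NewMessage {
  tenantId: string;
  userId: string;
  fileId: string;
  text: string;
  isUserMessage: boolean;
}

// Build an INSERT for one chat message and return the generated id.
function insertMessageQuery(m: NewMessage): InsertStatement {
  return {
    text: `INSERT INTO "message" ("tenant_id", "user_id", "fileId", "text", "isUserMessage")
VALUES ($1, $2, $3, $4, $5)
RETURNING "id"`,
    values: [m.tenantId, m.userId, m.fileId, m.text, m.isUserMessage],
  };
}
```

The mixed-case column names ("fileId", "isUserMessage") must stay double-quoted, since PostgreSQL folds unquoted identifiers to lowercase.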

4. Getting credentials

In the left-hand menu, click on "Settings" and then select "Credentials". Generate credentials and keep them somewhere safe. These give you access to the database.

5. Setting up Google Authentication

This demo uses Google authentication for signup. You will need to configure this in both Google and Nile, following the instructions in the Nile documentation.

6. Setting up 3rd Party SaaS

This example requires a few more 3rd party SaaS accounts. You'll need to set them up and grab API keys to configure this example. The environment file in the next step asks for access keys for UploadThing and OpenAI.

7. Setting the environment

  • If you haven't cloned this project yet, now is an excellent time to do so. Since it uses Next.js, we can use create-next-app for this:

    npx create-next-app -e https://github.com/niledatabase/niledatabase/tree/main/examples/ai/ai_pdf nile-ai-pdf
    cd nile-ai-pdf
    
  • Rename .env.example to .env.local, and update it with your workspace and database names. (Both are displayed in the header of the Nile dashboard.) Fill in the username and password with the credentials you generated in the "Getting credentials" step, and fill in the access keys for UploadThing and OpenAI.

  • Install dependencies with yarn install or npm install.
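
A missing or empty value in .env.local is an easy mistake to make at this point, so a small startup check can help. The sketch below is an assumption, not part of the example project: `missingEnvVars` is a hypothetical helper, and the variable names in the usage note are placeholders. Use the names that actually appear in .env.example.

```typescript
// Return the names of required variables that are unset or blank.
// `env` is passed in explicitly so the function is easy to test;
// in a Next.js server context you would typically pass process.env.
function missingEnvVars(
  required: string[],
  env: Record<string, string | undefined>
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

For example, calling `missingEnvVars(["OPENAI_API_KEY", "UPLOADTHING_SECRET"], process.env)` (both names are placeholders) returns the entries you still need to fill in, or an empty array when everything is set.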