OpenAI Assistants with persistent vector store

The Astra Assistants API lets the OpenAI Assistants API use a Serverless (Vector) database as a persistent vector store. The integration supports the following features:

  • Store messages, assistants, threads, runs, and files in a database.

  • Stream messages to a database.

The database stores and queries embeddings for retrieval-augmented generation (RAG). For large language model (LLM) tasks, such as embedding generation and chat completion, the service calls OpenAI or other LLM providers.

Users interact with the service through the OpenAI SDKs. Store your proprietary data and run Assistants API examples on your own Astra DB Serverless database, which you can manage, access, and secure yourself.

Prerequisites

To complete this tutorial, you'll need the following:

  • An active Astra account with permission to generate an Administrator User token.

  • An OpenAI API key.

You should also be proficient in the following tasks:

  • Interacting with databases.

  • Running a basic Python script.

Run an Assistant API example

  1. Clone the Astra Assistants API repository and switch to that directory.

    git clone https://github.com/datastax/astra-assistants-api.git \
      && cd astra-assistants-api
  2. Create a .env file with the environment variables for your selected model. (A sketch for loading these variables from Python follows this procedure.)

    • OpenAI

      .env
      #!/bin/bash

      # Go to https://astra.datastax.com > Tokens to generate an Administrator User token.
      export ASTRA_DB_APPLICATION_TOKEN=
      # Go to https://platform.openai.com/api-keys to create a secret key.
      export OPENAI_API_KEY=

    • Perplexity

      .env
      #!/bin/bash

      # Go to https://astra.datastax.com > Tokens to generate an Administrator User token.
      export ASTRA_DB_APPLICATION_TOKEN=
      # Go to https://platform.openai.com/api-keys to create a secret key.
      export OPENAI_API_KEY=

      # Go to https://www.perplexity.ai/settings/api to generate a secret key.
      export PERPLEXITYAI_API_KEY=

    • Cohere

      .env
      #!/bin/bash

      # Go to https://astra.datastax.com > Tokens to generate an Administrator User token.
      export ASTRA_DB_APPLICATION_TOKEN=
      # Go to https://platform.openai.com/api-keys to create a secret key.
      export OPENAI_API_KEY=

      # Go to https://dashboard.cohere.com/api-keys to create an API key.
      export COHERE_API_KEY=

    • Bedrock

      .env
      #!/bin/bash

      # Go to https://astra.datastax.com > Tokens to generate an Administrator User token.
      export ASTRA_DB_APPLICATION_TOKEN=
      # Go to https://platform.openai.com/api-keys to create a secret key.
      export OPENAI_API_KEY=

      # Bedrock models: https://docs.aws.amazon.com/bedrock/latest/userguide/setting-up.html
      export AWS_REGION_NAME=
      export AWS_ACCESS_KEY_ID=
      export AWS_SECRET_ACCESS_KEY=

    • Vertex

      .env
      #!/bin/bash

      # Go to https://astra.datastax.com > Tokens to generate an Administrator User token.
      export ASTRA_DB_APPLICATION_TOKEN=
      # Go to https://platform.openai.com/api-keys to create a secret key.
      export OPENAI_API_KEY=

      # Vertex AI models: https://console.cloud.google.com/vertex-ai
      export GOOGLE_JSON_PATH=
      export GOOGLE_PROJECT_ID=
  3. Install poetry.

    pip install poetry
  4. Install the dependencies for your assistant.

    poetry install
  5. Run an API example.

    • Chat completion:

      poetry run python examples/python/chat_completion/basic.py

    • Retrieval:

      poetry run python examples/python/retrieval/basic.py

    • Streaming retrieval:

      poetry run python examples/python/streaming_retrieval/basic.py

    • Function calling:

      poetry run python examples/python/function_calling/basic.py
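The example scripts read the credentials from your environment. The following is a minimal sketch of loading the .env file from Python, assuming the python-dotenv package (install it with pip install python-dotenv); the variable check is illustrative, and the repository's examples might load credentials differently.

from dotenv import load_dotenv
import os

# Reads .env from the current directory; `export KEY=value` lines are supported.
load_dotenv()

# Fail fast if a required credential is missing.
for var in ("ASTRA_DB_APPLICATION_TOKEN", "OPENAI_API_KEY"):
    if not os.environ.get(var):
        raise RuntimeError(f"Missing required environment variable: {var}")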

Build an Assistants API-powered application

  1. Install the streaming-assistants (https://github.com/phact/streaming-assistants) dependency with your package manager:

    poetry add streaming_assistants
  2. Import and patch your client:

    from openai import OpenAI
    from streaming_assistants import patch

    # patch() redirects the Assistants API's persistence to Astra DB Serverless
    # while leaving the rest of the OpenAI client interface unchanged.
    client = patch(OpenAI())

    On your first request, the service uses your token to create an Astra DB Serverless database named assistant_api_db. That first request might take a few minutes while the database is created; the delay happens only once.

  3. Create your assistant. (An end-to-end sketch follows this procedure.)

    assistant = client.beta.assistants.create(
      instructions="You are a personal math tutor. When asked a math question, write and run code to answer the question.",
      model="gpt-4-1106-preview",
      tools=[{"type": "retrieval"}]
    )
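With the assistant in place, the usual OpenAI Assistants flow applies: create a thread, add a message, start a run, and poll until it completes. The sketch below assumes the patched client and the assistant created above; the polling loop is a simple illustration rather than production-ready code.

import time

# Create a conversation thread and add a user message.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What is the square root of 1764?",
)

# Start a run and poll until it leaves the queued/in-progress states.
run = client.beta.threads.runs.create(
    thread_id=thread.id, assistant_id=assistant.id
)
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

# The most recent message in the thread is the assistant's reply.
messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)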

By default, the service uses Astra DB Serverless as the vector store and OpenAI for embeddings and chat completion.

Third-party LLM support

DataStax supports many third-party models for embeddings and completions through LiteLLM. Pass your provider's API key and embedding model using the api-key and embedding-model headers.
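As a sketch of how those headers can be attached, the OpenAI Python SDK accepts a default_headers option at client construction. The header values below are placeholders, and the exact header names and values expected by the service should be confirmed against the Astra Assistants API repository.

from openai import OpenAI
from streaming_assistants import patch

# Sketch only: attach provider credentials as default headers.
# "YOUR_PROVIDER_API_KEY" is a placeholder, not a real value.
client = patch(OpenAI(
    default_headers={
        "api-key": "YOUR_PROVIDER_API_KEY",
        "embedding-model": "text-embedding-3-large",
    }
))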

You can pass different models, with the corresponding API key set in your environment. The assistant creation call is identical for every provider; only the model string changes:

  • OpenAI GPT-4: model="gpt-4-1106-preview"

  • OpenAI GPT-3.5: model="gpt-3.5-turbo"

  • Cohere Command: model="cohere/command"

  • Perplexity Mixtral 8x7B: model="perplexity/mixtral-8x7b-instruct"

  • Perplexity pplx-70b-online: model="perplexity/pplx-70b-online"

  • Anthropic Claude 2: model="anthropic.claude-v2"

  • Google Gemini Pro: model="gemini/gemini-pro"

  • Meta Llama 2: model="meta.llama2-13b-chat-v1"

model = "gpt-4-1106-preview"  # or any model string from the list above

assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
    model=model,
)

For third-party embedding models, DataStax supports the embedding_model parameter in client.files.create:

file = client.files.create(
    file=open(
        "./test/language_models_are_unsupervised_multitask_learners.pdf",
        "rb",
    ),
    purpose="assistants",
    embedding_model="text-embedding-3-large",
)
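After the upload, the file can be attached to an assistant so the retrieval tool can search it. The sketch below uses the file_ids parameter from the same Assistants API version as the retrieval tool shown earlier; treat it as illustrative rather than the repository's exact example, and note that the name and instructions are hypothetical.

# Attach the uploaded file so the retrieval tool can search its embeddings.
assistant = client.beta.assistants.create(
    name="Paper Tutor",
    instructions="Answer questions using the attached paper.",
    model="gpt-4-1106-preview",
    tools=[{"type": "retrieval"}],
    file_ids=[file.id],
)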

See also

The Astra Assistants API repository (https://github.com/datastax/astra-assistants-api) includes additional resources and examples.
