Integrate LangChain with Astra DB Serverless

15 min

LangChain can use Astra DB Serverless to store and retrieve vectors for ML applications.

Prerequisites

The code samples on this page assume the following:

You have an active Astra account.
You have created a Serverless (Vector) database.
You have created an application token with the Database Administrator role.
You have created an OpenAI API key.
You have installed Python 3.8+ and pip 23.0+.

You have installed the required dependencies:

pip install "langchain==0.1.7" "langchain-astradb>=0.0.1" \
    "langchain-openai==0.0.6" "datasets==2.17.1" "pypdf==4.0.2" \
    "python-dotenv==1.0.1"

Connect to the Serverless (Vector) database

Import libraries and connect to the database.

Local install

Create a .env file in the root of your program. Populate the file with the Astra token and endpoint values from the Database Details section of your database’s Overview tab, and your OpenAI API key.

.env

ASTRA_DB_APPLICATION_TOKEN="TOKEN"
ASTRA_DB_API_ENDPOINT="API_ENDPOINT"
ASTRA_DB_KEYSPACE="default_keyspace" # A namespace that exists in this database
OPENAI_API_KEY="API_KEY"

Google Colab

import os
from getpass import getpass
os.environ["ASTRA_DB_APPLICATION_TOKEN"] = getpass("ASTRA_DB_APPLICATION_TOKEN = ")
os.environ["ASTRA_DB_API_ENDPOINT"] = input("ASTRA_DB_API_ENDPOINT = ")
if _desired_namespace := input("ASTRA_DB_KEYSPACE (optional) = "):
    os.environ["ASTRA_DB_KEYSPACE"] = _desired_namespace
os.environ["OPENAI_API_KEY"] = getpass("OPENAI_API_KEY = ")

The endpoint format is https://ASTRA_DB_ID-ASTRA_DB_REGION.apps.astra.datastax.com.
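A quick format check can catch a mistyped endpoint before any network call is made. The sketch below is not part of the Astra tooling: the helper name `looks_like_astra_endpoint` is hypothetical, and the pattern assumes that the database ID is a UUID and that the region contains only lowercase letters, digits, and hyphens.

```python
import re

# Assumption: ASTRA_DB_ID is a 36-character UUID and ASTRA_DB_REGION uses
# lowercase letters, digits, and hyphens (for example "us-east1").
_ENDPOINT_PATTERN = re.compile(
    r"^https://[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"
    r"-[a-z0-9-]+\.apps\.astra\.datastax\.com$"
)


def looks_like_astra_endpoint(url: str) -> bool:
    """Return True if url matches the documented endpoint shape."""
    return _ENDPOINT_PATTERN.match(url.strip()) is not None
```

The check only covers the shape of the URL. It rejects a trailing slash or path, and it cannot tell whether the database actually exists.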

Import your dependencies.

Local install

integrate.py

import os
from langchain_astradb import AstraDBVectorStore
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

from datasets import load_dataset
from dotenv import load_dotenv

Google Colab

from langchain_astradb import AstraDBVectorStore
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

from datasets import load_dataset

Load your environment variables.

Local install

load_dotenv()

ASTRA_DB_APPLICATION_TOKEN = os.environ.get("ASTRA_DB_APPLICATION_TOKEN")
ASTRA_DB_API_ENDPOINT = os.environ.get("ASTRA_DB_API_ENDPOINT")
ASTRA_DB_KEYSPACE = os.environ.get("ASTRA_DB_KEYSPACE")
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")

Google Colab

ASTRA_DB_APPLICATION_TOKEN = os.environ.get("ASTRA_DB_APPLICATION_TOKEN")
ASTRA_DB_API_ENDPOINT = os.environ.get("ASTRA_DB_API_ENDPOINT")
ASTRA_DB_KEYSPACE = os.environ.get("ASTRA_DB_KEYSPACE")
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
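Because `os.environ.get` returns `None` for an unset variable, a missing value may only surface later as a less obvious connection error. One way to fail early is sketched below. The names `REQUIRED_SETTINGS` and `missing_settings` are hypothetical, and `ASTRA_DB_KEYSPACE` is left out on the assumption that it is optional, as the Colab prompt suggests.

```python
import os

# Settings this page treats as mandatory; ASTRA_DB_KEYSPACE is assumed optional.
REQUIRED_SETTINGS = (
    "ASTRA_DB_APPLICATION_TOKEN",
    "ASTRA_DB_API_ENDPOINT",
    "OPENAI_API_KEY",
)


def missing_settings(env=None):
    """Return the names of required settings that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]
```

Calling `missing_settings()` right after the variables are loaded, and raising an error if the returned list is not empty, keeps the failure close to its cause.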

See Advanced configuration for Azure OpenAI values.

To avoid a namespace collision with the langchain package, don’t name the file langchain.py.

Create embeddings from text

Specify the embeddings model, database, and collection to use. If the collection does not exist, it is created automatically.

integrate.py

embedding = OpenAIEmbeddings()
vstore = AstraDBVectorStore(
    embedding=embedding,
    namespace=ASTRA_DB_KEYSPACE,
    collection_name="test",
    token=os.environ["ASTRA_DB_APPLICATION_TOKEN"],
    api_endpoint=os.environ["ASTRA_DB_API_ENDPOINT"],
)

Load a small dataset of philosophical quotes with the Hugging Face datasets library.

integrate.py

philo_dataset = load_dataset("datastax/philosopher-quotes")["train"]
print("An example entry:")
print(philo_dataset[16])

Process the metadata and convert each entry to a LangChain document.

integrate.py

docs = []
for entry in philo_dataset:
    metadata = {"author": entry["author"]}
    if entry["tags"]:
        # Add metadata tags to the metadata dictionary
        for tag in entry["tags"].split(";"):
            metadata[tag] = "y"
    # Add a LangChain document with the quote and metadata tags
    doc = Document(page_content=entry["quote"], metadata=metadata)
    docs.append(doc)
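To make the tag handling concrete, the sketch below applies the same logic to one entry, using plain dictionaries so it runs without LangChain. The function name `build_metadata` and the sample entry are made up for illustration. The entry only assumes the `author`, `quote`, and `tags` fields used in the loop above.

```python
def build_metadata(entry):
    """Mirror the loop above for a single dataset entry."""
    metadata = {"author": entry["author"]}
    if entry["tags"]:
        # Each semicolon-separated tag becomes its own key with the value "y"
        for tag in entry["tags"].split(";"):
            metadata[tag] = "y"
    return metadata


# Made-up entry for illustration; not taken from the dataset.
sample_entry = {
    "author": "example_author",
    "quote": "An example quote.",
    "tags": "ethics;knowledge",
}
sample_metadata = build_metadata(sample_entry)
print(sample_metadata)
# {'author': 'example_author', 'ethics': 'y', 'knowledge': 'y'}
```

Storing each tag as a separate key, rather than as one list, means a document can later be selected by an exact match on a single metadata field.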

Compute embeddings for each document and store them in the database.

integrate.py

inserted_ids = vstore.add_documents(docs)
print(f"\nInserted {len(inserted_ids)} documents.")

Verify integration

Show quotes that are similar to a specific quote.

integrate.py

results = vstore.similarity_search("Our life is what we make of it", k=3)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")
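Conceptually, a similarity search embeds the query and returns the stored vectors closest to it. The sketch below illustrates that idea with cosine similarity on toy vectors. It is not how Astra DB performs the search internally: the metric actually used depends on how the collection is configured, and the helper names and three-dimensional vectors are invented for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=3):
    """Return the k labels whose vectors are most similar to query."""
    ranked = sorted(
        vectors,
        key=lambda label: cosine_similarity(query, vectors[label]),
        reverse=True,
    )
    return ranked[:k]


# Toy 3-dimensional "embeddings"; real embeddings have many more dimensions.
toy_vectors = {
    "quote_a": [1.0, 0.0, 0.0],
    "quote_b": [0.8, 0.6, 0.0],
    "quote_c": [0.0, 0.0, 1.0],
}
print(top_k([1.0, 0.1, 0.0], toy_vectors, k=2))
# ['quote_a', 'quote_b']
```

The `k` argument plays the same role as `k=3` in the `similarity_search` call: it caps how many of the closest matches are returned.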

Run the code

Run the code you defined earlier.

python integrate.py

Advanced configuration

If you’re using Azure OpenAI, include these additional environment variables:

OPENAI_API_TYPE="azure"
OPENAI_API_VERSION="2023-05-15"
OPENAI_API_BASE="https://RESOURCE_NAME.openai.azure.com"
OPENAI_API_KEY="API_KEY"

Next steps

Build a chatbot with LangChain (tutorial)
