Data API introduction

The Data API is the foundational vector API for Astra DB Serverless databases.

Overview

The Data API lets you create AI applications that interact with Astra DB Serverless databases. It includes commands that perform vector searches with AI projections that return similarity scores. The Data API also provides a range of query and update operators that you can use to filter documents and sort response data.

The clients for Python, TypeScript, and Java are custom abstractions based on the underlying functionality provided by the Data API.

In addition to using those language-specific clients, you can submit Data API commands directly via any of the following methods:

  • curl commands to send Data API requests to Astra DB Serverless databases, as detailed in Data API commands.

  • The Data API examples in Postman.

  • The Data API Swagger UI, which includes "Try It Out" functionality. In a browser, open the Swagger UI by specifying your database’s API Endpoint, using this format:

    https://<ASTRA_DB_API_ENDPOINT>/api/json/swagger-ui/

    When you create your Serverless (Vector) database in Astra Portal, the API Endpoint value is shown in the Database Details section.

For details beyond the curl, Postman, and Swagger UI examples, see the API reference.

Prerequisites

The Data API examples assume the following:

  • You have an active Astra account.

  • You have created an Astra DB Serverless database in Astra Portal.

  • You have generated an application token in Astra Portal.

  • For your database, you have copied the API Endpoint and auth token values from the Database Details section of Astra Portal, and exported them to the following environment variables in a CLI of your choosing:

    • ASTRA_DB_API_ENDPOINT

    • ASTRA_DB_APPLICATION_TOKEN

You can define a database namespace, which is also known as a keyspace, in the ASTRA_DB_KEYSPACE environment variable. Recommendation: specify the name that’s already set for every Astra DB Serverless database: default_keyspace.

To use the examples, also define a COLLECTION environment variable.
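As a rough illustration of how these variables might be consumed, the following Python sketch reads them and prepares (but does not send) a POST request. The `/api/json/v1/<keyspace>/<collection>` path and the `Token` header are assumptions that this section does not state, and `DataApiSettings`, `load_settings`, and `build_request` are hypothetical names. The sketch also assumes that ASTRA_DB_API_ENDPOINT includes the `https://` scheme.

```python
import json
import os
import urllib.request
from dataclasses import dataclass

REQUIRED = ("ASTRA_DB_API_ENDPOINT", "ASTRA_DB_APPLICATION_TOKEN", "COLLECTION")


@dataclass(frozen=True)
class DataApiSettings:
    endpoint: str
    token: str
    keyspace: str
    collection: str


def load_settings(env=os.environ) -> DataApiSettings:
    """Read the variables named in the prerequisites; fail early if one is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return DataApiSettings(
        endpoint=env["ASTRA_DB_API_ENDPOINT"].rstrip("/"),
        token=env["ASTRA_DB_APPLICATION_TOKEN"],
        # Fall back to the keyspace that the text recommends.
        keyspace=env.get("ASTRA_DB_KEYSPACE") or "default_keyspace",
        collection=env["COLLECTION"],
    )


def build_request(settings: DataApiSettings, command: dict) -> urllib.request.Request:
    """Prepare a POST request carrying one Data API command as JSON."""
    # ASSUMPTION: the URL path and the Token header are not given in this section.
    url = f"{settings.endpoint}/api/json/v1/{settings.keyspace}/{settings.collection}"
    return urllib.request.Request(
        url,
        data=json.dumps(command).encode("utf-8"),
        headers={"Token": settings.token, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the environment in as a mapping keeps the sketch easy to exercise without touching real credentials. To send the prepared request, you could pass it to `urllib.request.urlopen`.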

Data API naming conventions

Collection and property names must start and end with a letter or an underscore, and may only contain the following characters:

  • a-z

  • A-Z

  • 0-9

  • _ (underscore)

  • - (hyphen)

Names must be between 1 and 48 characters.

The _id field name is reserved and interpreted as a document’s identity field.

The dollar sign ($) is reserved for system-defined operator and field names, such as $exists, $and, $or, and $vector.
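The naming rules above can be checked before a request is sent. The following Python sketch encodes them exactly as stated in this section (first and last character a letter or underscore, 1 to 48 characters). The function names are hypothetical, and the Data API itself remains the authority on what it accepts.

```python
import re

# Rule as stated above: the first and last characters are a letter or an
# underscore; characters in between may also be digits or hyphens; the total
# length is between 1 and 48 characters.
_NAME_RE = re.compile(r"[A-Za-z_](?:[A-Za-z0-9_-]{0,46}[A-Za-z_])?")


def is_valid_name(name: str) -> bool:
    """Return True if the name satisfies the naming rule described above."""
    return _NAME_RE.fullmatch(name) is not None


def is_reserved_field(name: str) -> bool:
    """Return True for _id and for names in the $-prefixed system namespace."""
    return name == "_id" or name.startswith("$")
```

For example, `is_valid_name("my-collection")` is true, while `is_valid_name("-draft")` and `is_valid_name("$vector")` are false.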

Data API data types

Supported data types in Data API:

  • string

  • number

  • object (JSON object)

  • array

  • boolean

  • vector (via $vector)

  • date (via $date)

  • null
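To make the list concrete, the following Python sketch builds one document that uses each supported type. The property names are invented for illustration. The encoding of $date as milliseconds since the Unix epoch is an assumption that this section does not confirm, so check it against the API reference before relying on it.

```python
import json
from datetime import datetime, timezone


def to_date(value: datetime) -> dict:
    """Wrap a datetime in the $date form.

    ASSUMPTION: $date carries milliseconds since the Unix epoch.
    """
    return {"$date": round(value.timestamp() * 1000)}


# Hypothetical document; every property name here is illustrative only.
document = {
    "_id": "doc1",                                   # string (reserved identity field)
    "title": "Sample item",                          # string
    "price": 12.5,                                   # number
    "in_stock": True,                                # boolean
    "tags": ["sample", "demo"],                      # array
    "dimensions": {"width": 10, "height": 4},        # object (JSON object)
    "discontinued_on": None,                         # null
    "created": to_date(datetime(2024, 1, 1, tzinfo=timezone.utc)),  # date
    "$vector": [0.1, 0.2, 0.3],                      # vector
}

payload = json.dumps(document)
```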

Data API limits

The Data API includes guardrails to ensure best practices, foster availability, and promote optimal configurations for your Astra DB Serverless databases.

Each entry below gives the entity, its limit, and notes.

  • Number of collections per database: 5

    Up to 5 collections in a Serverless (Vector) database.

  • Page size: 20

    A page may contain up to 20 documents. After that per-page maximum is reached, you can load any additional documents on the next page by using the nextPageState generated ID found in a Data API command’s response.

  • Sort page size: 100

    Document page size for sorting. This limit is separate from the page size because sort operations need more rows per page.

  • Maximum property name: 100

    Maximum of 100 characters in a property name.

  • Maximum path length: 250

    Maximum of 250 characters in a path name. This is the total for all segments, including any dots (.) between properties in a path.

  • String property maximum bytes: 8000

    Maximum of 8000 UTF-8 bytes for string length in a property.

  • Number property maximum characters: 100

    Maximum of 100 characters for number length in a property.

  • Maximum elements per array: 1000

    Maximum number of elements in an array. This limit applies to indexed properties only and is ignored for non-indexed properties.

  • Maximum dimensions in vector-enabled collection: 4096

    Maximum number of dimensions you can define for a vector-enabled collection.

  • Maximum number of properties per JSON object: 1000

    Maximum number of properties for a JSON object. A given JSON object may have nested objects, also known as sub-documents. This maximum total count of 1000 refers to all the properties in the main document, plus a count of 1 for each sub-document (if any).

  • Maximum number of properties per JSON document: 2000

    Maximum number of properties (also known as fields) allowed in a single JSON document. This limit includes intermediate fields as well as leaf fields. For example, given this document:

    {
      "root": {
        "branch": {
          "leaf": 42
        }
      }
    }

    For the purposes of the limit, the document has three fields: root, root.branch, and root.branch.leaf.

  • Maximum document size in characters: 4 million

    Maximum size of each document in a collection is 4 million characters.

  • Maximum number of documents deleted per transaction: 20

    Maximum number of documents that can be deleted in each transaction.

  • Maximum number of documents updated per transaction: 20

    Maximum number of documents that can be updated in each transaction.

  • Maximum number of documents inserted per transaction: 20

    Maximum number of documents that can be inserted in each transaction when using insertMany.

  • Maximum size _id values array via $in: 100

    Maximum size of an _id values array that can be sent via the $in operator.

  • Maximum number of documents returned with each vector search: 1000

    Maximum number of documents returned with each vector search.
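The counting rule given for the per-document property limit (intermediate fields count as well as leaf fields) can be sketched in Python as follows. The function names are hypothetical. The sketch treats arrays as leaf values because this section does not say how objects nested inside arrays are counted.

```python
MAX_FIELDS_PER_DOCUMENT = 2000  # per-document limit listed above


def count_fields(value) -> int:
    """Count every property in a document, intermediate and leaf alike.

    ASSUMPTION: only JSON objects (dicts) contribute fields; arrays are
    treated as leaf values.
    """
    if not isinstance(value, dict):
        return 0
    return sum(1 + count_fields(child) for child in value.values())


def within_field_limit(document: dict) -> bool:
    """Return True if the document stays within the per-document property limit."""
    return count_fields(document) <= MAX_FIELDS_PER_DOCUMENT


example = {"root": {"branch": {"leaf": 42}}}
print(count_fields(example))  # 3: root, root.branch, root.branch.leaf
```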

If your code exceeds a limit, Data API still responds with an HTTP 200 OK status, but the returned JSON is different from the SUCCESS case. You should inspect the resulting JSON for any error messages. For example, if you exceed the per-transaction limit of 20 documents in an insertMany command, Data API responds with this message:

[{"message": "Request invalid, the field postCommand.command.documents not valid:
amount of documents to insert is over the max limit (21 vs 20)."}]

The SUCCESS response would contain a message such as:

{"status": {"insertedIds": [ ... ] } }
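Because a limit violation still arrives with HTTP 200, client code needs to inspect the response body and stay under the per-transaction limit. The following Python sketch splits documents into batches of 20 and checks each response. It accepts the bare error list shown above and, as an assumption, the same list wrapped under an "errors" key. The `send` callable and all function names are hypothetical.

```python
from typing import Callable, Iterator

MAX_INSERT_BATCH = 20  # per-transaction insertMany limit listed above


def batches(documents: list, size: int = MAX_INSERT_BATCH) -> Iterator[list]:
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(documents), size):
        yield documents[start:start + size]


def error_messages(body) -> list:
    """Extract error messages from a decoded response body.

    ASSUMPTION: errors arrive either as a bare list of {"message": ...}
    objects, as shown above, or wrapped under an "errors" key.
    """
    if isinstance(body, dict):
        body = body.get("errors", [])
    if not isinstance(body, list):
        return []
    return [item.get("message", "") for item in body if isinstance(item, dict)]


def insert_all(documents: list, send: Callable[[dict], object]) -> list:
    """Insert documents in batches; `send` posts one command and returns decoded JSON."""
    inserted = []
    for batch in batches(documents):
        body = send({"insertMany": {"documents": batch}})
        errors = error_messages(body)
        if errors:
            # HTTP 200 does not imply success, so surface the message explicitly.
            raise RuntimeError("; ".join(errors))
        inserted.extend(body["status"]["insertedIds"])
    return inserted
```

Injecting `send` keeps the batching and error handling independent of any particular HTTP client.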

Data API operators

The Data API provides a range of query operators that you can use in filters, and update operators that you can use to modify matching documents.

For examples in Data API request payloads, see Data API commands. Also see the Data API vector collection in Postman.

Each entry below gives the operator name and its purpose, grouped by operator type.

  • Logical query

    • $and: Joins query clauses with a logical AND, returning the documents that match the conditions of both clauses.

    • $or: Joins query clauses with a logical OR, returning the documents that match the conditions of either clause.

    • $not: Returns documents that do not match the conditions of the filter clause.

  • Range query

    • $gt: Matches documents where the given property is greater than the specified value.

    • $gte: Matches documents where the given property is greater than or equal to the specified value.

    • $lt: Matches documents where the given property is less than the specified value.

    • $lte: Matches documents where the given property is less than or equal to the specified value.

  • Comparison query

    • $eq: Matches documents where the value of a property equals the specified value. This is the default when you do not specify an operator.

    • $ne: Matches documents where the value of a property does not equal the specified value.

    • $in: Matches any of the values specified in the array.

    • $nin: Matches any values that are not in the specified array.

  • Element query

    • $exists: Matches documents that have the specified property.

  • Array query

    • $all: Matches arrays that contain all elements in the specified array.

    • $size: Selects documents where the array has the specified number of elements.

  • Field update

    • $currentDate: Used in an update operation. In the following example, the createdAt property is updated to use the current date:

      {
        "findOneAndUpdate": {
          "filter": {"_id": "doc1"},
          "update": {
            "$currentDate": {
              "createdAt": true
            }
          }
        }
      }

    • $inc: Increments the value of the property by the specified amount.

    • $min: Updates the property only if the specified value is less than the existing property value.

    • $max: Updates the property only if the specified value is greater than the existing property value.

    • $rename: Renames the specified property in each matching document.

    • $set: Sets the value of a property in each matching document.

    • $unset: Removes the specified property from each matching document.

  • Array update

    • $addToSet: Adds elements to the array only if they do not already exist in the array.

    • $pop: Removes the first or last item of the array, depending on the value of the operator (-1 to remove the first item; 1 to remove the last item).

    • $push: Appends the specified item to the end of the property value. If the value is not yet an array, the result depends on the existing value. If the property has no value, $push creates a one-element array containing the given item. If the property has a non-array value, $push creates a two-element array with the old value as the first entry and the specified item as the second entry.

    • $each: An array update modifier for the $push and $addToSet operators that appends multiple items in one update.

    • $position: An array update modifier for the $push operator that specifies the position in the array at which to add elements.
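To show how several of these operators combine in one payload, the following Python sketch builds a findOneAndUpdate command (the command used in the $currentDate example) with a compound filter and a multi-operator update. The property names (price, category, status, views, tags) and the function name are invented for illustration.

```python
import json


def build_update_command(min_price: float, categories: list, new_tags: list) -> dict:
    """Build a findOneAndUpdate payload that combines query and update operators."""
    return {
        "findOneAndUpdate": {
            "filter": {
                # $and joins the clauses; every clause must match.
                "$and": [
                    {"price": {"$gte": min_price}},       # range query
                    {"category": {"$in": categories}},    # comparison query
                    {"tags": {"$exists": True}},          # element query
                ]
            },
            "update": {
                "$set": {"status": "reviewed"},           # field update
                "$inc": {"views": 1},                     # field update
                # $each pushes several items; $position 0 adds them at the front.
                "$push": {"tags": {"$each": new_tags, "$position": 0}},
            },
        }
    }


command = build_update_command(10, ["books", "games"], ["sale", "featured"])
print(json.dumps(command, indent=2))
```

Building the payload as a plain dictionary and serializing it with `json.dumps` keeps the operator names visible and avoids hand-written JSON strings.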

What’s next?

See the next topic for details about the Data API commands and submitting them via curl.


© 2024 DataStax | Privacy policy | Terms of use
