Collections reference

Collections are used to store documents in Astra DB Serverless.

Use the Database class to manage collections, and the Collection class itself to work with the data in them.

Currently, you can create up to five collections in an Astra DB Serverless database.

The examples in this topic assume you have already created an Astra DB Serverless database. The Data API supports vector-enabled Astra DB Serverless databases. If you haven’t already, sign into Astra Portal to create a vector-enabled Astra DB Serverless database. See Create a database.

In addition, the client app examples (Python, TypeScript, Java) in this topic and subsequent API Reference topics assume you have already:

Instantiated a DataAPIClient object. See Start with the DataAPIClient.
Connected to a database. See Databases reference.

Create a collection

Create a new collection in an Astra DB Serverless database.

Python
TypeScript
Java
cURL

View this topic in more detail on the API Reference.

collection = database.create_collection("collection")

Create a new collection to store vector data.

collection = database.create_collection(
    "vector_collection",
    dimension=5,
    metric="cosine",
)

Parameters:

Name	Type	Summary
name	`str`	The name of the collection.
namespace	`Optional[str]`	The namespace where the collection is to be created. If not specified, the database’s working namespace is used.
dimension	`Optional[int]`	For vector collections, the dimension of the vectors; that is, the number of their components. If you’re not sure what dimension to set, use whatever dimension vector your embeddings model produces.
metric	`Optional[str]`	The similarity metric used for vector searches. Allowed values are `VectorMetric.DOT_PRODUCT`, `VectorMetric.EUCLIDEAN` or `VectorMetric.COSINE` (default).
indexing	`Optional[Dict[str, Any]]`	Optional specification of the indexing options for the collection, in the form of a dictionary such as `{"deny": […]}` or `{"allow": […]}`.
default_id_type	`Optional[str]`	This sets what type of IDs the API server will generate when inserting documents that do not specify their `_id` field explicitly. Can be set to any of the values `DefaultIdType.UUID`, `DefaultIdType.OBJECTID`, `DefaultIdType.UUIDV6`, `DefaultIdType.UUIDV7`, `DefaultIdType.DEFAULT`.
additional_options	`Optional[Dict[str, Any]]`	Any further set of key-value pairs that will be added to the "options" part of the payload when sending the Data API command to create a collection.
check_exists	`Optional[bool]`	Whether to run an existence check for the collection name before attempting to create the collection: If `check_exists` is True, an error is raised when creating an existing collection. If it is False, the creation is attempted. In this case, for preexisting collections, the command will succeed or fail depending on whether the options match or not.
max_time_ms	`Optional[int]`	A timeout, in milliseconds, for the underlying HTTP request.
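
To make the `dimension` and `metric` parameters more concrete, here is a minimal sketch, in plain Python, of the three similarity measures named in the table. It does not call the Data API. The function names are illustrative only, and the server's exact scoring (for example, how a distance is turned into a similarity score) may differ.

```python
import math
from typing import Sequence


def _check_same_dimension(a: Sequence[float], b: Sequence[float]) -> None:
    # Vectors can only be compared if they have the same number of components.
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    _check_same_dimension(a, b)
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    _check_same_dimension(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    _check_same_dimension(a, b)
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)
```

Cosine similarity ignores vector length and compares direction only, while the dot product and Euclidean distance are both sensitive to magnitude.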

Returns:

Collection - The created collection object, ready to be used to work with the documents in it.

Example response

Collection(name="collection", namespace="default_keyspace", database=Database(api_endpoint="https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com", token="AstraCS:aAbB...", namespace="default_keyspace"))

Example:

from astrapy import DataAPIClient
import astrapy
client = DataAPIClient("TOKEN")
database = client.get_database_by_api_endpoint("01234567-...")

# Create a non-vector collection
collection_simple = database.create_collection("collection")

# Create a vector collection
collection_vector = database.create_collection(
    "vector_collection",
    dimension=3,
    metric=astrapy.constants.VectorMetric.COSINE,
)

# Create a collection with UUIDv6 as default IDs
from astrapy.constants import DefaultIdType, SortDocuments

collection_uuid6 = database.create_collection(
    "uuid6_collection",
    default_id_type=DefaultIdType.UUIDV6,
)

collection_uuid6.insert_one({"desc": "a document", "seq": 0})
collection_uuid6.insert_one({"_id": 123, "desc": "another", "seq": 1})
doc_ids = [
    doc["_id"]
    for doc in collection_uuid6.find({}, sort={"seq": SortDocuments.ASCENDING})
]
print(doc_ids)
#  Will print: [UUID('1eef29eb-d587-6779-adef-45b95ef13497'), 123]
print(doc_ids[0].version)
#  Will print: 6

View this topic in more detail on the API Reference.

const collection = await db.createCollection('COLLECTION');

Create a new collection to store vector data.

const collection = await db.createCollection<Schema>('COLLECTION', {
  vector: {
    dimension: 5,
    metric: 'cosine',
  },
  checkExists: false,
});

A Collection is typed as Collection<Schema>, where Schema is the type of the documents in the collection. Operations on the collection are strongly typed if you provide a specific schema. Otherwise, they remain largely weakly typed, which may be preferable for dynamic data access and operations. It is up to you to ensure that the provided type truly represents the documents in the collection.

Parameters:

Name	Type	Summary
collectionName	`string`	The name of the collection to create.
options?	`CreateCollectionOptions<Schema>`	The options for creating the collection.

Options (CreateCollectionOptions):

Name	Type	Summary
vector?	`VectorOptions`	The vector configuration for the collection, e.g. vector dimension & similarity metric. If not set, collection will not support vector search. If you’re not sure what dimension to set, use whatever dimension vector your embeddings model produces.
indexing?	`IndexingOptions<Schema>`	The indexing configuration for the collection.
defaultId?	`DefaultIdOptions`	The defaultId configuration for the collection, for when a document does not specify an `_id` field.
namespace?	`string`	Overrides the namespace where the collection is created. If not set, the database’s working namespace is used.
checkExists?	`boolean`	Whether to run an existence check for the collection name before attempting to create the collection. If it is `true` or unset, an error is raised when creating an existing collection. Else, if it’s `false`, the creation is attempted. In this case, for preexisting collections, the command will succeed or fail depending on whether the options match or not.
maxTimeMs?	`number`	Maximum time in milliseconds the client should wait for the operation to complete.

Returns:

Promise<Collection<Schema>> - A promise that resolves to the created collection object.

Example:

import { DataAPIClient, VectorDoc } from '@datastax/astra-db-ts';

// Get a new Db instance
const db = new DataAPIClient('TOKEN').db('API_ENDPOINT');

// Define the schema for the collection
interface User extends VectorDoc {
  name: string,
  age?: number,
}

(async function () {
  // Create a basic untyped non-vector collection
  const users1 = await db.createCollection('users');
  await users1.insertOne({ name: 'John' });

  // Typed collection with custom options in a non-default namespace
  const users2 = await db.createCollection<User>('users', {
    namespace: 'NAMESPACE',
    defaultId: {
      type: 'objectId',
    },
    vector: {
      dimension: 5,
      metric: 'cosine',
    },
  });
  await users2.insertOne({ name: 'John' }, { vector: [.12, .62, .87, .16, .72] });
})();

Name	Type	Summary
`collectionName`	`String`	The name of the collection.
`dimension`	`int`	The dimension for the vector in the collection. If you’re not sure what dimension to set, use whatever dimension vector your embeddings model produces.
`metric`	`SimilarityMetric`	The similarity metric to use for vector search: `SimilarityMetric.cosine` (default), `SimilarityMetric.dot_product`, or `SimilarityMetric.euclidean`.
`collectionOptions`	`CollectionOptions`	Fine-grained settings with vector, indexing, and `defaultId` options.
`clazz`	`Class<T>`	The class of the specialized beans to use for the collection's documents, instead of the default `Document` type.

Name	Type	Summary
`createCollection`	command	The Data API command that specifies a new collection is to be created. It acts as a container for all the attributes and settings required to create the new collection.
`name`	string	The name of the new collection. A string value that uniquely identifies the collection within the database.
`options`	Optional[string]	Options for the collection, such as configuration for vector search. Required to create a vector-enabled collection.
`defaultId`	Optional[string]	Controls how the Data API will allocate a new `_id` for each document that does not specify a value in the request. For backwards compatibility with Data API releases before version 1.0.3, if you omit a `defaultId` option on `createCollection`, a document’s `_id` value is a plain String version of random-based UUID (version 4).
`type`	String	Required if `defaultId` option is used. Specifies one of `objectId`, `uuidv7`, `uuidv6`, `uuid`. Cannot be changed after the collection is created.
`dimension`	Optional[int]	The dimension for vector search in the collection. If you’re not sure what dimension to set, use whatever dimension vector your embeddings model produces.
`metric`	Optional[string]	The similarity metric to use for vector search: `cosine` (default), `dot_product`, or `euclidean`.
`indexing`	Optional[string]	Determine which properties are indexed during subsequent update operations. If indexing is specified on `createCollection`, you must further specify `allow` or `deny` clauses, but not both. They are mutually exclusive.
`allow`	[array]	Either `allow` or `deny` is required if `indexing` is specified. An array of one or more properties that are indexed. Alternatively, you can enter a wildcard `"allow": ["*"]`, indicating that all properties will be indexed during an update operation (functionally the same as the default when the `indexing` clause is not present).
`deny`	[array]	Either `allow` or `deny` is required if `indexing` is specified. An array of one or more properties that are not indexed. Alternatively, you can enter a wildcard `"deny": ["*"]`, indicating that no properties will be indexed during an update operation.

The defaultId option

The Data API defaultId option controls how the Data API will allocate a new _id for each document that does not specify a value in the request.

For backwards compatibility with Data API releases before version 1.0.3, if you omit a defaultId option on createCollection, a document’s _id value is a plain String version of random-based UUID (version 4).

Once the collection has been created, you cannot change the defaultId option (if entered).

If you include a defaultId option with createCollection, you must set the type. The value is case-sensitive. Specify one of the following:

Type	Meaning
`objectId`	Each document’s generated `_id` will be an `objectId`.
`uuidv6`	Each document’s generated `_id` will be a Version 6 UUID, which is field compatible with a Version 1 time uuid, but with the ability to be lexicographically sortable.
`uuidv7`	Each document’s `_id` will be a Version 7 UUID, which is designed to be a replacement for Version 1 time uuid, and is recommended for use in new systems.
`uuid`	Each document’s generated `_id` will be a Version 4 Random UUID. This type is analogous to the uuid type and functions in Apache Cassandra®.
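
One way to confirm which UUID version an `_id` value uses is Python's standard `uuid` module. The sketch below inspects the Version 6 sample value that appears in the Python example earlier in this topic, and a locally generated Version 4 value for comparison. It runs entirely on the client side and does not contact the database.

```python
import uuid

# Sample _id copied from the Python create_collection example in this topic.
sample_v6 = uuid.UUID("1eef29eb-d587-6779-adef-45b95ef13497")

# A locally generated random UUID, comparable to what the `uuid` type produces.
random_v4 = uuid.uuid4()

# The version number is encoded in the first hex digit of the third group.
print(sample_v6.version)
print(random_v4.version)
```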

Example:

{
    "createCollection": {
        "name": "vector_collection2",
        "options": {
            "defaultId": {
                "type": "objectId"
            },
            "vector": {
                "dimension": 1024,
                "metric": "cosine"
            }
        }
    }
}
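
If you assemble this request body in code rather than by hand, a small helper keeps the nesting straight. The following is a sketch only: `build_create_collection_payload` is a hypothetical helper, not part of any client library, and it covers only the `defaultId` and `vector` options shown in the example. Rejecting a `metric` without a `dimension` is a choice made for this sketch, not documented server behavior.

```python
from typing import Any, Dict, Optional

# Values listed in this topic; the type names are case-sensitive.
ALLOWED_ID_TYPES = {"objectId", "uuidv6", "uuidv7", "uuid"}
ALLOWED_METRICS = {"cosine", "dot_product", "euclidean"}


def build_create_collection_payload(
    name: str,
    dimension: Optional[int] = None,
    metric: Optional[str] = None,
    default_id_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Return a createCollection request body as a plain dictionary."""
    options: Dict[str, Any] = {}

    if default_id_type is not None:
        if default_id_type not in ALLOWED_ID_TYPES:
            raise ValueError(f"unknown defaultId type: {default_id_type!r}")
        options["defaultId"] = {"type": default_id_type}

    if metric is not None and metric not in ALLOWED_METRICS:
        raise ValueError(f"unknown metric: {metric!r}")

    if dimension is not None:
        vector: Dict[str, Any] = {"dimension": dimension}
        if metric is not None:
            vector["metric"] = metric
        options["vector"] = vector
    elif metric is not None:
        # Sketch-level check: a metric is only meaningful for a vector collection.
        raise ValueError("metric was given without a dimension")

    command: Dict[str, Any] = {"name": name}
    if options:
        command["options"] = options
    return {"createCollection": command}
```

Calling `build_create_collection_payload("vector_collection2", dimension=1024, metric="cosine", default_id_type="objectId")` produces a dictionary with the same structure as the JSON example, ready to serialize with `json.dumps`.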

When you add documents to your collection, using Data API commands such as insertOne and insertMany, you would not specify an explicitly numbered _id value (such as "_id": "12") in the request. The server allocates a unique value per document based on the type you indicated in the createCollection command’s defaultId option.

Client apps can detect the use of $objectId or $uuid in the response document and return to the caller the objects that represent the types natively. In this way, client apps can use generated IDs in the methods that are based on Data API operations such as findOneAndUpdate, updateOne, updateMany.

For example, in Python, the client can specify the detected value for a document’s $objectId or $uuid:

# API Response with $objectId
{
    "_id": {"$objectId": "57f00cf47958af95dca29c0c"},
    "summary": "Retrieval-Augmented Generation is the process of optimizing the output of a large language model..."
}

# Client returns Dict from collection.find_one()
my_doc = {
    "_id": astrapy.ObjectId("57f00cf47958af95dca29c0c"),
    "summary": "Retrieval-Augmented Generation is the process of optimizing the output of a large language model..."
}

# API Response with $uuid
{
    "_id": {"$uuid": "ffd1196e-d770-11ee-bc0e-4ec105f276b8"},
    "summary": "Retrieval-Augmented Generation is the process of optimizing the output of a large language model..."
}

# Client returns Dict from collection.find_one()
my_doc = {
    "_id": UUID("ffd1196e-d770-11ee-bc0e-4ec105f276b8"),
    "summary": "Retrieval-Augmented Generation is the process of optimizing the output of a large language model..."
}
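
The conversion described above can be sketched without a client library. In the sketch below, `ObjectIdStub`, `decode_id`, and `decode_document` are hypothetical names introduced for illustration. A real client returns its own ObjectId type (`astrapy.ObjectId` in the example above) and may handle more cases than this.

```python
from dataclasses import dataclass
from typing import Any, Dict
from uuid import UUID


@dataclass(frozen=True)
class ObjectIdStub:
    """Hypothetical stand-in for a client's native ObjectId type."""

    hex_value: str


def decode_id(raw_id: Any) -> Any:
    """Turn a {"$objectId": ...} or {"$uuid": ...} wrapper into a native value."""
    if isinstance(raw_id, dict) and len(raw_id) == 1:
        if "$objectId" in raw_id:
            return ObjectIdStub(raw_id["$objectId"])
        if "$uuid" in raw_id:
            return UUID(raw_id["$uuid"])
    # Plain values such as strings and numbers pass through unchanged.
    return raw_id


def decode_document(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Return a shallow copy of the document with its _id decoded."""
    decoded = dict(doc)
    if "_id" in decoded:
        decoded["_id"] = decode_id(decoded["_id"])
    return decoded
```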

There are many advantages when using generated _id values with documents, versus relying on manually numbered _id values. For example, with generated _id values of type uuidv7:

Uniqueness across the database: A generated _id value is designed to be globally unique across the entire database. This uniqueness is achieved through a combination of timestamp, machine identifier, process identifier, and a sequence number. Explicitly numbering documents might lead to clashes unless carefully managed, especially in distributed systems.
Automatic generation: The _id values are automatically generated by Astra DB Serverless. This means you won’t have to worry about creating and maintaining a unique ID system, reducing the complexity of the code and the risk of errors.
Timestamp information: A generated _id value includes a timestamp as its first component, representing the document’s creation time. This can be useful for tracking when a document was created without needing an additional field. In particular, type uuidv7 values provide a high degree of granularity (milliseconds) in timestamps.
Avoids manual sequence management: Managing sequential numeric IDs manually can be challenging, especially in environments with high concurrency or distributed systems. There’s a risk of ID collision or the need to lock tables or sequences to generate a new ID, which can affect performance. Generated _id values are designed to handle these issues automatically.
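
As an illustration of the timestamp point, the sketch below reads the creation time out of a Version 7 UUID. It assumes the standard UUIDv7 layout, in which the first 48 bits hold a Unix timestamp in milliseconds. The helper names are made up for this example, `make_sample_uuid7` only builds a UUIDv7-shaped value for demonstration, and nothing here calls the Data API.

```python
from datetime import datetime, timezone
from uuid import UUID


def make_sample_uuid7(unix_ms: int, random_bits: int = 0) -> UUID:
    """Build a UUIDv7-shaped value for demonstration (not a real generator)."""
    value = (unix_ms & ((1 << 48) - 1)) << 80  # 48-bit millisecond timestamp
    value |= 0x7 << 76                         # version nibble
    value |= 0b10 << 62                        # RFC variant bits
    value |= random_bits & ((1 << 62) - 1)     # low 62 bits, normally random
    return UUID(int=value)


def uuid7_unix_ms(value: UUID) -> int:
    """Return the millisecond timestamp stored in the top 48 bits."""
    if value.version != 7:
        raise ValueError("not a Version 7 UUID")
    return value.int >> 80


def uuid7_created_at(value: UUID) -> datetime:
    """Return the embedded timestamp as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(uuid7_unix_ms(value) / 1000, tz=timezone.utc)
```

Because the timestamp occupies the most significant bits, values created later compare greater than values created earlier, which is what makes these IDs sortable by creation time.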

While numeric _id values might be simpler and more human-readable, the benefits of using generated _id values make it a superior choice for most applications, especially those that have many documents.

The indexing option

The Data API createCollection command includes an optional indexing clause.

If you omit the indexing option, by default all properties in the document are indexed when it is added or modified in the database. The index is implemented as a Storage-Attached Index (SAI), which enables Data API queries that filter and/or sort data based on the indexed property.

If you specify the indexing option when you create a collection, you must include one (but not both) of the following: an allow or a deny array.
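
That rule is easy to check on the client before sending the request. The sketch below uses a hypothetical `validate_indexing` helper (not part of the Data API or its clients) that rejects an `indexing` clause containing neither or both of `allow` and `deny`, and returns the name of the clause that was used.

```python
from typing import Any, Dict


def validate_indexing(indexing: Dict[str, Any]) -> str:
    """Return "allow" or "deny" if the clause is well formed, else raise."""
    has_allow = "allow" in indexing
    has_deny = "deny" in indexing
    if has_allow == has_deny:
        # Either both keys are present or neither is.
        raise ValueError("indexing must contain exactly one of 'allow' or 'deny'")

    key = "allow" if has_allow else "deny"
    paths = indexing[key]
    if not isinstance(paths, list) or not paths:
        raise ValueError(f"'{key}' must be an array of one or more properties")
    return key
```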

Pros and cons of selective indexing

Weigh the pros and cons of indexing only certain properties. Skipping some properties can improve write-time performance, but you need to decide, when you create the collection, which properties subsequent queries will filter or sort on. You can only filter or sort on properties that have been indexed. The Data API returns an error if you attempt to filter or sort on a non-indexed property.

The error would have one of these formats:

UNINDEXED_FILTER_PATH("Unindexed filter path"), ...

UNINDEXED_SORT_PATH("Unindexed sort path"), ...

ID_NOT_INDEXED("_id is not indexed"), ...

Example:

UNINDEXED_FILTER_PATH("Unindexed filter path: The filter path ('address.city') is not indexed")

While weighing the pros and cons of indexed or non-indexed properties in a document, consider the maximum size limits for those properties. Non-indexed properties allow for a much larger quantity of data, to accommodate data such as a blog post’s String content. In comparison, indexed properties are appropriately bound by lower maximum size limits to ensure efficient and performant read operations via the SAI index.

You’ll want to evaluate the pros and cons for each property in a document, and make decisions with the createCollection command’s indexing clause (if specified), based on the read/write and data capacity requirements of your apps.

Test your app’s performance against the database under both average and peak loads. If you need to adjust indexing options, try different settings in a newly defined collection and run the tests again.

Indexing allow example

cURL example:

curl -s --location \
--request POST ${ASTRA_DB_API_ENDPOINT}/api/json/v1/${ASTRA_DB_KEYSPACE} \
--header "Token: ${ASTRA_DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
    "createCollection": {
        "name": "vector_collection",
        "options": {
            "vector": {
                "dimension": 5,
                "metric": "cosine"
            },
            "indexing": {
                "allow": [
                    "property1",
                    "property2"
                ]
            }
        }
    }
}' | json_pp

In the preceding allow example, only the values of property1 and property2 are included in the SAI index. No other properties are indexed.

The net result for subsequent update operations:

Property name	Indexed?
property1	Yes
property2	Yes
property3	No
property3.prop3a	No
property3.prop3b	No
property4	No
property5	No
property5.prop5a	No
property5.prop5b	No
property5.prop5c	No

As a result, subsequent Data API queries may perform filtering and/or sort operations based only on property1, property2, or both.

Indexing deny example

Now let’s take an inverse approach with an indexing … deny array example in cURL:

curl -s --location \
--request POST ${ASTRA_DB_API_ENDPOINT}/api/json/v1/${ASTRA_DB_KEYSPACE} \
--header "Token: ${ASTRA_DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
    "createCollection": {
        "name": "vector_collection",
        "options": {
            "vector": {
                "dimension": 5,
                "metric": "cosine"
            },
            "indexing": {
                "deny": [
                    "property1",
                    "property3",
                    "property5.prop5b"
                ]
            }
        }
    }
}' | json_pp

In the preceding example, all the properties in the document are indexed except the ones listed in the deny clause.

Notice how the parent property3 was specified, which means its sub-properties property3.prop3a and property3.prop3b are also not indexed.

However, also notice that the specific sub-property property5.prop5b is listed in the deny clause. This means property5.prop5b is not indexed, but the parent property5 and the sub-properties property5.prop5a and property5.prop5c are included in the SAI index.

The net result for subsequent update operations:

Property name	Indexed?
property1	No
property2	Yes
property3	No
property3.prop3a	No
property3.prop3b	No
property4	Yes
property5	Yes
property5.prop5a	Yes
property5.prop5b	No
property5.prop5c	Yes

Indexing wildcard examples

The createCollection command’s optional indexing clause provides a convenience wildcard ["*"] in its syntax. For example, in cURL, the following clause means that all properties will be indexed:

{
  "indexing": {
    "allow": ["*"]
  }
}

The preceding example is equivalent to omitting the indexing clause: all properties in the document are indexed during update operations.

You can use the wildcard character with the deny clause:

{
  "indexing": {
    "deny": ["*"]
  }
}

In the preceding example, no properties are indexed, not even $vector.
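
The allow, deny, and wildcard rules in this section can be summarized in a few lines of Python. This is a sketch of the documented behavior, not the server's implementation: `is_indexed` is a hypothetical helper, and it assumes that listing a parent property also covers its sub-properties, as in the deny example.

```python
from typing import Dict, List, Optional


def _covers(listed: List[str], path: str) -> bool:
    """True if the path, or one of its parent properties, is listed (or "*" is)."""
    if "*" in listed:
        return True
    parts = path.split(".")
    # For "a.b.c" this yields {"a", "a.b", "a.b.c"}.
    prefixes = {".".join(parts[:i]) for i in range(1, len(parts) + 1)}
    return any(entry in prefixes for entry in listed)


def is_indexed(path: str, indexing: Optional[Dict[str, List[str]]] = None) -> bool:
    """Decide whether a property path is indexed under an indexing clause."""
    if not indexing:
        return True  # no indexing clause: every property is indexed
    if "allow" in indexing:
        return _covers(indexing["allow"], path)
    return not _covers(indexing["deny"], path)
```

For example, with `{"deny": ["property1", "property3", "property5.prop5b"]}`, the helper reports `property5` and `property5.prop5a` as indexed, and `property3.prop3a` and `property5.prop5b` as not indexed.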

List all collections

Retrieve an iterable object over collections. Unless otherwise specified, this implementation refers to the collections in the working namespace of the database.

Python
TypeScript
Java
cURL
CLI

View this topic in more detail on the API Reference.

collection_iterable = database.list_collections()

Parameters:

Name	Type	Summary
namespace	`Optional[str]`	The namespace to be inspected. If not specified, the database’s working namespace is used.
max_time_ms	`Optional[int]`	A timeout, in milliseconds, for the underlying HTTP request.

Returns:

CommandCursor[CollectionDescriptor] - An iterable over CollectionDescriptor objects.

Example response

# (output below reformatted with indentation for clarity)
# (a single example collection descriptor from the cursor is shown)
[
    ...,
    CollectionDescriptor(
        name='my_collection',
        options=CollectionOptions(
            vector=CollectionVectorOptions(
                dimension=3,
                metric='dot_product'
            ),
            indexing={'allow': ['field']}
        )
    ),
    ...
]

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database_by_api_endpoint("01234567-...")

coll_cursor = database.list_collections()
coll_cursor  # this looks like: Cursor("a", new, retrieved: 0)
list(coll_cursor)  # [{'name': 'my_v_col'}, ...]
for coll_desc in database.list_collections():
    print(coll_desc)
# will print:
#   CollectionDescriptor(name='my_v_col', options=CollectionOptions(vector=CollectionVectorOptions(dimension=3, metric='dot_product')))
#   ...

View this topic in more detail on the API Reference.

const collections = await db.listCollections();

Parameters:

Name	Type	Summary
options	`ListCollectionsOptions`	Options regarding listing collections.

Options (ListCollectionsOptions):

Name	Type	Summary
nameOnly?	`false`	If true, only the name of the collection is returned. Else, the full information for each collection is returned. Defaults to true.
namespace?	`string`	The namespace to be inspected. If not specified, the database’s working namespace is used.
maxTimeMs?	`number`	Maximum time in milliseconds the client should wait for the operation to complete.

Returns:

Promise<FullCollectionInfo[]> - A promise that resolves to an array of full collection information objects.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Get a new Db instance
const db = new DataAPIClient('TOKEN').db('API_ENDPOINT');

(async function () {
  // Gets full info about all collections in db
  const collections = await db.listCollections();

  for (const collection of collections) {
    console.log(`Collection '${collection.name}' has default ID type '${collection.options.defaultId?.type}'`);
  }
})();

For the Javadoc on these methods, see the Database Javadoc.

// Given db Database object, list all collections
Stream<CollectionInfo> collections = db.listCollections();

Returned Value:

Type	Description
`Stream<CollectionInfo>`	The definition elements of collections.

Example:

package com.datastax.astra.client.database;

import com.datastax.astra.client.Database;
import com.datastax.astra.client.model.CollectionInfo;

import java.util.stream.Stream;

public class ListCollections {
    public static void main(String[] args) {
        Database db = new Database("TOKEN", "API_ENDPOINT");

        // Get collection Names
        Stream<String> collectionNames = db.listCollectionNames();

        // Get Collection information (with options)
        Stream<CollectionInfo> collections = db.listCollections();
        collections.map(CollectionInfo::getOptions).forEach(System.out::println);
    }
}

curl -s --location \
--request POST ${ASTRA_DB_API_ENDPOINT}/api/json/v1/${ASTRA_DB_KEYSPACE} \
--header "Token: ${ASTRA_DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
  "findCollections": {
    "options" : {
      "explain" : true
    }
  }
}' | json_pp

Parameters:

Name	Type	Summary
findCollections	command	The Data API command to find all collections in the database. It acts as a container for all the attributes and settings required to find collections.
options	string	Under this key, an additional setting for `findCollections` may be specified.
explain	boolean	When set to `true`, the command returns not just the names of collections but also a brief description of the metadata associated with each collection, such as whether the collection was created with the vector option. For each vector-enabled collection, the response also includes its dimension and metric values, and any indexing option.

Response

{
    "status": {
        "collections": [
            {
                "name": "vector_collection",
                "options": {
                    "defaultId": {
                        "type": "objectId"
                    },
                    "vector": {
                        "dimension": 5,
                        "metric": "cosine"
                    },
                    "indexing": {
                        "allow": [
                            "*"
                        ]
                    }
                }
            }
        ]
    }
}
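
Once you have this response, pulling out the details you need is ordinary JSON handling. The sketch below summarizes each collection from a response shaped like the one above, as returned when `explain` is `true`. `summarize_collections` is a hypothetical helper, and the embedded sample is a compact copy of that response.

```python
import json
from typing import Any, Dict, List

SAMPLE_RESPONSE = """
{
  "status": {
    "collections": [
      {
        "name": "vector_collection",
        "options": {
          "defaultId": {"type": "objectId"},
          "vector": {"dimension": 5, "metric": "cosine"},
          "indexing": {"allow": ["*"]}
        }
      }
    ]
  }
}
"""


def summarize_collections(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten each collection entry into name, vector, and ID-type fields."""
    summaries = []
    for entry in response.get("status", {}).get("collections", []):
        options = entry.get("options") or {}
        vector = options.get("vector") or {}
        default_id = options.get("defaultId") or {}
        summaries.append(
            {
                "name": entry["name"],
                "dimension": vector.get("dimension"),  # None for non-vector collections
                "metric": vector.get("metric"),
                "default_id_type": default_id.get("type"),
            }
        )
    return summaries


if __name__ == "__main__":
    for row in summarize_collections(json.loads(SAMPLE_RESPONSE)):
        print(row)
```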

To list all collections in a database, use the following command:

astra db list-collections <db_name>

Parameters:

Name	Type	Summary
db_name	`String`	The name of the database.

Example output:

+---------------------+-----------+-------------+
| Name                | Dimension | Metric      |
+---------------------+-----------+-------------+
| collection_simple   |           |             |
| collection_vector   | 14        | cosine      |
| msp                 | 1536      | dot_product |
+---------------------+-----------+-------------+

List collection names

Get the names of the collections as a list of strings. Unless otherwise specified, this refers to the collections in the namespace the database is set to use.

Python
TypeScript
Java
cURL
CLI

View this topic in more detail on the API Reference.

database.list_collection_names()

Get the names of the collections in a specified namespace of the database.

database.list_collection_names(namespace="that_other_namespace")

Parameters:

Name	Type	Summary
namespace	`Optional[str]`	The namespace to be inspected. If not specified, the database’s working namespace is used.
max_time_ms	`Optional[int]`	A timeout, in milliseconds, for the underlying HTTP request.

Returns:

List[str] - A list of the collection names, in no particular order.

Example response

['a_collection', 'another_col']

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database_by_api_endpoint("01234567-...")

database.list_collection_names()
# ['a_collection', 'another_col']

View this topic in more detail on the API Reference.

const collectionNames = await db.listCollections({ nameOnly: true });

Get the names of the collections in a specified namespace of the database.

const collectionNames = await db.listCollections({ nameOnly: true, namespace: 'NAMESPACE' });

Parameters:

Name	Type	Summary
options	`ListCollectionsOptions`	Options regarding listing collections.

Options (ListCollectionsOptions):

Name	Type	Summary
nameOnly	`true`	If true, only the name of the collection is returned. Else, the full information for each collection is returned. Defaults to true.
namespace?	`string`	The namespace to be inspected. If not specified, the database’s working namespace is used.
maxTimeMs?	`number`	Maximum time in milliseconds the client should wait for the operation to complete.

Returns:

Promise<string[]> - A promise that resolves to an array of the collection names.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Get a new Db instance
const db = new DataAPIClient('TOKEN').db('API_ENDPOINT');

(async function () {
  // Gets just names of all collections in db
  const collections = await db.listCollections({ nameOnly: true });

  for (const collectionName of collections) {
    console.log(`Collection '${collectionName}' exists`);
  }
})();

To access the Javadoc on those methods consult the Database Javadoc.

// Given db Database object, list the names of all collections
Stream<String> collectionNames = db.listCollectionNames();

Returned Value:

Type	Description
`Stream<String>`	The names of the collections.

Example:

package com.datastax.astra.client.database;

import com.datastax.astra.client.Database;
import com.datastax.astra.client.model.CollectionInfo;

import java.util.stream.Stream;

public class ListCollections {
    public static void main(String[] args) {
        Database db = new Database("API_ENDPOINT", "TOKEN");

        // Get collection Names
        Stream<String> collectionNames = db.listCollectionNames();

        // Get Collection information (with options)
        Stream<CollectionInfo> collections = db.listCollections();
        collections.map(CollectionInfo::getOptions).forEach(System.out::println);
    }
}

curl -s --location \
--request POST ${ASTRA_DB_API_ENDPOINT}/api/json/v1/${ASTRA_DB_KEYSPACE} \
--header "Token: ${ASTRA_DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
  "findCollections": {
    "options" : {
      "explain" : true
    }
  }
}' | json_pp

Parameters:

Name	Type	Summary
findCollections	command	The Data API command to find all collections in the database. It acts as a container for all the attributes and settings required to find collections.
options	object	Under this key, an additional setting for `findCollections` may be specified.
explain	boolean	When set to `true`, the command returns not only the names of the collections but also a brief description of the metadata associated with each collection, such as whether the collection was created with the vector option. For each vector-enabled collection, the response also includes its dimension and metric values and any indexing option.

Response

{
    "status": {
        "collections": [
            {
                "name": "vector_collection",
                "options": {
                    "defaultId": {
                        "type": "objectId"
                    },
                    "vector": {
                        "dimension": 5,
                        "metric": "cosine"
                    },
                    "indexing": {
                        "allow": [
                            "*"
                        ]
                    }
                }
            }
        ]
    }
}

To list only the names of the collections in a database, keep just the Name column of the command output:

astra db list-collections <db_name> | cut -b 1-23

Parameters:

Name	Type	Summary
db_name	`String`	The name of the database

Example output:

+---------------------+
| Name                |
+---------------------+
| collection_simple   |
| collection_vector   |
| msp                 |
+---------------------+

Get a collection

Get a reference to an existing collection.

Python
TypeScript
Java

View this topic in more detail on the API Reference.

collection = database.get_collection("vector_collection")

The example above is equivalent to these two alternate notations:

collection1 = database["vector_collection"]
collection2 = database.vector_collection

The get_collection method returns a Collection object even for collections that don’t exist. The method does not check whether the collection exists, so it is your responsibility to verify that it does before you use the returned object.
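One way to add that check yourself is sketched below. The `get_existing_collection` helper is a hypothetical name, not part of the client library; it relies only on the `list_collection_names` and `get_collection` methods described in this topic, and it assumes one extra request per lookup is acceptable.

```python
def get_existing_collection(database, name):
    """Hypothetical helper: return a collection only if the database lists it.

    `database` is assumed to be a Database object exposing the
    list_collection_names() and get_collection() methods described above.
    """
    if name not in database.list_collection_names():
        raise ValueError(f"Collection '{name}' does not exist in this database")
    return database.get_collection(name)


# Usage, given an existing `database` object:
# collection = get_existing_collection(database, "vector_collection")
```

The check and the lookup are separate calls, so a collection dropped in between would still yield a reference to a collection that no longer exists.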

Parameters:

Name	Type	Summary
name	`str`	The name of the collection.
namespace	`Optional[str]`	The namespace containing the collection. If no namespace is specified, the general setting for this database is used.

Returns:

Collection - An instance of the Collection class corresponding to the specified collection name.

Example response

Collection(name="vector_collection", namespace="default_keyspace", database=Database(api_endpoint="https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com", token="AstraCS:aAbB...", namespace="default_keyspace"))

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database_by_api_endpoint("01234567-...")

collection = database.get_collection("my_collection")
collection.count_documents({}, upper_bound=100)  # will print e.g.: 41

View this topic in more detail on the API Reference.

const collection = db.collection('COLLECTION');

The collection method returns a Collection object even for collections that don’t exist. The method does not check whether the collection exists, so it is your responsibility to verify that it does before you use the returned object.

Parameters:

Name	Type	Summary
collectionName	`string`	The name of the collection.
options?	`WithNamespace`	Allows you to override which namespace to use for the collection.

Returns:

Collection<Schema> - An unverified reference to the collection.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Get a new Db instance
const db = new DataAPIClient('TOKEN').db('API_ENDPOINT');

// Define the schema for the collection
interface User {
  name: string,
  age?: number,
}

(async function () {
  // Basic untyped collection
  const users1 = db.collection('users');
  await users1.insertOne({ name: 'John' });

  // Typed collection from different namespace
  const users2 = db.collection<User>('users', {
    namespace: 'NAMESPACE',
  });
  await users2.insertOne({ name: 'John' });
})();

Type	Description
`CollectionOptions`	The Collection with all metadata (defaultId, vector, indexing) for the collection.

Drop a collection

Drop (delete) a collection from a database, erasing all data stored in it as well.

Python
TypeScript
Java
cURL

View this topic in more detail on the API Reference.

result = database.drop_collection(name_or_collection="vector_collection")

Calling this method is equivalent to invoking the collection’s own method collection.drop(). In that case, trying to use the object afterwards would result in an API error, as it will have become a reference to a non-existent collection.

Parameters:

Name	Type	Summary
name_or_collection	`Union[str, Collection]`	Either the name of a collection or a `Collection` instance.
max_time_ms	`Optional[int]`	A timeout, in milliseconds, for the underlying HTTP request.

Returns:

Dict - A dictionary in the form {"ok": 1} if the method succeeds.

Example response

{'ok': 1}

Example:

from astrapy import DataAPIClient
client = DataAPIClient("TOKEN")
database = client.get_database_by_api_endpoint("01234567-...")

database.list_collection_names()
# prints: ['a_collection', 'my_v_col', 'another_col']
database.drop_collection("my_v_col")  # {'ok': 1}
database.list_collection_names()
# prints: ['a_collection', 'another_col']
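If a script may run more than once, it can be convenient to drop a collection only when it is actually listed. The following is a minimal sketch under that assumption; `drop_collection_if_exists` is a hypothetical helper built on the `list_collection_names` and `drop_collection` methods shown above, and it relies on the documented `{"ok": 1}` return value.

```python
def drop_collection_if_exists(database, name):
    """Hypothetical helper: drop `name` only if the database lists it.

    Returns True if a drop was issued and reported {"ok": 1},
    and False if the collection was not listed or the drop did not report ok.
    """
    if name not in database.list_collection_names():
        return False
    result = database.drop_collection(name)
    return result.get("ok") == 1


# Usage, given an existing `database` object:
# drop_collection_if_exists(database, "my_v_col")
```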

View this topic in more detail on the API Reference.

const ok = await db.dropCollection('COLLECTION');

Calling this method is equivalent to invoking the collection’s own method collection.drop(). In that case, trying to use the object afterward would result in an API error, as it will have become a reference to a non-existent collection.

Parameters:

Name	Type	Summary
name	`string`	The name of the collection to delete.
options?	`DropCollectionOptions`	Allows you to override the namespace & set a `maxTimeMs`.

Options (DropCollectionOptions):

Name	Type	Summary
namespace?	`string`	The namespace containing the collection. If not specified, the database’s working namespace is used.
maxTimeMS?	`number`	Maximum time in milliseconds the client should wait for the operation to complete.

Returns:

Promise<boolean> - A promise that resolves to true if the collection was dropped successfully.

Example:

import { DataAPIClient } from '@datastax/astra-db-ts';

// Get a new Db instance
const db = new DataAPIClient('TOKEN').db('API_ENDPOINT');

(async function () {
  // Uses db's default namespace
  const success1 = await db.dropCollection('users');
  console.log(success1); // true

  // Overrides db's default namespace
  const success2 = await db.dropCollection('users', {
    namespace: 'NAMESPACE'
  });
  console.log(success2); // true
})();

To access the Javadoc on those methods consult the Database Javadoc.

// Given db Database object, drop a collection by name
db.dropCollection("collectionName");

Parameters:

Name	Type	Summary
`collectionName`	`String`	The name of the collection to delete.

Example:

package com.datastax.astra.client.database;

import com.datastax.astra.client.Database;

public class DropCollection {
  public static void main(String[] args) {
    Database db = new Database("API_ENDPOINT", "TOKEN");

    // Delete an existing collection
    db.dropCollection("collection_vector2");
  }
}

curl -s --location \
--request POST ${ASTRA_DB_API_ENDPOINT}/api/json/v1/${ASTRA_DB_KEYSPACE} \
--header "Token: ${ASTRA_DB_APPLICATION_TOKEN}" \
--header "Content-Type: application/json" \
--header "Accept: application/json" \
--data '{
  "deleteCollection": {
    "name": "vector_collection"
  }
}' | json_pp

Response

{
    "status": {
        "ok": 1
    }
}

Parameter:

Name	Type	Summary
`name`	`String`	The name of the collection to delete.
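
For readers who prefer to issue the same request from a script, the sketch below builds the equivalent HTTP request with the Python standard library and checks a response body for `"ok": 1`. It only mirrors the URL, headers, and payload of the cURL example above. The helper names and the `https://example.invalid` endpoint are placeholders, and the request is constructed but not sent.

```python
import json
import urllib.request


def build_delete_collection_request(api_endpoint, keyspace, token, collection_name):
    """Hypothetical helper: mirror the cURL deleteCollection call above."""
    url = f"{api_endpoint}/api/json/v1/{keyspace}"
    body = json.dumps({"deleteCollection": {"name": collection_name}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Token": token,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def drop_succeeded(response_body):
    """Return True if the response body has the {"status": {"ok": 1}} shape."""
    return json.loads(response_body).get("status", {}).get("ok") == 1


# Placeholder values; substitute your own endpoint, keyspace, and token.
request = build_delete_collection_request(
    "https://example.invalid", "default_keyspace", "TOKEN", "vector_collection"
)
print(request.full_url)
print(drop_succeeded('{"status": {"ok": 1}}'))
```

To actually send the request, you could pass it to `urllib.request.urlopen` and hand the decoded response body to `drop_succeeded`.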

Next steps

See the Documents reference topic.
