DataStax Astra Developer Hub

Welcome to the DataStax Astra Developer Hub. You'll find comprehensive guides and documentation to help you start working with DataStax Astra as quickly as possible. Use the included APIs to create, modify, and terminate databases, and interact with the databases you create. Let's do it!


Loading and unloading data with DataStax Bulk Loader

Use DataStax Bulk Loader (dsbulk) to load and unload data in CSV or JSON format with your DataStax Astra database efficiently and reliably.

You can use dsbulk as a standalone tool to remotely connect to a cluster. The tool does not need to run locally on an instance (commonly referred to as a "node" in Cassandra terminology), but it can be used in this configuration.

Tip

The dsbulk command examples often show a parameter such as -url filename.csv or -url filename.json. Optionally, you can load data from, or unload data to, compressed CSV or JSON files. For details, refer to the --connector.(csv|json).compression option.

Prerequisites

  1. Download dsbulk.
  2. Unpack the distribution to your machine:
    tar -xzvf dsbulk-1.6.0.tar.gz
  3. Connect dsbulk to your Astra database by including the path to the secure connect bundle, and the username and password entered when creating the database.
    Use the -b option to specify the location of the secure connect bundle. The specified location must be a path on the local filesystem or a valid URL.
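
As a rough sketch, the connection values from step 3 can be kept in one place and the bundle location checked before any command runs. The variable names and the check_bundle helper are hypothetical and not part of dsbulk; the check assumes the bundle is either an existing local file or an http(s) URL.

```shell
#!/bin/sh
# Placeholder values; replace with your own bundle path and credentials.
BUNDLE="path/to/secure-connect-database_name.zip"
DB_USER="database_user"
DB_PASSWORD="database_password"

# Hypothetical helper (not part of dsbulk): succeed if the argument looks
# like an http(s) URL or names an existing local file, fail otherwise.
check_bundle() {
  case "$1" in
    http://*|https://*) return 0 ;;
    *) [ -f "$1" ] ;;
  esac
}
```

For example, running check_bundle "$BUNDLE" || echo "secure connect bundle not found" >&2 before a dsbulk command catches a mistyped path early.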

Note

If a secure connect bundle is specified, any of the following options are ignored and a warning is logged:

Loading data

Load CSV or JSON data with a dsbulk load command.

Load data from a local file

Load data from a local file export.csv with headers into keyspace ks1 and table table1:

dsbulk load -url export.csv -k ks1 -t table1 -b "path/to/secure-connect-database_name.zip" -u database_user -p database_password -header true
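
Before loading, it can help to confirm that the file really starts with a header row whose names match the table's columns. A minimal sketch follows; the csv_header_fields helper is hypothetical and assumes a comma delimiter with no quoted commas in the header.

```shell
#!/bin/sh
# Hypothetical helper (not part of dsbulk): print each field name from the
# first line of a CSV file, one per line. Assumes a comma delimiter and no
# quoted commas in the header row.
csv_header_fields() {
  head -n 1 "$1" | tr -d '\r' | tr ',' '\n'
}
```

Calling csv_header_fields export.csv lists the names that dsbulk would try to map to columns of table1.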

Specify an external data source

Load data from a CSV file at a remote URL into keyspace ks1 and table table1:

dsbulk load -url https://svr/data/export.csv -k ks1 -t table1 -b "path/to/secure-connect-database_name.zip" -u database_user -p database_password

Specify a file with URLs

Specify a file that contains a list of well-formed URLs for the data files to load. This example selects the JSON connector with -c json; for CSV files, use --connector.csv.urlfile instead:

dsbulk load -c json --connector.json.urlfile "my/local/multiple-input-data-urls.txt" -k ks1 -t table1 -b "path/to/secure-connect-database_name.zip" -u database_user -p database_password
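
A small sketch of preparing such a file, assuming one URL per line. The write_url_list helper and the URLs in the usage note are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of dsbulk): write every argument after the
# first to the file named by the first argument, one per line, creating the
# parent directory if needed.
write_url_list() {
  out="$1"
  shift
  mkdir -p "$(dirname "$out")"
  printf '%s\n' "$@" > "$out"
}
```

For example, write_url_list my/local/multiple-input-data-urls.txt https://svr/data/part1.json https://svr/data/part2.json produces a two-line file that can be passed to the urlfile option.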

Load CSV data from stdin

Load CSV data from stdin as it is generated by a script, generate_data. The data is loaded into keyspace ks1 and table table1. If not specified, the field names are read from a header row in the input.

generate_data | dsbulk load -url stdin:/ -k ks1 -t table1 -b "path/to/secure-connect-database_name.zip" -u database_user -p database_password
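
For illustration, generate_data can be any program that writes CSV to stdout. The sketch below is a hypothetical stand-in; the column names id and name are assumptions and would need to match the columns of table1.

```shell
#!/bin/sh
# Hypothetical stand-in for generate_data: emit a header row followed by
# N data rows (default 3) as CSV on stdout.
generate_data() {
  n="${1:-3}"
  echo "id,name"
  i=1
  while [ "$i" -le "$n" ]; do
    echo "$i,name_$i"
    i=$((i + 1))
  done
}
```

Because the first line is a header row, dsbulk can take the field names from it when no explicit mapping is given.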

Unloading data

Use the dsbulk unload command to unload data from the specified keyspace and table to a CSV or JSON file.

Unload data example

Unload data from keyspace ks1 and table table1 to a CSV file:

dsbulk unload -url myData.csv -k ks1 -t table1 -b "path/to/secure-connect-database_name.zip" -u database_user -p database_password

The -url value can designate a path on the local filesystem or a valid URL.
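
After an unload, a quick row count is a cheap sanity check. The sketch below assumes the output is either a single CSV file or a directory of .csv files, that each file starts with one header row, and that no field contains an embedded newline. The count_data_rows helper is hypothetical and not part of dsbulk.

```shell
#!/bin/sh
# Hypothetical helper (not part of dsbulk): count data rows, excluding one
# header row per file, in a CSV file or in every .csv file of a directory.
count_data_rows() {
  total=0
  if [ -d "$1" ]; then
    # Replace the argument list with the .csv files in the directory.
    set -- "$1"/*.csv
  fi
  for f in "$@"; do
    [ -f "$f" ] || continue
    lines=$(wc -l < "$f" | tr -d ' ')
    if [ "$lines" -gt 0 ]; then
      total=$((total + lines - 1))
    fi
  done
  echo "$total"
}
```

Comparing the result of count_data_rows myData.csv with the row count that dsbulk reports at the end of the operation can reveal truncated or partial output.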
