Export metrics via DevOps API

Enterprises depend on the ability to view database health metrics in centralized systems along with their other software metrics. The Astra DB Metrics feature lets you forward Astra DB database health metrics to an external third-party metrics system. We refer to the recipient of the exported metrics as the destination system.

Introduction

The functionality provided by the Astra DB Metrics feature is often referred to as:

  • Observability

  • External monitoring

  • Third-party metrics

  • Prometheus monitoring integration

At this time, Astra DB Metrics supports exporting health metrics from Astra DB serverless databases to the following destination systems:

  • Prometheus

  • Apache Kafka

  • Confluent Kafka

You can also use Grafana or Grafana Cloud to visualize the exported metrics.

Metrics API and UI options

You can configure the export of Astra DB metrics via the DevOps API (described in this topic), or via Astra DB console.

Feature availability

The Astra DB Metrics feature:

  • Is available only for Astra DB serverless databases.

  • Is available only on a paid pricing plan, such as Pay As You Go (PAYG) or Enterprise.

  • Is not available on the Astra DB Free plan.

  • If you're on the Free plan, click the Chat icon and ask a DataStax representative about options to upgrade your organization.

Benefits

The Astra DB Metrics feature allows you to take full control of forwarding Astra DB database health metrics to your preferred observability system. The functionality is intended for developers, site reliability engineers (SREs), IT managers, and product owners.

Ingesting database health metrics into your system gives you the ability to craft your own alerting actions and dashboards based on your service level objectives and retention requirements. While you can continue to view metrics displayed in Astra DB console via each database’s Health tab, forwarding metrics to a third-party app gives you a more complete view of all metrics being tracked, across all your products.

This enhanced capability can provide your team with broader insights into historical performance, issues, and areas for improvement.

The exported Astra DB health metrics are nearly real-time when consumed externally. You can find the source-of-truth view of your metric values in the Astra DB console’s Health dashboard.

Prerequisites

  1. If you haven’t already, create a serverless database using the Astra DB console.

    Keep track of your databaseId. You’ll specify it in the DevOps POST API call for /v2/databases/{databaseId}/telemetry/metrics. You can find the databaseId on the Astra DB console’s dashboard.

    Example:

    Astra DB console dashboard shows database ID.

  2. Generate an application token so you can authenticate your account in the DevOps API.

    If you don’t have a current token, see Manage application tokens.

    Example:

    Select Create New Token from database entry’s 3 dots on Astra DB dashboard.

    When using the DevOps API, pass the token's value in the call's Authorization header.

  3. Ensure you have permission to use the DevOps v2 API for enabling third-party metrics. See Roles and permissions in this topic.

You’ll need an existing destination system to receive the forwarded Astra DB metrics. Currently, Prometheus, Apache Kafka, and Confluent Kafka are supported as destinations; Grafana / Grafana Cloud can be used to visualize the exported metrics.

Pricing

With an Astra DB PAYG or Enterprise plan, there is no additional cost to use Astra DB Metrics beyond standard data transfer charges. Exporting third-party metrics is not available on the Astra DB Free plan.

Metrics monitoring may incur costs at the destination system. Consult the destination system’s documentation for its pricing information.

Roles and permissions

The following Astra DB roles can export third-party metrics:

  • Organization Administrator (recommended)

  • Database Administrator

  • Service Account Administrator

  • User Administrator

The required db-manage-thirdpartymetrics permission is automatically assigned to those roles.

If you create a custom role in Astra DB, be sure to assign db-manage-thirdpartymetrics permission to the custom role.

Database metrics forwarded by Astra DB

Here’s a list of database metrics forwarded by the Astra DB Metrics feature.

  • rate_limited_requests_total - A counter: the number of operations that failed because of an Astra DB rate limit. You can request a rate limit increase for your Astra DB databases. Take a rate, such as 5 minutes (5m), and alert if the value is > 0 (see the example alerting rules after this list).

  • read_requests_failures_total - A counter: the number of reads that failed. Cassandra drivers retry failed operations, but significant failures can be problematic. Take a rate, such as 5m, and alert if the value is > 0. Use a warning alert for low values and a high-severity alert for larger values, potentially determined as a percentage of read throughput.

  • read_requests_timeouts_total - Timeouts happen when operations against the database take longer than the server-side timeout. Take a rate, such as 5m, and alert if the value is > 0.

  • read_requests_unavailables_total - Occurs when the service is not available to complete a specific request. Take a rate, such as 5m, and alert if the value is > 0.

  • write_requests_failures_total - A counter: the number of writes that failed. Cassandra drivers retry failed operations, but significant failures can be problematic. Take a rate, such as 5m, and alert if the value is > 0. Use a warning alert for low values and a high-severity alert for larger values, potentially determined as a percentage of write throughput.

  • write_requests_timeouts_total - Timeouts occur when operations take longer than the server-side timeout. Take a rate, such as 5m, and compare with write_requests_failures_total.

  • write_requests_unavailables_total - Unavailable errors occur when the service is not available to service a particular request. Take a rate, such as 5m, and compare with write_requests_failures_total.

  • range_requests_failures_total - A counter: the number of range reads that failed. Cassandra drivers retry failed operations, but significant failures can be problematic. Take a rate, such as 5m, and alert if the value is > 0. Use a warning alert for low values and a high-severity alert for larger values, potentially determined as a percentage of range read throughput.

  • range_requests_timeouts_total - Timeouts are a subset of total failures. Use this metric to understand if failures are due to timeouts. Take a rate, such as 5m, and compare with range_requests_failures_total.

  • range_requests_unavailables_total - Unavailable errors are a subset of total failures. Use this metric to understand whether failures are due to unavailable errors. Take a rate, such as 5m, and compare with range_requests_failures_total.

  • write_latency_seconds_count - Take the rate for write throughput. Alert based on your application Service Level Objective (business requirement).

  • write_latency_seconds_bucket - Take percentiles for write latency. Alert based on your application Service Level Objective (business requirement).

  • write_requests_mutation_size_bytes_bucket - Take percentiles to see how big your writes are over time.

  • read_latency_seconds_count - Take the rate for read throughput. Alert based on your application Service Level Objective (business requirement).

  • read_latency_seconds_bucket - Take percentiles for read latency. Alert based on your application Service Level Objective (business requirement).

  • range_latency_seconds_count - Take the rate for range read throughput. Alert based on your application Service Level Objective (business requirement).

  • range_latency_seconds_bucket - Take percentiles for range read latency. Alert based on your application Service Level Objective (business requirement).
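
For example, here is a minimal Prometheus alerting-rule sketch for two of the counters above. The group and alert names are illustrative assumptions, the thresholds should be tuned to your own service level objectives, and the metric names may carry a prefix (such as astra_coordinator_) in your destination system:

# astra-alerts.yaml - a hedged sketch, not a definitive rule set.
groups:
  - name: astra-db-health
    rules:
      - alert: AstraRateLimitedRequests
        # Fire if any operations were rate limited over the last 5 minutes.
        expr: rate(rate_limited_requests_total[5m]) > 0
        for: 5m
        labels:
          severity: warning
      - alert: AstraReadRequestFailures
        # Fire if any read requests failed over the last 5 minutes.
        expr: rate(read_requests_failures_total[5m]) > 0
        for: 5m
        labels:
          severity: warning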

Prometheus setup at the destination

For information about setting up Prometheus itself as the destination of the forwarded Astra DB database metrics, see the Prometheus Getting Started documentation.

  • Minimum version: Prometheus v2.25

  • Recommended version: Prometheus v2.33+

For Prometheus, the remote-write-receiver feature must be enabled in the destination app. For the steps, see the Prometheus remote write documentation.
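
For example, assuming you run the Prometheus server binary directly, the receiver is enabled with a startup flag (a sketch; adjust the config path to your environment):

# Prometheus v2.33+: expose the remote write receiver endpoint.
prometheus --config.file=prometheus.yml --web.enable-remote-write-receiver

# Prometheus v2.25 through v2.32 use a feature flag instead:
# prometheus --config.file=prometheus.yml --enable-feature=remote-write-receiver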

After completing those steps in your Prometheus environment, verify it by sending a POST request to the remote write endpoint. For an example test client, which also verifies that ingress is set up properly, see promremote, a Prometheus remote write client written in Go.

For more information, see the Prometheus documentation on metric types.

Kafka setup at the destination

For information about setting up Kafka as a destination for the forwarded Astra DB database metrics, see the Apache Kafka or Confluent documentation.

Configure the POST payload

Use the following POST to export metrics to an external system:

POST /v2/databases/{databaseId}/telemetry/metrics

The configuration payload (JSON) depends on which destination you’ll use. Currently, Prometheus remote_write and Kafka destinations are supported.

To ensure that metrics are enabled for your destination app, provide the relevant properties.

Each POST replaces any existing configuration.

See the following sections for curl examples. If you prefer, use Postman with raw JSON in the body.

In the --header, use Bearer followed by your application token’s value to authenticate with the DevOps v2 API.

If you don’t have a current token, see Manage application tokens.

Specify your database ID.

See the Astra DB console Dashboard for its value. You can define a variable such as $DB_ID, set it to your databaseId value, and then use the variable in a curl command. Note that the shell only expands the variable inside double quotes, so quote the URL accordingly.
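
For example, in a bash session (the ID shown is a placeholder):

# Set once, then reuse in the curl commands that follow.
export DB_ID="00000000-0000-0000-0000-000000000000"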

In the request payload, specify the destination’s --data properties.

Astra DB Metrics configuration for Prometheus

With a required top-level key of prometheus_remote, the POST payload’s properties are:

  • prometheus_remote

    • endpoint

    • auth_strategy

    • token

    • user

    • password

For auth_strategy, specify basic or bearer, depending on your Prometheus remote_write auth type.

  • If you specified "auth_strategy": "bearer", provide your Prometheus token. Do not include user or password in the POST request payload.

  • If you specified "auth_strategy": "basic", provide your Prometheus user and password. Do not include token.

Example payloads:

{
    "prometheus_remote":  {
        "endpoint": "https://prometheus.example.com/api/prom/push",
        "auth_strategy" : "bearer",
        "token" : "lSAYp9oLtdAa9ajasoNNS999"
    }
}

Or:

{
    "prometheus_remote":  {
        "endpoint": "https://prometheus.example.com/api/prom/push",
        "auth_strategy" : "basic",
        "password" : "myPromPassword",
        "user" : "myPromUsername"
    }
}

For Prometheus, remote-write-receiver must be enabled in the destination system. See Prometheus setup at the destination earlier in this topic.

POST metrics configuration examples (Prometheus)

In the curl examples, notice we added the --include option to specify that the POST output should include the HTTP response headers. These details can help with diagnostics if the POST returns an error or a null response.

curl --request POST \
  --url "https://api.astra.datastax.com/v2/databases/$DB_ID/telemetry/metrics" \
  --header 'Accept: application/json' \
  --header 'Authorization: Bearer <application_token>' \
  --include \
  --data '{
           "prometheus_remote": {
             "endpoint": "Enter a full HTTP or HTTPS address and path for prometheus endpoint",
             "auth_strategy": "bearer or basic",
             "token": "If auth_strategy bearer, enter Prom Remote Write auth token",
             "user": "If auth_strategy basic, enter Prom username",
             "password": "If auth_strategy basic, enter Prom password"
            }
          }'

Examples with populated POST request payloads:

curl --request POST \
  --url "https://api.astra.datastax.com/v2/databases/$DB_ID/telemetry/metrics" \
  --header 'Accept: application/json' \
  --header 'Authorization: Bearer <application_token>' \
  --include \
  --data '{
           "prometheus_remote": {
             "endpoint": "https://prometheus.example.com/api/prom/push",
             "auth_strategy": "bearer",
             "token": "lSAYp9oLtdAa9ajasoNNS999"
            }
          }'

Or:

curl --request POST \
  --url "https://api.astra.datastax.com/v2/databases/$DB_ID/telemetry/metrics" \
  --header 'Accept: application/json' \
  --header 'Authorization: Bearer <application_token>' \
  --include \
  --data '{
           "prometheus_remote": {
             "endpoint": "https://prometheus.example.com/api/prom/push",
             "auth_strategy": "basic",
             "user": "myPromUsername",
             "password": "myPromPassword"
            }
          }'
A successful POST returns:

202 OK

Or one of the following:

400 Bad request.
401 Unauthorized.
403 The user is forbidden to perform the operation.
404 The specified resource was not found.
409 The request could not be processed because of conflict.
5XX A server error occurred.

Example error response:

{
  "errors": [
    {
      "description": "The name of the environment must be provided",
      "internalCode": "a1012",
      "internalTxId": "103B-A018-3898-0ABF"
    }
  ]
}

Get metrics configuration examples (Prometheus)

In the curl examples, notice we added the --include option to specify that the GET output should include the HTTP response headers. These details can help with diagnostics if the GET returns an error or a null response.

Retrieve third-party metrics configuration for an Astra DB database:

curl --request GET \
  --url "https://api.astra.datastax.com/v2/databases/$DB_ID/telemetry/metrics" \
  --header 'Accept: application/json' \
  --header 'Authorization: Bearer <application_token>' \
  --include
A successful GET returns:

200 OK

Example response:

{
    "prometheus_remote": {
        "endpoint": "https://prometheus.example.com/api/prom/push",
        "auth_strategy": "basic",
        "user": "myPromUsername",
        "password": "myPromPassword"
    }
}

Or one of the following:

400 Bad request.
403 The user is forbidden to perform the operation.
404 The specified resource was not found.
500 A server error occurred.

Example error response:

{
  "errors": [
    {
      "description": "The name of the environment must be provided",
      "internalCode": "a1012",
      "internalTxId": "103B-A018-3898-0ABF"
    }
  ]
}

Astra DB Metrics configuration for Kafka

With a required top-level key of kafka, the POST payload’s required properties are:

  • bootstrap_servers

  • topic

  • sasl_mechanism

  • sasl_username

  • sasl_password

Example payload for Kafka:

{
  "kafka": {
    "bootstrap_servers": [
      "pkc-9999e.us-east-1.aws.confluent.cloud:9092"
    ],
    "topic": "astra_metrics_events",
    "sasl_mechanism": "PLAIN",
    "sasl_username": "9AAAAALPRC9AAAAA",
    "sasl_password": "viAAr/geQxxacrAAmydHb7wz6DRu6mL9W9999juQcS1s++pECM99mnW+3Gs06xDd",
    "security_protocol": "SASL_PLAINTEXT"
  }
}

The security_protocol property is an advanced option and is not required. Most Kafka installations do not require this setting for Astra DB Metrics to connect. Users of hosted Kafka on Confluent Cloud, though, may need to set SASL_SSL in the security_protocol property. Valid options are:

  • SASL_PLAINTEXT - SASL authenticated, non-encrypted channel.

  • SASL_SSL - SASL authenticated, encrypted channel. Non-Authenticated options (SSL and PLAINTEXT) are not supported.

Be sure to specify the appropriate, related sasl_mechanism property. For Confluent Cloud, you may only be able to use PLAIN. See the Confluent Cloud security tutorial. From the Confluent docs: "Confluent Cloud uses SASL/PLAIN (or PLAIN) over TLS v1.2 encryption for authentication because it offers broad client support while providing a good level of security. The usernames and passwords used in the SASL exchange are API keys and secrets that should be securely managed using a secrets store and rotated periodically."
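
For example, a Confluent Cloud-style payload would typically pair the PLAIN mechanism with SASL_SSL. This is a sketch; the bootstrap server, API key, and API secret values are placeholders for your own Confluent Cloud credentials:

{
  "kafka": {
    "bootstrap_servers": [
      "pkc-9999e.us-east-1.aws.confluent.cloud:9092"
    ],
    "topic": "astra_metrics_events",
    "sasl_mechanism": "PLAIN",
    "sasl_username": "<confluent_api_key>",
    "sasl_password": "<confluent_api_secret>",
    "security_protocol": "SASL_SSL"
  }
}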

POST metrics configuration example (Kafka)

In the curl examples, notice we added the --include option to specify that the POST output should include the HTTP response headers. These details can help with diagnostics if the POST returns an error or a null response.

curl --request POST \
  --url "https://api.astra.datastax.com/v2/databases/$DB_ID/telemetry/metrics" \
  --header 'Accept: application/json' \
  --header 'Authorization: Bearer <application_token>' \
  --include \
  --data '{
      "kafka": {
          "bootstrap_servers": [
              "kafka-0.yourdomain.com:9092"
          ],
          "topic": "astra_metrics_events",
          "sasl_mechanism": "PLAIN",
          "sasl_username": "kafkauser",
          "sasl_password": "kafkapassword",
          "security_protocol": "SASL_PLAINTEXT"
      }
    }'

Example with POST request payload:

curl --request POST \
  --url "https://api.astra.datastax.com/v2/databases/$DB_ID/telemetry/metrics" \
  --header 'Accept: application/json' \
  --header 'Authorization: Bearer <application_token>' \
  --include \
  --data '{
      "kafka": {
          "bootstrap_servers": [
              "pkc-9999e.us-east-1.aws.confluent.cloud:9092"
          ],
          "topic": "astra_metrics_events",
          "sasl_mechanism": "PLAIN",
          "sasl_username": "9AAAAALPRC9AAAAA",
          "sasl_password": "viAAr/geQxxacrAAmydHb7wz6DRu6mL9W9999juQcS1s++pECM99mnW+3Gs06xDd",
          "security_protocol": "SASL_PLAINTEXT"
      }
    }'
A successful POST returns:

202 OK

Or one of the following:

400 Bad request.
401 Unauthorized.
403 The user is forbidden to perform the operation.
404 The specified resource was not found.
409 The request could not be processed because of conflict.
5XX A server error occurred.

Example error response:

{
  "errors": [
    {
      "description": "The name of the environment must be provided",
      "internalCode": "a1012",
      "internalTxId": "103B-A018-3898-0ABF"
    }
  ]
}
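
After a successful POST, you can confirm that metrics records are arriving at the destination by tailing the topic with the standard Kafka console consumer. This is a sketch; the client properties file (holding your SASL settings) and the script path are assumptions for your environment:

# Read a few records from the metrics topic to confirm delivery.
kafka-console-consumer.sh \
  --bootstrap-server kafka-0.yourdomain.com:9092 \
  --topic astra_metrics_events \
  --consumer.config client-sasl.properties \
  --from-beginning \
  --max-messages 5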

Get metrics configuration examples (Kafka)

In the curl examples, notice we added the --include option to specify that the GET output should include the HTTP response headers. These details can help with diagnostics if the GET returns an error or a null response.

Retrieve third-party metrics configuration for an Astra DB database:

curl --request GET \
  --url "https://api.astra.datastax.com/v2/databases/$DB_ID/telemetry/metrics" \
  --header 'Accept: application/json' \
  --header 'Authorization: Bearer <application_token>' \
  --include
A successful GET returns:

200 OK

Example response:

{
    "kafka": {
      "bootstrap_servers": [
         "kafka-0.yourdomain.com:9092"
      ],
      "topic": "astra_metrics_events",
      "sasl_mechanism": "PLAIN",
      "sasl_username": "kafkauser",
      "sasl_password": "kafkapassword",
      "security_protocol": "SASL_PLAINTEXT"
    }
}

Or one of the following:

400 Bad request.
403 The user is forbidden to perform the operation.
404 The specified resource was not found.
500 A server error occurred.

Example error response:

{
  "errors": [
    {
      "description": "The name of the environment must be provided",
      "internalCode": "a1012",
      "internalTxId": "103B-A018-3898-0ABF"
    }
  ]
}

Visualize exported Astra DB metrics with Grafana Cloud

This section explains how to configure Grafana Cloud to consume Astra DB (serverless) health metrics.

Using Grafana Cloud is optional. You can choose your favorite tool to visualize the Astra DB metrics that you exported to Prometheus or Kafka.

We’ll use Prometheus as the destination system in the examples. You’ll need a Grafana Cloud account; Grafana Cloud offers a Free plan with 14-day retention. See Grafana pricing.

Initial steps in Grafana Cloud

The following initial steps occur before submitting the POST /v2/databases/{databaseId}/telemetry/metrics payload described previously in this topic.

  1. After logging in to Grafana Cloud, select + Connect data on the home page.

    Grafana Cloud Welcome page has + Connect data button.

  2. Select the Custom Prometheus metrics section that includes the Prometheus icon.

    Grafana Cloud Select Custom Prom metrics.

  3. You can accept the default selections, or make edits as needed. Provide a name for the API key (such as AstraDB_PS) and click Create API Key.

    Grafana Cloud Create API key option is shown.

  4. The config file is generated. Here’s an example; your values will be different:

    cat << EOF > ./agent-config.yaml
    global:
      scrape_interval: 60s
    
    scrape_configs:
      - job_name: node
        static_configs:
        - targets: ['localhost:9100']
    
    remote_write:
      - url: https://prometheus-prod-10-prod-us-central-0.grafana.net/api/prom/push
        basic_auth:
          username: 412XXX
          password: eyJrIjoiMmE1ZTY4YWRhY2ZmNmZlMjllZmY3ZjczYWQ0NzRiZjNlNTE1NTVkMCIsIm4iOiJBc3RyYURCX1BTIiwiaWQiOjYzOTQXXX=
    EOF

DevOps config via Postman & Grafana Cloud followup

To configure and publish metrics from Astra DB using the DevOps API, follow these steps. The examples use Postman with a bearer token configured.

To publish metrics, create a POST request in Postman:

https://api.astra.datastax.com/v2/databases/{databaseId}/telemetry/metrics

In the Body, set the parameters to the values that you retrieved from Grafana Cloud. Example:

{
  "prometheus_remote": {
    "endpoint": "https://prometheus-prod-10-prod-us-central-0.grafana.net/api/prom/push",
    "auth_strategy": "basic",
    "user": "412XXX",
    "password": "eyJrIjoiMmE1ZTY4YWRhY2ZmNmZlMjllZmY3ZjczYWQ0NzRiZjNlNTE1NTVkMCIsIm4iOiJBc3RyYURCX1BTIiwiaWQiOjYzOTQXXX=
  }
}

The POST response should return a 202 on success.
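
If you prefer curl over Postman, the equivalent request looks like the following sketch. The endpoint and user values come from your generated Grafana Cloud config; the application token and Grafana Cloud API key are placeholders:

curl --request POST \
  --url "https://api.astra.datastax.com/v2/databases/$DB_ID/telemetry/metrics" \
  --header 'Accept: application/json' \
  --header 'Authorization: Bearer <application_token>' \
  --include \
  --data '{
      "prometheus_remote": {
        "endpoint": "https://prometheus-prod-10-prod-us-central-0.grafana.net/api/prom/push",
        "auth_strategy": "basic",
        "user": "412XXX",
        "password": "<grafana_cloud_api_key>"
      }
    }'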

Now, switch back to Grafana Cloud:

  1. Select the option to Create a New Dashboard.

    Grafana Cloud create new dashboard

  2. Select Add a new panel and select the Data Source as grafanacloud-<YourUserId>-prom. Example:

    Data source is selected as grafanacloud-<YourUserId>-prom.

  3. If configured correctly, you should see the Astra DB Metrics under the Metrics Browser in Grafana Cloud. Example:

    Grafana Metrics Browser shows Astra DB metrics.

  4. Now you can select the metrics that you want to visualize in Grafana Cloud. The Dashboard panel displays the charts.
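
    For example, a p99 read-latency panel query might look like the following sketch. It assumes the exported series carry the astra_coordinator_ prefix, as shown in the import steps below; check the Metrics Browser for the exact names in your instance:

    histogram_quantile(0.99, sum(rate(astra_coordinator_read_latency_seconds_bucket[$__rate_interval])) by (le))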

Alternative approach: import from Astra DB Health to Grafana Cloud

This alternative approach imports the Astra DB Health dashboard JSON from the Astra DB console into your Grafana Cloud instance. You will still need to complete the steps listed above:

  • Initial steps in Grafana Cloud

  • DevOps config via Postman & Grafana Cloud followup

Then continue with the steps below.

  1. Log in to the Astra DB console.

  2. Select the database that you want to monitor in Grafana Cloud, and navigate to its Health tab.

  3. Click on DSE Cluster Condensed. Example:

    Example shows Astra DB Health screen for DSE Cluster Condensed.

  4. Click the Share icon:

    Astra DB console Health Share

  5. Then select the Export tab:

    Astra DB console Health Export

  6. Click View JSON and then Copy to Clipboard:

    Astra DB console Health Copy to Clipboard

  7. Make the following edits to the copied JSON.

    Replace each coordinator_ metric prefix with astra_coordinator_, and remove the tenant label selectors. For example, take the following expression:

    "expr": "histogram_quantile(.99, sum(rate(coordinator_write_requests_mutation_size_bytes_bucket{tenant='${__user.login}'}[$__rate_interval])) by (le))",

    You would replace that expression with:

    "expr": "histogram_quantile(.99, sum(rate(astra_coordinator_write_requests_mutation_size_bytes_bucket{}[$__rate_interval])) by (le))",
  8. Now switch over to your Grafana Cloud instance. Click the Create option, and then click Import from the menu.

    Grafana Cloud Create Import option is selected.

  9. Upload or paste in your edited JSON.

    Grafana Cloud Import JSON shown.

  10. You can change the name. Example:

    Grafana Cloud Import JSON change name.

  11. Once imported, all your Astra DB health charts will auto-populate in Grafana Cloud. Example:

    Astra DB Health charts displayed in Grafana Cloud.

Now you can use your own Grafana Cloud instance to monitor the Astra DB database’s health via its metrics.