Export metrics via Astra DB console

Enterprises depend on the ability to view database health metrics in centralized systems along with their other software metrics. The Astra DB Metrics feature lets you forward Astra DB database health metrics to an external third-party metrics system. We refer to the recipient of the exported metrics as the destination system.

Introduction

The functionality provided by the Astra DB Metrics feature is often referred to as:

  • Observability

  • External monitoring

  • Third-party metrics

  • Prometheus monitoring integration

At this time, Astra DB Metrics supports exporting health metrics from Astra DB serverless databases to Prometheus, Apache Kafka, and Confluent Kafka.

You can also use Grafana or Grafana Cloud to visualize the exported metrics.

Metrics UI and API options

You can configure the export of Astra DB metrics via Astra DB console (described in this topic), or via the DevOps API.

Feature availability

The Astra DB Metrics feature:

  • Is available only for Astra DB serverless databases.

  • Is available only on a paid pricing plan, such as Pay As You Go (PAYG) or Enterprise.

  • Is not available on the Astra DB Free plan.

  • Free plan users: click the Chat icon and ask the DataStax representative about options to upgrade your organization.

Benefits

The Astra DB Metrics feature allows you to take full control of forwarding Astra DB database health metrics to your preferred observability system. The functionality is intended for developers, site reliability engineers (SREs), IT managers, and product owners.

Ingesting database health metrics into your system gives you the ability to craft your own alerting actions and dashboards based on your service level objectives and retention requirements. While you can continue to view metrics displayed in Astra DB console via each database’s Health tab, forwarding metrics to a third-party app gives you a more complete view of all metrics being tracked, across all your products.

This enhanced capability can provide your team with broader insights into historical performance, issues, and areas for improvement.

The exported Astra DB health metrics are nearly real-time when consumed externally. You can find the source-of-truth view of your metric values in the Astra DB console’s Health dashboard.

Prerequisites

  1. If you haven’t already, create a serverless database using the Astra DB console.

  2. Ensure your role has permission to view and use the Export Metrics UI, which is under Settings for each database. See Roles and permissions in this topic.

You’ll need an existing destination system to receive the forwarded Astra DB metrics. Currently, Prometheus, Apache Kafka, Confluent Kafka, and Grafana / Grafana Cloud are supported.

Pricing

With an Astra DB PAYG or Enterprise plan, there is no additional cost for using Astra DB Metrics, outside of standard data transfer charges. Exporting third-party metrics is not available on the Astra DB Free plan.

Metrics monitoring may incur costs at the destination system. Consult the destination system’s documentation for its pricing information.

Roles and permissions

The following Astra DB roles can export third-party metrics:

  • Organization Administrator (recommended)

  • Database Administrator

  • Service Account Administrator

  • User Administrator

The required db-manage-thirdpartymetrics permission is automatically assigned to those roles.

If you create a custom role in Astra DB, be sure to assign db-manage-thirdpartymetrics permission to the custom role.

Database metrics forwarded by Astra DB

Here’s a list of database metrics forwarded by the Astra DB Metrics feature.

  • rate_limited_requests_total - A counter that shows the number of operations that failed due to an Astra DB rate limit. You can request that rate limits be increased for your Astra DB databases. Take a rate over a window, such as 5 minutes (5m), and alert if the value is > 0.

  • read_requests_failures_total - A counter that shows the number of failed reads. Cassandra drivers retry failed operations, but significant failures can be problematic. Take a rate, such as 5m, and alert if the value is > 0. Raise a warning-level alert on a low number of failures and a high-severity alert on larger numbers; consider setting the thresholds as a percentage of read throughput.

  • read_requests_timeouts_total - Timeouts happen when operations against the database take longer than the server side timeout. Take a rate, such as 5m, and alert if the value is > 0.

  • read_requests_unavailables_total - Unavailable errors occur when the service is not available to complete a specific request. Take a rate, such as 5m, and alert if the value is > 0.

  • write_requests_failures_total - A counter that shows the number of failed writes. Cassandra drivers retry failed operations, but significant failures can be problematic. Take a rate, such as 5m, and alert if the value is > 0. Raise a warning-level alert on a low number of failures and a high-severity alert on larger numbers; consider setting the thresholds as a percentage of write throughput.

  • write_requests_timeouts_total - Timeouts occur when operations take longer than the server side timeout. Take a rate, such as 5m, and compare with write_requests_failures_total.

  • write_requests_unavailables_total - Unavailable errors occur when the service is not available to service a particular request. Take a rate, such as 5m, and compare with write_requests_failures_total.

  • range_requests_failures_total - A counter that shows the number of range reads that failed. Cassandra drivers retry failed operations, but significant failures can be problematic. Take a rate, such as 5m, and alert if the value is > 0. Raise a warning-level alert on a low number of failures and a high-severity alert on larger numbers; consider setting the thresholds as a percentage of range read throughput.

  • range_requests_timeouts_total - Timeouts are a subset of total failures. Use this metric to understand if failures are due to timeouts. Take a rate, such as 5m, and compare with range_requests_failures_total.

  • range_requests_unavailables_total - Unavailable errors are a subset of total failures. Use this metric to understand if failures are due to the service being unavailable. Take a rate, such as 5m, and compare with range_requests_failures_total.

  • write_latency_seconds_count - Take the rate for write throughput. Alert based on your application Service Level Objective (business requirement).

  • write_latency_seconds_bucket - Take percentiles for write latency. Alert based on your application Service Level Objective (business requirement).

  • write_requests_mutation_size_bytes_bucket - Take percentiles to see how big your writes are over time.

  • read_latency_seconds_count - Take the rate for read throughput. Alert based on your application Service Level Objective (business requirement).

  • read_latency_seconds_bucket - Take percentiles for read latency. Alert based on your application Service Level Objective (business requirement).

  • range_latency_seconds_count - Take the rate for range read throughput. Alert based on your application Service Level Objective (business requirement).

  • range_latency_seconds_bucket - Take percentiles for range read latency. Alert based on your application Service Level Objective (business requirement).
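The "take a rate" and "take percentiles" guidance in the list above can be sketched in plain Python. This is a simplified illustration, not the exact algorithm Prometheus uses for rate() or histogram_quantile() (for example, it does no extrapolation at the window edges). The function names and the sample data are hypothetical; only the metric names come from the list above.

```python
import math

def counter_rate(samples, window_s=300):
    """Per-second increase of a counter over the trailing window.

    samples: list of (unix_seconds, counter_value), oldest first.
    """
    if len(samples) < 2:
        return 0.0
    end_t = samples[-1][0]
    recent = [s for s in samples if s[0] >= end_t - window_s]
    if len(recent) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(recent, recent[1:]):
        # A drop means the counter was reset; count the new value from zero.
        increase += cur - prev if cur >= prev else cur
    elapsed = recent[-1][0] - recent[0][0]
    return increase / elapsed if elapsed > 0 else 0.0

def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from cumulative buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by bound,
    with the last bound equal to math.inf (as in *_bucket metrics).
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper, count in buckets:
        if count >= rank:
            span = count - lower_count
            if math.isinf(upper) or span == 0:
                return lower_bound
            # Linear interpolation inside the bucket that contains the rank.
            return lower_bound + (upper - lower_bound) * (rank - lower_count) / span
        lower_bound, lower_count = upper, count
    return lower_bound

# Hypothetical scrape of read_requests_timeouts_total over five minutes.
timeouts = [(0, 10), (60, 10), (120, 13), (300, 16)]
should_alert = counter_rate(timeouts, window_s=300) > 0

# Hypothetical read_latency_seconds_bucket values (seconds, cumulative count).
latency = [(0.01, 50), (0.1, 90), (1.0, 100), (math.inf, 100)]
p95_seconds = histogram_quantile(0.95, latency)
```

A failure percentage, as suggested for the *_failures_total metrics, would then be the failure rate divided by the rate of the matching *_latency_seconds_count metric over the same window.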

Prometheus setup at the destination

For information about setting up Prometheus itself as the destination of the forwarded Astra DB database metrics, see the Prometheus Getting Started documentation.

  • Minimum version: Prometheus v2.25

  • Recommended versions: Prometheus v2.33+

For Prometheus, remote-write-receiver must be enabled in the destination app. For the steps, see the Prometheus documentation.

After enabling it in your Prometheus environment, verify the setup by sending a POST request to the remote write endpoint. For an example test client, which also verifies that ingress is set up properly, see promremote, a Prometheus remote write client written in Go.

For more information about Prometheus metric types, see this topic.

Kafka setup at the destination

For information about setting up Kafka as a destination of the forwarded Astra DB database metrics, see the documentation for your Kafka distribution, such as Apache Kafka or Confluent Kafka.

Using Export Metrics in Astra DB console

The configuration steps depend on which destination you’ll use. Currently, Prometheus remote_write and Kafka destinations are supported.

To ensure that metrics are enabled for your destination app, provide the relevant properties.

Each update to the metrics configuration, whether made in Astra DB console or through the DevOps API, replaces any existing configuration.

  1. After logging into Astra DB console, navigate to your serverless database in the dashboard or create a new one.

  2. Ensure the serverless database is in Ready status, then click the Settings tab.

  3. Scroll down to the Export Metrics section. The initial view:

    Astra DB Metrics initial export form.

  4. Click Add Destination.

  5. Select a destination; currently, Kafka or Prometheus. (For a given database, you can export metrics to just one destination at a time.)

  6. If you selected Kafka, the properties to enter on its form are:

    1. SASL Mechanism - Your Kafka Simple Authentication and Security Layer (SASL) mechanism for authentication and data security. Possible values: GSSAPI, PLAIN, SCRAM-SHA-256, or SCRAM-SHA-512. For background information, see the Confluent Kafka - Authentication Methods Overview documentation.

    2. SASL Username - Existing username for Kafka authentication.

    3. SASL Password - Existing password for Kafka authentication.

    4. Topic - Kafka topic to which Astra DB will export the metrics; you must create this topic on your server(s).

    5. Bootstrap Servers - One or more Kafka Bootstrap Server entries. Example: pkc-9999e.us-east-1.aws.confluent.cloud:9092

    6. (Optional) Kafka Security Protocol - Most Kafka installations will not require this setting for Astra DB Metrics to connect. Users of hosted Kafka on Confluent Cloud, though, may need to set SASL_SSL in this Security Protocol property. Valid options are:

      • SASL_PLAINTEXT - SASL authenticated, non-encrypted channel.

      • SASL_SSL - SASL authenticated, encrypted channel.

        Non-authenticated options (SSL and PLAINTEXT) are not supported.

        Be sure to specify the appropriate, related SASL Mechanism property. For Confluent Cloud, you may only be able to use PLAIN. See the Confluent Cloud security tutorial. From the Confluent docs: "Confluent Cloud uses SASL/PLAIN (or PLAIN) over TLS v1.2 encryption for authentication because it offers broad client support while providing a good level of security. The usernames and passwords used in the SASL exchange are API keys and secrets that should be securely managed using a secrets store and rotated periodically."

    7. Example of a completed form:

      Astra DB Metrics Kafka form is ready.

    8. When you’re ready with the Kafka form’s entries, click Add Destination.

  7. If you selected Prometheus on the initial metrics destination page, the Prometheus properties you enter on its form depend first on whether you select Basic or Bearer as the Prometheus Strategy (the auth type). Example:

    Astra DB Metrics Prometheus Strategy on initial destination form.

    1. If you chose Bearer from the menu, provide your Prometheus Token value and Prometheus Endpoint on the resulting form. Notice that the form does not display username/password properties for a Prometheus strategy of Bearer.

    2. If you chose Basic from the form’s menu, provide your Prometheus Username, Password, and Endpoint on the resulting form. Notice that the form does not display a Token property for a Prometheus strategy of Basic.
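The difference between the two strategies can be sketched as the standard HTTP Authorization header that each one corresponds to. This is an illustration of generic Basic and Bearer authentication, assuming the credentials you enter are presented to your Prometheus endpoint in this conventional way; the function name and the sample credentials are hypothetical.

```python
import base64

def prometheus_auth_header(strategy, *, token=None, username=None, password=None):
    """Build the Authorization header for a Basic or Bearer strategy."""
    if strategy == "Bearer":
        if not token:
            raise ValueError("Bearer strategy requires a token")
        return {"Authorization": f"Bearer {token}"}
    if strategy == "Basic":
        if username is None or password is None:
            raise ValueError("Basic strategy requires a username and password")
        # Basic auth is base64("username:password").
        raw = f"{username}:{password}".encode("utf-8")
        return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
    raise ValueError(f"Unknown strategy: {strategy!r}")

# Hypothetical credentials, for illustration only.
basic = prometheus_auth_header("Basic", username="user", password="pass")
bearer = prometheus_auth_header("Bearer", token="example-token")
```

Sending the same header with a manual request to your endpoint is one way to confirm that the credentials work before entering them in the form.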

Example form when your Prometheus Strategy is Bearer:

Astra DB Metrics Prometheus bearer form including Token property.

Example form when your Prometheus Strategy is Basic:

Astra DB Metrics Prometheus basic form including username and password.

When you’ve completed the Prometheus form’s entries, click Add Destination.

After adding the metrics destination

After you add a Kafka or Prometheus destination, a confirmation message appears and the Export Metrics UI under Settings shows the destination. Example:

Astra DB Metrics added destination.

If the configuration’s settings are valid, Astra DB exports the health metrics for the specified database. See the next section for an example of using Grafana Cloud to visualize the exported metrics.
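For a Kafka destination, the constraints described in the form steps earlier (allowed SASL mechanisms, allowed security protocols, required fields) can be checked before you enter the values. The following is a minimal sketch of such a pre-check; the dictionary keys and the function name are hypothetical and do not correspond to a documented Astra DB API. Only the allowed values are taken from this topic.

```python
from typing import List

SASL_MECHANISMS = {"GSSAPI", "PLAIN", "SCRAM-SHA-256", "SCRAM-SHA-512"}
SECURITY_PROTOCOLS = {"SASL_PLAINTEXT", "SASL_SSL"}  # SSL and PLAINTEXT are not supported

def check_kafka_destination(settings: dict) -> List[str]:
    """Return a list of problems found in the settings (empty if none)."""
    problems = []
    if settings.get("sasl_mechanism") not in SASL_MECHANISMS:
        problems.append("sasl_mechanism must be one of " + ", ".join(sorted(SASL_MECHANISMS)))
    for key in ("sasl_username", "sasl_password", "topic"):
        if not settings.get(key):
            problems.append(f"{key} is required")
    servers = settings.get("bootstrap_servers") or []
    if not servers:
        problems.append("at least one bootstrap server is required")
    for server in servers:
        host, sep, port = server.rpartition(":")
        if not (host and sep and port.isdigit()):
            problems.append(f"bootstrap server {server!r} should look like host:port")
    protocol = settings.get("security_protocol")  # optional property
    if protocol is not None and protocol not in SECURITY_PROTOCOLS:
        problems.append("security_protocol must be SASL_PLAINTEXT or SASL_SSL")
    return problems

# Hypothetical values; the bootstrap server is the example from this topic.
example = {
    "sasl_mechanism": "PLAIN",
    "sasl_username": "my-api-key",
    "sasl_password": "my-api-secret",
    "topic": "astra_metrics",
    "bootstrap_servers": ["pkc-9999e.us-east-1.aws.confluent.cloud:9092"],
    "security_protocol": "SASL_SSL",
}
problems = check_kafka_destination(example)
```

A check like this only catches malformed values; it cannot confirm that the topic exists or that the credentials are accepted by your Kafka cluster.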

If needed, you can click the three vertical dots for options to Modify the destination’s configuration, or Delete the destination. Example:

Astra DB Metrics added destination options.

Modifying an existing destination allows you to edit the configuration’s properties, if necessary. Deleting an existing destination’s configuration in Astra DB would then allow you to try again, or to add a new type of destination, such as switching from Kafka to Prometheus. For a given Astra DB database, you can only configure the export of metrics to one destination at a time.

If you decide to delete a metrics destination, Astra DB displays a message with an alternative option to Update (rather than Delete) the destination’s configuration. Example:

metrics delete destination

Visualize exported Astra DB metrics with Grafana Cloud

You can configure Grafana Cloud to consume Astra DB serverless health metrics.

The detailed steps involve setup in Grafana Cloud and with the DataStax DevOps v2 API. See the Grafana Cloud section of the "Export Metrics via DevOps API" topic.

Once configured, you can use your own Grafana Cloud instance to monitor the Astra DB database’s health via its metrics.

Using Grafana Cloud is optional. You can choose your favorite tool to visualize the Astra DB metrics that you exported to Prometheus or Kafka.

We’ll use Prometheus as the destination system in the examples. You’ll need a Grafana Cloud account. Grafana offers a Free plan with 14-day retention. See Grafana pricing.