Astra DB Architecture FAQ
Astra DB is a globally distributed, serverless, multi-model database service built by DataStax that runs on your cloud provider of choice. It’s the first and only serverless, multi-region database service based on an open-source NoSQL database, Apache Cassandra®.
The following are frequently asked questions that summarize how Astra DB and Cassandra work together.
DataStax has adapted Cassandra into a multi-tenant database for the serverless needs of Astra users. Our design enables fine-grained, elastic scalability of individual components to meet the capacity demands of modern application workloads.
Cassandra has a proven track record of successfully managing data workloads for some of the most mission-critical, high-throughput, global-scale, always-on, real-time applications.
In a traditional Cassandra deployment, however, compute and storage are coupled, so the cluster must scale as a whole. This scale-all-or-nothing approach means that scaling reads and writes independently of compaction is not an option, and scaling down presents a unique set of challenges for operators.
While you can run DataStax Enterprise (DSE) both on premises and in the cloud, Astra DB allows you to create databases in minutes, with reduced complexity, fewer operational concerns, and an approachable entry point for developing your applications. Astra DB is a true database-as-a-service (DBaaS) offering that provides simplified cluster management with flexible pricing. DSE includes Advanced Workloads (DSE Search, Analytics, and Graph), which are not available in Astra DB. Read this whitepaper for an in-depth understanding of the cloud-native architecture of Astra DB.
Our new Microservices Architecture is designed to be cloud-native and multi-cloud. These services run on top of Kubernetes, and support for data storage through fully managed object storage services in AWS, Google Cloud, and Azure enables Astra DB to operate in any of the three public clouds.
Astra DB supports creating databases on Amazon Web Services (AWS), Google Cloud, and Microsoft Azure.
Astra DB is composed of many smaller independent services that belong to the control plane, data plane, infrastructure services, and object storage services. Kubernetes is used to scale and orchestrate all services in the data plane and infrastructure services.
The control plane is responsible for maintaining and configuring the data plane based on customer-specified settings and operational status reported by the data plane. Data plane services are responsible for executing user and application data requests, as well as performing required database maintenance operations in the background, which include:
Astra DB K8s Operator
Commit Log Replayer Service
Controller Service for AIOps
Object Storage Services
To have a proper multi-tenant system, there are several factors you must consider:
A balance between cost efficiency and performance guarantees
A tenant that experiences a sudden increase in traffic, a software bug, or a denial-of-service attack should not have any substantial effect on other tenants that share the same underlying infrastructure.
In Astra DB, there can be multiple Kubernetes clusters, each serving thousands of tenants. Each tenant is assigned to a single, shared-tenancy Kubernetes cluster, a number of services that depends on its workload requirements, and one S3, GCS, or ABS bucket.
A multi-tenant coordinator or data service is responsible for one or more Cassandra keyspaces and sets of tokens per tenant. Tenants also share the same metadata service, with each tenant's metadata stored separately. Each tenant has a separate bucket in the object storage service where all of its keyspaces, tables, and data reside.
To assign tenants to different services, Astra DB uses a shuffle sharding algorithm. Because shuffle shards may overlap (that is, one or more services may belong to two different shuffle shards), we must introduce additional guarantees about overlap. In particular, for any two shuffle shards, we want the overlap to be at most one data service.
Shuffle sharding helps with tenant performance and fault isolation. Different clusters can have distinct multi-tenancy policies to accommodate customers with different requirements.
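The overlap guarantee above (any two tenants' shuffle shards share at most one data service) can be sketched with simple rejection sampling. Service names, shard sizes, and the sampling approach below are illustrative assumptions, not Astra DB's actual assignment logic:

```python
import itertools
import random

def make_shuffle_shards(services, shard_size, num_tenants, max_overlap=1, seed=0):
    """Draw one random shuffle shard (a subset of services) per tenant,
    rejecting any candidate that shares more than `max_overlap` services
    with an already-assigned shard."""
    rng = random.Random(seed)
    shards = []
    for _ in range(10_000):  # rejection sampling with a safety bound
        if len(shards) == num_tenants:
            break
        candidate = frozenset(rng.sample(services, shard_size))
        if all(len(candidate & shard) <= max_overlap for shard in shards):
            shards.append(candidate)
    return shards

services = [f"data-svc-{i}" for i in range(8)]
shards = make_shuffle_shards(services, shard_size=3, num_tenants=4)

# Any two tenants' shards overlap in at most one data service, so a
# noisy tenant can degrade at most one service in any other tenant's shard.
for a, b in itertools.combinations(shards, 2):
    assert len(a & b) <= 1
```

With this property, a tenant that saturates its own shard leaves every other tenant with most of its shard still healthy, which is the fault-isolation benefit described above.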
Auto scaling is a hard optimization problem: minimize the total cost of computational resources while meeting the continuously changing demand of every tenant. Auto scaling enables database service elasticity, which ultimately comes down to how many operations per second a tenant needs at any given moment.
Astra DB initially assigns some default rate limits per service per tenant. It then dynamically adjusts those limits up to some predefined maximum allowed values and automatically adds or removes services according to the demand. Auto scaling decisions are made and applied by the controller service for artificial intelligence for IT operations (AIOps) and the Astra DB K8s operator service.
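One way to picture the dynamic limits described above is a token bucket whose rate a control loop raises or lowers between a default and a predefined maximum. The class, method names, and thresholds below are hypothetical, not Astra DB's actual AIOps controller:

```python
import time

class TenantRateLimiter:
    """Token-bucket limiter with a dynamically adjustable rate.
    A minimal sketch of per-tenant limiting; all names and numbers
    here are illustrative assumptions."""

    def __init__(self, rate, burst, max_rate):
        self.rate = rate            # ops/second currently allowed
        self.burst = burst          # short-term burst capacity
        self.max_rate = max_rate    # predefined maximum allowed value
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self):
        """Admit one operation if the tenant has budget left."""
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def adjust(self, utilization):
        """Control-loop step: raise the limit toward max_rate when the
        tenant saturates it, lower it when the tenant is mostly idle."""
        if utilization > 0.8:
            self.rate = min(self.max_rate, self.rate * 2)
        elif utilization < 0.2:
            self.rate = max(1, self.rate // 2)

limiter = TenantRateLimiter(rate=100, burst=10, max_rate=400)
assert limiter.allow()   # burst budget admits the first request
limiter.adjust(0.9)      # saturated: limit doubles to 200
limiter.adjust(0.9)      # and again, capped at the 400 ceiling
```

In a real system the same signal (sustained saturation at the ceiling) would also feed the decision to add or remove service instances, which is the second half of the auto-scaling loop described above.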
In Astra DB, compaction services use the unified compaction strategy (UCS). It’s a hybrid of Cassandra’s size tiered compaction strategy (STCS), leveled compaction strategy (LCS), and time window compaction strategy (TWCS). Sharding is automatically configurable for different scenarios, including using TWCS to shard time series workloads.
When using Astra DB, compaction services compact data in SSTables and simultaneously repair any inconsistencies among replicas. This becomes possible because a compaction service has access to every replica's data files through an object storage service. Repairing during compaction is ideal because the added cost of repair is very low in this scenario: repairing while compacting is significantly faster than running compaction and repair separately.
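The combined compact-and-repair pass can be illustrated as a last-write-wins merge over each replica's sorted runs, which is how Cassandra reconciles conflicting replica values. This is a toy model assuming simple timestamped cells, not Astra DB's actual SSTable handling:

```python
import heapq

def compact_and_repair(replica_runs):
    """Merge sorted runs of (key, timestamp, value) cells from several
    replicas into one compacted run, keeping only the newest write per
    key. Because the merge already visits every replica's copy of each
    key, resolving inconsistencies costs almost nothing extra."""
    out = []
    for key, ts, value in heapq.merge(*replica_runs):
        if out and out[-1][0] == key:
            if ts > out[-1][1]:
                out[-1] = (key, ts, value)  # newer write wins
        else:
            out.append((key, ts, value))
    return out

# Two replicas disagree: each holds the newest write for a different key.
replica_a = [("k1", 1, "stale"), ("k2", 9, "fresh")]
replica_b = [("k1", 7, "fresh"), ("k2", 3, "stale")]
repaired = compact_and_repair([replica_a, replica_b])
# repaired == [("k1", 7, "fresh"), ("k2", 9, "fresh")]
```

A single pass over the merged streams both compacts (one cell per key survives) and repairs (every surviving cell is the newest write any replica saw).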
In Astra DB, object storage services are responsible for data storage, which is separate from compute. When a new tenant is assigned to a data service:
There is no need to stream replica data for the tokens that the service is managing.
Data still resides in the object storage and does not need to be moved.
This feature is analogous to creating a new pointer to an existing data set rather than copying the whole data set to create a new one. The whole process is very efficient. Data services do have read caches that get gradually hydrated to speed up read requests.
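The pointer analogy can be made concrete with a toy metadata catalog; the schema and names below are purely illustrative, not Astra DB's internal metadata format:

```python
# A toy metadata catalog: reassigning a tenant to a new data service is a
# metadata (pointer) update, while the object-store bucket and the data
# inside it never move.
catalog = {
    "tenant-42": {"bucket": "gs://astra-tenant-42", "data_service": "data-svc-1"},
}

def reassign(catalog, tenant, new_service):
    """Point the tenant at a different data service without copying data."""
    entry = catalog[tenant]
    entry["data_service"] = new_service  # only the pointer changes
    return entry["bucket"]               # bucket contents are untouched

bucket = reassign(catalog, "tenant-42", "data-svc-7")
```

The new data service then serves reads directly from the bucket, hydrating its local read cache over time rather than streaming replica data up front.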