Autoscaling and High Availability to improve Superset Stability - AWS/GCP

The goal is to create a default deployment that is suitable for running production workloads in most use cases. To achieve this, the deployment needs some level of fault tolerance (high availability) and the ability to scale automatically to match user demand. It should also be supported on AWS, GCP, and Azure without major changes.

Prerequisites:

Kubernetes cluster with at least two nodes in separate availability zones
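
As a rough illustration of how this prerequisite could be checked, the sketch below counts the distinct zones among a cluster's nodes. It assumes the node list comes from the "items" array of `kubectl get nodes -o json` and that nodes carry the well-known `topology.kubernetes.io/zone` label; the sample node names and zones are made up for the example.

```python
# Well-known Kubernetes node label that carries the availability zone.
ZONE_LABEL = "topology.kubernetes.io/zone"


def distinct_zones(nodes):
    """Return the set of zones found in a list of node objects.

    `nodes` is assumed to be the "items" list from
    `kubectl get nodes -o json`, already parsed into Python dicts.
    """
    zones = set()
    for node in nodes:
        labels = node.get("metadata", {}).get("labels", {})
        zone = labels.get(ZONE_LABEL)
        if zone:
            zones.add(zone)
    return zones


def meets_prerequisite(nodes):
    """True when there are at least two nodes spread over at least two zones."""
    return len(nodes) >= 2 and len(distinct_zones(nodes)) >= 2


# Hypothetical sample data, for illustration only.
sample_nodes = [
    {"metadata": {"name": "node-a", "labels": {ZONE_LABEL: "us-east-1a"}}},
    {"metadata": {"name": "node-b", "labels": {ZONE_LABEL: "us-east-1b"}}},
]
print(meets_prerequisite(sample_nodes))
```

Nodes without the zone label are ignored, so an unlabeled cluster fails the check rather than passing by accident.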

High Availability:

The Superset architecture is made up of the following components:

Web server (Gunicorn, Nginx, Apache)
Metadata database engine (PostgreSQL, MySQL, MariaDB)
Scheduler (Celery beat)
Worker (Celery worker) for asynchronous processing
Message queue: Celery broker (Redis, RabbitMQ, SQS, etc.)
Results backend: Celery backend (Redis, S3, Memcached, etc.)
Caching layer (Redis, Memcached, etc.)

Each component can be configured in a fault-tolerant way except the Celery beat scheduler, because only one instance can run at a time. This does not affect high availability: Kubernetes relaunches the pod if it fails, and the scheduler is only needed to run scheduled tasks such as Reports and Alerts.
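
One possible way to express the single-instance constraint for Celery beat is a Deployment with one replica and the `Recreate` strategy, so that a rollout stops the old pod before starting the new one. The sketch below builds such a manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f -` accepts. The name `superset-beat`, the image reference, and the `app` label are placeholders chosen for this example, not values from this proposal.

```python
import json


def beat_deployment(name="superset-beat", image="apache/superset:latest"):
    """Build a Deployment manifest for a single Celery beat pod.

    `name` and `image` are placeholder values (assumptions).
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            # Only one beat instance may run at a time.
            "replicas": 1,
            # Recreate terminates the old pod before the new one starts,
            # so two schedulers never overlap during a rollout.
            "strategy": {"type": "Recreate"},
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "beat",
                            "image": image,
                            "command": [
                                "celery",
                                "--app=superset.tasks.celery_app:app",
                                "beat",
                            ],
                        }
                    ]
                },
            },
        },
    }


print(json.dumps(beat_deployment(), indent=2))
```

If the pod or its node fails, the Deployment controller schedules a replacement, which matches the relaunch behavior described above.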

Autoscaling:

Horizontal pod autoscaling should be configured for both the web server and the workers. Whether it is also needed for Redis still has to be confirmed.
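
To make this concrete, the sketch below generates `autoscaling/v2` HorizontalPodAutoscaler manifests for the web server and worker Deployments, and includes the basic replica calculation that Kubernetes documents for the autoscaler. The Deployment names `superset-web` and `superset-worker`, the replica bounds, and the 70% CPU target are assumptions for illustration; real values would come from load testing.

```python
import json
import math


def hpa_manifest(deployment, min_replicas=2, max_replicas=10, cpu_percent=70):
    """Build an autoscaling/v2 HPA that scales a Deployment on CPU utilization.

    All default values are illustrative assumptions.
    """
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": deployment},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            # Keeping at least two replicas also supports high availability.
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        "target": {
                            "type": "Utilization",
                            "averageUtilization": cpu_percent,
                        },
                    },
                }
            ],
        },
    }


def desired_replicas(current_replicas, current_metric, target_metric):
    """Basic HPA formula: ceil(current * currentMetric / targetMetric).

    Simplified: ignores the autoscaler's tolerance band and min/max bounds.
    """
    return math.ceil(current_replicas * current_metric / target_metric)


# Hypothetical Deployment names for the web server and the workers.
for name in ("superset-web", "superset-worker"):
    print(json.dumps(hpa_manifest(name), indent=2))
```

For example, `desired_replicas(4, 90, 60)` evaluates to 6: four pods averaging 90% CPU against a 60% target would be scaled to six.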

Cluster autoscaling should also be configured and tested, but the Core Assets team would likely provide this only as guidance, since it is typically managed by the infrastructure team.
