Deploying Apache Superset on Kubernetes via Ambari
This feature is slated for ODP 1.3.2.0 as a Tech Preview and is currently in qualification. It is available for early enterprise testing.
Interested in early access? Contact our team to join the enterprise early access program.
Overview
Apache Superset is an open-source business intelligence platform that provides interactive data exploration, visualization, and dashboarding. In ODP's Kubernetes integration, Ambari deploys Superset 4.1.4 on Kubernetes using the official Superset Helm chart, wired to your ODP data services (Trino, Hive, Impala) and secured with OIDC authentication.
Superset on Kubernetes complements the rest of the ODP stack: your data lives in HDFS or Ozone, governed by Ranger and catalogued by Atlas, queryable via Trino (on Kubernetes) or Hive and Impala (on the cluster). Superset provides the visualization layer on top, with no data duplication.
Superset Helm Chart Managed by Ambari
Ambari deploys Superset via the Superset Helm chart. The deployment includes:
- Superset Web: the main application server (1 or more replicas)
- Superset Worker: Celery worker for async chart rendering and alerts (1 or more replicas)
- Superset Beat: Celery scheduler for periodic tasks
- Redis: in-cluster cache and message broker for Celery (deployed as a subchart)
- PostgreSQL: Superset's internal metadata database — stores dashboards, charts, users, and connections (deployed as a subchart, or you can point to an external PostgreSQL)
All of these are created and managed by Ambari through the Helm chart lifecycle.
Deploying Superset from Ambari
Step 1: Prerequisite — Deploy Trino First
Superset connects to ODP data through SQL engines. We recommend deploying Trino on Kubernetes first, as it provides the broadest query capability against Iceberg and Hive tables. Superset can also connect directly to Hive and Impala using their JDBC/ODBC interfaces.
Step 2: Open the Kubernetes View and Select Superset
In Ambari, navigate to Views > Kubernetes Manager. Click Deploy next to Apache Superset.
Step 3: Configure the Deployment
General tab:
| Setting | Description | Default |
|---|---|---|
| Helm Release Name | Helm release name | superset |
| Namespace | Kubernetes namespace | odp-apps |
| Web Replicas | Number of Superset web pods | 1 |
| Worker Replicas | Number of Celery worker pods | 1 |
| Web CPU Request | CPU request per web pod | 0.5 |
| Web CPU Limit | CPU limit per web pod | 2 |
| Web Memory Request | Memory request per web pod | 2Gi |
| Web Memory Limit | Memory limit per web pod | 4Gi |
| Secret Key | Flask secret key (auto-generated if blank) | auto |
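As a rough illustration, the General tab settings above map onto Helm values that Ambari passes to the chart. The sketch below builds such an override in Python; the key names (supersetNode, supersetWorker, replicaCount) follow the upstream Superset chart's conventions, but the exact structure Ambari renders is internal and may differ.

```python
import json

# Illustrative mapping of Ambari's "General" tab onto Helm chart values.
# Key names follow the upstream Superset chart (supersetNode/supersetWorker);
# the values Ambari actually generates are internal and may differ.
def build_helm_values(web_replicas=1, worker_replicas=1,
                      web_cpu_request="500m", web_cpu_limit="2",
                      web_mem_request="2Gi", web_mem_limit="4Gi"):
    return {
        "supersetNode": {
            "replicaCount": web_replicas,
            "resources": {
                "requests": {"cpu": web_cpu_request, "memory": web_mem_request},
                "limits": {"cpu": web_cpu_limit, "memory": web_mem_limit},
            },
        },
        "supersetWorker": {"replicaCount": worker_replicas},
    }

values = build_helm_values(web_replicas=2)
print(json.dumps(values, indent=2))
```

Note that a CPU request of 0.5 core is expressed as "500m" in Kubernetes resource notation.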
Database tab:
| Setting | Description |
|---|---|
| Use Embedded PostgreSQL | Deploy PostgreSQL as a subchart (recommended for evaluation) |
| External PostgreSQL Host | Host of an external PostgreSQL instance |
| External PostgreSQL Port | Port (default: 5432) |
| External PostgreSQL Database | Database name (e.g., superset) |
| External PostgreSQL User | Database user |
| External PostgreSQL Password | Database password (stored encrypted) |
For production, use an external PostgreSQL instance with proper backup and HA configuration rather than the embedded subchart.
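When pointing at an external PostgreSQL, the Database tab fields are combined into a standard SQLAlchemy URI (Superset reads it as SQLALCHEMY_DATABASE_URI in superset_config.py). The helper below is an illustrative sketch of that composition, not Ambari's actual code; note the URL-escaping of credentials.

```python
from urllib.parse import quote_plus

# Compose Superset's metadata-database URI from the Database tab fields.
# Illustrative sketch only; Ambari renders the equivalent via Helm values.
def metadata_db_uri(host, port, database, user, password):
    # Escape credentials so special characters (@, :, /) survive in the URI.
    return (f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
            f"@{host}:{port}/{database}")

uri = metadata_db_uri("pg.example.com", 5432, "superset", "superset", "p@ss/word")
print(uri)
```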
Authentication tab (OIDC):
| Setting | Description |
|---|---|
| Authentication Method | DATABASE (local users) or OIDC |
| OIDC Provider URL | e.g., https://keycloak.example.com/realms/myrealm |
| Client ID | OIDC client ID registered for Superset |
| Client Secret | OIDC client secret (stored encrypted) |
| Allowed Roles | Roles/groups that grant access to Superset |
| Admin Roles | Roles/groups that grant Superset admin |
Ingress tab:
| Setting | Description |
|---|---|
| Enable Ingress | Expose Superset via Kubernetes Ingress |
| Hostname | e.g., superset.example.com |
| TLS Secret | Name of the Kubernetes TLS Secret for HTTPS |
| Ingress Class | Ingress controller class (e.g., nginx) |
Step 4: Submit
Click Deploy. Monitor the deployment in Background Operations. Superset initialization (including database migrations) typically takes 3–8 minutes on first install.
Connecting Superset to ODP Data Sources
After deployment, configure database connections in Superset so users can explore ODP data.
Connecting to Trino (Recommended)
Trino provides the broadest SQL coverage for ODP data, including Iceberg tables, and is the recommended connection for Superset.
In Superset, navigate to Settings > Database Connections > + Database.
Select Trino as the database type and enter the SQLAlchemy connection string:
trino://<user>@trino-coordinator.odp-apps.svc.cluster.local:8080/hive
Since both Superset and Trino run in the same Kubernetes namespace, they can communicate via Kubernetes internal DNS without exposing Trino externally.
Connection parameters:
| Parameter | Value |
|---|---|
| SQLAlchemy URI | trino://superset_svc@trino-coordinator.odp-apps.svc.cluster.local:8080/hive |
| Display Name | ODP - Trino (Iceberg/Hive) |
| Expose in SQL Lab | Enabled |
| Allow DML | Disabled (recommended — Superset should be read-only) |
With Kerberos-secured Trino, the Superset service account needs a Kerberos principal and keytab. Ambari provisions the keytab and configures the Superset-to-Trino connection to use it automatically (unless OIDC is the sole authentication method).
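The URI in the table above is composed from a user, the in-cluster service host, a port, and a catalog (optionally a schema). A stdlib sketch of that composition, for illustration only:

```python
from urllib.parse import quote

# Compose the Trino SQLAlchemy URI used in Superset's connection form.
# Host and port match the in-cluster Trino service from the example above.
def trino_uri(user, host, port, catalog, schema=None):
    uri = f"trino://{quote(user)}@{host}:{port}/{catalog}"
    if schema:
        uri += f"/{schema}"
    return uri

uri = trino_uri("superset_svc",
                "trino-coordinator.odp-apps.svc.cluster.local", 8080, "hive")
print(uri)
```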
Connecting to Hive via HiveServer2
For direct Hive connectivity (for Hive-native tables not covered by Trino):
hive://<hiveserver2-host>:10000/default
With Kerberos:
# In Superset config (injected by Ambari Helm values)
SQLALCHEMY_CUSTOM_PASSWORD_STORE = ...
# Connection string with Kerberos params
hive://hiveserver2.example.com:10000/default?auth=KERBEROS&kerberos_service_name=hive
Connecting to Impala
impala://<impala-host>:21050/default
With Kerberos:
impala://impala.example.com:21050/default?auth_mechanism=GSSAPI&kerberos_service_name=impala
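The Kerberos variants for Hive and Impala differ from the plain URIs only in the query parameters appended to them. A small stdlib sketch (the helper is illustrative, not part of Superset or Ambari):

```python
from urllib.parse import urlencode

# Append Kerberos-related query parameters to a base SQLAlchemy URI,
# as in the Hive and Impala examples above. Illustrative helper only.
def with_kerberos(base_uri, **params):
    return f"{base_uri}?{urlencode(params)}"

hive_uri = with_kerberos("hive://hiveserver2.example.com:10000/default",
                         auth="KERBEROS", kerberos_service_name="hive")
impala_uri = with_kerberos("impala://impala.example.com:21050/default",
                           auth_mechanism="GSSAPI", kerberos_service_name="impala")
print(hive_uri)
print(impala_uri)
```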
Authentication and User Management
Local Database Authentication
By default (when OIDC is not configured), Superset uses its own internal user database. Users are created in Settings > List Users.
Roles in Superset:
- Admin: full platform access, can manage connections and users
- Alpha: can create and edit charts and dashboards
- Gamma: read-only access to granted dashboards
- sql_lab: access to SQL Lab for ad-hoc queries
- Public: anonymous access (if enabled — not recommended for secured environments)
OIDC Authentication
When Ambari deploys Superset with OIDC configured, it injects the following into Superset's superset_config.py:
from flask_appbuilder.security.manager import AUTH_OAUTH

AUTH_TYPE = AUTH_OAUTH
OAUTH_PROVIDERS = [
    {
        "name": "oidc",
        "icon": "fa-openid",
        "token_key": "access_token",
        "remote_app": {
            "client_id": "<client-id>",
            "client_secret": "<client-secret>",
            "api_base_url": "<provider-url>",
            "server_metadata_url": "<provider-url>/.well-known/openid-configuration",
            "client_kwargs": {"scope": "openid email profile groups"},
        },
    }
]
AUTH_USER_REGISTRATION = True
AUTH_USER_REGISTRATION_ROLE = "Gamma"
Group-to-role mapping is also configurable, so that LDAP/AD groups are automatically mapped to Superset roles at login time.
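Group-to-role mapping uses Flask-AppBuilder's AUTH_ROLES_MAPPING setting. A hedged sketch of what such a mapping looks like in superset_config.py (the group names here are placeholders for your IdP's groups, not values Ambari ships):

```python
# Sketch of group-to-role mapping in superset_config.py.
# Keys are group names as returned by the OIDC provider (placeholders here);
# values are lists of Superset roles to grant. AUTH_ROLES_MAPPING and
# AUTH_ROLES_SYNC_AT_LOGIN are standard Flask-AppBuilder settings.
AUTH_ROLES_MAPPING = {
    "superset_admins": ["Admin"],
    "superset_editors": ["Alpha", "sql_lab"],
    "superset_viewers": ["Gamma"],
}
# Re-evaluate the user's roles from IdP groups on every login.
AUTH_ROLES_SYNC_AT_LOGIN = True
```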
Synchronizing with LDAP
Superset can be configured to authenticate against LDAP directly (without OIDC), using the same LDAP server configured in ODP. When Ambari deploys Superset, it can inject LDAP parameters from the ODP cluster configuration (read from the LDAP configuration stored in Ambari).
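Direct LDAP authentication uses Flask-AppBuilder's LDAP settings in superset_config.py. A hedged sketch follows; the server, search base, and bind DN are placeholders that Ambari would fill from the cluster's LDAP configuration:

```python
from flask_appbuilder.security.manager import AUTH_LDAP

# Sketch of direct LDAP authentication in superset_config.py.
# All values below are placeholders; Ambari would populate them from
# the LDAP configuration stored in the cluster.
AUTH_TYPE = AUTH_LDAP
AUTH_LDAP_SERVER = "ldaps://ldap.example.com:636"
AUTH_LDAP_SEARCH = "ou=people,dc=example,dc=com"
AUTH_LDAP_BIND_USER = "cn=superset_bind,ou=services,dc=example,dc=com"
AUTH_LDAP_BIND_PASSWORD = "<bind-password>"
AUTH_LDAP_UID_FIELD = "uid"
# Auto-create users on first successful LDAP login, with Gamma by default.
AUTH_USER_REGISTRATION = True
AUTH_USER_REGISTRATION_ROLE = "Gamma"
```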
Creating Dashboards on ODP Data
SQL Lab for Exploration
SQL Lab is Superset's interactive SQL editor. Use it to explore ODP data before building charts:
- Navigate to SQL Lab > SQL Editor.
- Select the ODP - Trino database and the target schema.
- Write and run SQL. Results appear inline.
- Save a query as a Virtual Dataset to use as the basis for a chart.
Example: exploring Iceberg table history via Trino:
SELECT
committed_at,
snapshot_id,
operation,
summary['added-records'] AS added_records,
summary['deleted-records'] AS deleted_records
FROM hive.iceberg_demo."my_table$snapshots"
ORDER BY committed_at DESC
LIMIT 20;
Building Charts
- Navigate to Charts > + Chart.
- Select a dataset (physical table or virtual dataset from SQL Lab).
- Choose a chart type (Bar, Line, Table, Map, etc.).
- Configure dimensions, metrics, and filters.
- Save the chart.
Assembling Dashboards
- Navigate to Dashboards > + Dashboard.
- Drag charts from the chart picker onto the layout grid.
- Configure filters that apply across all charts on the dashboard.
- Set the dashboard refresh interval if you want auto-refresh for near-real-time data.
- Publish the dashboard and share it with the appropriate Superset roles.
Resource Sizing Recommendations
The following sizing recommendations apply to typical ODP deployments. Adjust based on the number of concurrent users and dashboard complexity.
| Scenario | Web Pods | Worker Pods | Web Memory | Worker Memory |
|---|---|---|---|---|
| Development / Evaluation | 1 | 1 | 2Gi | 1Gi |
| Small team (< 20 users) | 1 | 2 | 4Gi | 2Gi |
| Medium team (20–100 users) | 2 | 2 | 4Gi | 2Gi |
| Large team (100+ users) | 3+ | 3+ | 8Gi | 4Gi |
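The tiers above can be restated as a simple lookup by concurrent-user count; the sketch below just codifies the table (thresholds are the table's, not hard limits, and the development/evaluation tier is chosen by hand rather than by user count):

```python
# Restate the sizing table as a lookup by concurrent-user count.
# Returns (web_pods, worker_pods, web_memory, worker_memory).
def superset_sizing(users):
    if users < 20:
        return (1, 2, "4Gi", "2Gi")   # small team
    if users <= 100:
        return (2, 2, "4Gi", "2Gi")   # medium team
    return (3, 3, "8Gi", "4Gi")       # large team (pod counts are a floor)

print(superset_sizing(50))
```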
PostgreSQL: allocate at least 5 GB of persistent storage for the Superset metadata database. For teams with many saved charts and dashboards, 20–50 GB is more appropriate.
Redis: the default subchart configuration (256 MB memory limit) is sufficient for most deployments. Increase if you use Superset's alert and report features heavily.
Superset web pods are stateless once session state lives in Redis. When scaling to multiple web replicas, use either sticky sessions at the Ingress or LoadBalancer, or Redis-backed session storage (Ambari's Helm values configure the latter automatically).
Monitoring Superset from Ambari
The Kubernetes View shows the following for the Superset deployment:
- Pod status: web, worker, beat, Redis, and PostgreSQL pod states
- Helm release status: current revision and Flux reconciliation status
- Recent events: Kubernetes events for the Superset resources
For Superset-level monitoring, use the built-in Superset > Logs section to see query execution history and errors.
Upgrading Superset
To upgrade to a newer Superset version:
- In the Kubernetes View, select the Superset deployment.
- Click Upgrade.
- Review the configuration diff.
- Click Confirm Upgrade.
Ambari runs database migrations automatically as part of the upgrade (the Helm chart includes an init container for migrations). Existing dashboards and charts are preserved.
If the upgrade fails, use Rollback in the Kubernetes View to return to the previous Helm revision.