Deploying Apache Superset on Kubernetes via Ambari
This feature is slated for ODP 1.3.2.0 as a Tech Preview and is currently in qualification. It is available for early enterprise testing.
Interested in early access? Contact our team to join the enterprise early access program.
Overview
Apache Superset is an open-source business intelligence platform that provides interactive data exploration, visualization, and dashboarding. In ODP's Kubernetes integration, Ambari deploys Superset 4.1.4 on Kubernetes using the official Superset Helm chart, wired to your ODP data services (Trino, Hive, Impala) and secured with OIDC authentication.
Superset on Kubernetes complements the rest of the ODP stack: your data lives in HDFS or Ozone, governed by Ranger and catalogued by Atlas, queryable via Trino (on Kubernetes) or Hive and Impala (on the cluster). Superset provides the visualization layer on top, with no data duplication.
Superset Helm Chart Managed by Ambari
Ambari deploys Superset via the Superset Helm chart. The deployment includes:
- Superset Web: the main application server (1 or more replicas)
- Superset Worker: Celery worker for async chart rendering and alerts (1 or more replicas)
- Superset Beat: Celery scheduler for periodic tasks
- Redis: in-cluster cache and message broker for Celery (deployed as a subchart)
- PostgreSQL: Superset's internal metadata database — stores dashboards, charts, users, and connections (deployed as a subchart, or you can point to an external PostgreSQL)
All of these are created and managed by Ambari through the Helm chart lifecycle.
Deploying Superset from Ambari
Step 1: Prerequisite — Deploy Trino First
Superset connects to ODP data through SQL engines. We recommend deploying Trino on Kubernetes first, as it provides the broadest query capability against Iceberg and Hive tables. Superset can also connect directly to Hive and Impala using their JDBC/ODBC interfaces.
Step 2: Open the Kubernetes View and Select Superset
In Ambari, navigate to Views > Kubernetes Manager. Click Deploy next to Apache Superset.
Step 3: Configure the Deployment
General tab:
| Setting | Description | Default |
|---|---|---|
| Helm Release Name | Helm release name | superset |
| Namespace | Kubernetes namespace | odp-apps |
| Web Replicas | Number of Superset web pods | 1 |
| Worker Replicas | Number of Celery worker pods | 1 |
| Web CPU Request | CPU request per web pod | 0.5 |
| Web CPU Limit | CPU limit per web pod | 2 |
| Web Memory Request | Memory request per web pod | 2Gi |
| Web Memory Limit | Memory limit per web pod | 4Gi |
| Secret Key | Flask secret key (auto-generated if blank) | auto |
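As a rough illustration, the General tab settings above map onto Helm values that Ambari passes to the chart. The sketch below builds such an override in Python; the key names (supersetNode, supersetWorker, replicaCount) follow the upstream Superset chart's conventions, but the exact structure Ambari renders is internal and may differ.

```python
import json

# Illustrative mapping of Ambari's "General" tab onto Helm chart values.
# Key names follow the upstream Superset chart (supersetNode/supersetWorker);
# the values Ambari actually generates are internal and may differ.
def build_helm_values(web_replicas=1, worker_replicas=1,
                      web_cpu_request="500m", web_cpu_limit="2",
                      web_mem_request="2Gi", web_mem_limit="4Gi"):
    return {
        "supersetNode": {
            "replicaCount": web_replicas,
            "resources": {
                "requests": {"cpu": web_cpu_request, "memory": web_mem_request},
                "limits": {"cpu": web_cpu_limit, "memory": web_mem_limit},
            },
        },
        "supersetWorker": {"replicaCount": worker_replicas},
    }

values = build_helm_values(web_replicas=2)
print(json.dumps(values, indent=2))
```

Note that a CPU request of 0.5 core is expressed as "500m" in Kubernetes resource notation.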
Database tab:
| Setting | Description |
|---|---|
| Use Embedded PostgreSQL | Deploy PostgreSQL as a subchart (recommended for evaluation) |
| External PostgreSQL Host | Host of an external PostgreSQL instance |
| External PostgreSQL Port | Port (default: 5432) |
| External PostgreSQL Database | Database name (e.g., superset) |
| External PostgreSQL User | Database user |
| External PostgreSQL Password | Database password (stored encrypted) |
For production, use an external PostgreSQL instance with proper backup and HA configuration rather than the embedded subchart.
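When pointing at an external PostgreSQL, the Database tab fields are combined into a standard SQLAlchemy URI (Superset reads it as SQLALCHEMY_DATABASE_URI in superset_config.py). The helper below is an illustrative sketch of that composition, not Ambari's actual code; note the URL-escaping of credentials.

```python
from urllib.parse import quote_plus

# Compose Superset's metadata-database URI from the Database tab fields.
# Illustrative sketch only; Ambari renders the equivalent via Helm values.
def metadata_db_uri(host, port, database, user, password):
    # Escape credentials so special characters (@, :, /) survive in the URI.
    return (f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
            f"@{host}:{port}/{database}")

uri = metadata_db_uri("pg.example.com", 5432, "superset", "superset", "p@ss/word")
print(uri)
```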
Authentication tab (OIDC):
| Setting | Description |
|---|---|
| Authentication Method | DATABASE (local users) or OIDC |
| OIDC Provider URL | e.g., https://keycloak.example.com/realms/myrealm |
| Client ID | OIDC client ID registered for Superset |
| Client Secret | OIDC client secret (stored encrypted) |
| Allowed Roles | Roles/groups that grant access to Superset |
| Admin Roles | Roles/groups that grant Superset admin |
Ingress tab:
| Setting | Description |
|---|---|
| Enable Ingress | Expose Superset via Kubernetes Ingress |
| Hostname | e.g., superset.example.com |
| TLS Secret | Name of the Kubernetes TLS Secret for HTTPS |
| Ingress Class | Ingress controller class (e.g., nginx) |
Step 4: Submit
Click Deploy. Monitor the deployment in Background Operations. Superset initialization (including database migrations) typically takes 3–8 minutes on first install.
Connecting Superset to ODP Data Sources
After deployment, configure database connections in Superset so users can explore ODP data.
Connecting to Trino (Recommended)
Trino provides the broadest SQL coverage for ODP data, including Iceberg tables, and is the recommended connection for Superset.
In Superset, navigate to Settings > Database Connections > + Database.
Select Trino as the database type and enter the SQLAlchemy connection string:
trino://<user>@trino-coordinator.odp-apps.svc.cluster.local:8080/hive
Since both Superset and Trino run in the same Kubernetes namespace, they can communicate via Kubernetes internal DNS without exposing Trino externally.
Connection parameters:
| Parameter | Value |
|---|---|
| SQLAlchemy URI | trino://superset_svc@trino-coordinator.odp-apps.svc.cluster.local:8080/hive |
| Display Name | ODP - Trino (Iceberg/Hive) |
| Expose in SQL Lab | Enabled |
| Allow DML | Disabled (recommended — Superset should be read-only) |
With Kerberos-secured Trino, the Superset service account needs a Kerberos principal and keytab. Ambari provisions the keytab and configures the Superset-to-Trino connection to use it automatically (unless OIDC is the sole authentication method).
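The URI in the table above is composed from a user, the in-cluster service host, a port, and a catalog (optionally a schema). A stdlib sketch of that composition, for illustration only:

```python
from urllib.parse import quote

# Compose the Trino SQLAlchemy URI used in Superset's connection form.
# Host and port match the in-cluster Trino service from the example above.
def trino_uri(user, host, port, catalog, schema=None):
    uri = f"trino://{quote(user)}@{host}:{port}/{catalog}"
    if schema:
        uri += f"/{schema}"
    return uri

uri = trino_uri("superset_svc",
                "trino-coordinator.odp-apps.svc.cluster.local", 8080, "hive")
print(uri)
```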
Connecting to Hive via HiveServer2
For direct Hive connectivity (for Hive-native tables not covered by Trino):
hive://<hiveserver2-host>:10000/default
With Kerberos:
# In Superset config (injected by Ambari Helm values)
SQLALCHEMY_CUSTOM_PASSWORD_STORE = ...
# Connection string with Kerberos params
hive://hiveserver2.example.com:10000/default?auth=KERBEROS&kerberos_service_name=hive
Connecting to Impala
impala://<impala-host>:21050/default
With Kerberos:
impala://impala.example.com:21050/default?auth_mechanism=GSSAPI&kerberos_service_name=impala
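The Kerberos variants for Hive and Impala differ from the plain URIs only in the query parameters appended to them. A small stdlib sketch (the helper is illustrative, not part of Superset or Ambari):

```python
from urllib.parse import urlencode

# Append Kerberos-related query parameters to a base SQLAlchemy URI,
# as in the Hive and Impala examples above. Illustrative helper only.
def with_kerberos(base_uri, **params):
    return f"{base_uri}?{urlencode(params)}"

hive_uri = with_kerberos("hive://hiveserver2.example.com:10000/default",
                         auth="KERBEROS", kerberos_service_name="hive")
impala_uri = with_kerberos("impala://impala.example.com:21050/default",
                           auth_mechanism="GSSAPI", kerberos_service_name="impala")
print(hive_uri)
print(impala_uri)
```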
Authentication and User Management
Local Database Authentication
By default (when OIDC is not configured), Superset uses its own internal user database. Users are created in Settings > List Users.
Roles in Superset:
- Admin: full platform access, can manage connections and users
- Alpha: can create and edit charts and dashboards
- Gamma: read-only access to granted dashboards
- sql_lab: access to SQL Lab for ad-hoc queries
- Public: anonymous access (if enabled — not recommended for secured environments)
OIDC Authentication
When Ambari deploys Superset with OIDC configured, it injects the following into Superset's superset_config.py:
from flask_appbuilder.security.manager import AUTH_OAUTH

AUTH_TYPE = AUTH_OAUTH
OAUTH_PROVIDERS = [
    {
        "name": "oidc",
        "icon": "fa-openid",
        "token_key": "access_token",
        "remote_app": {
            "client_id": "<client-id>",
            "client_secret": "<client-secret>",
            "api_base_url": "<provider-url>",
            "server_metadata_url": "<provider-url>/.well-known/openid-configuration",
            "client_kwargs": {"scope": "openid email profile groups"},
        },
    }
]
AUTH_USER_REGISTRATION = True
AUTH_USER_REGISTRATION_ROLE = "Gamma"
Group-to-role mapping is also configurable, so that LDAP/AD groups are automatically mapped to Superset roles at login time.
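Group-to-role mapping uses Flask-AppBuilder's AUTH_ROLES_MAPPING setting. A hedged sketch of what such a mapping looks like in superset_config.py (the group names here are placeholders for your IdP's groups, not values Ambari ships):

```python
# Sketch of group-to-role mapping in superset_config.py.
# Keys are group names as returned by the OIDC provider (placeholders here);
# values are lists of Superset roles to grant. AUTH_ROLES_MAPPING and
# AUTH_ROLES_SYNC_AT_LOGIN are standard Flask-AppBuilder settings.
AUTH_ROLES_MAPPING = {
    "superset_admins": ["Admin"],
    "superset_editors": ["Alpha", "sql_lab"],
    "superset_viewers": ["Gamma"],
}
# Re-evaluate the user's roles from IdP groups on every login.
AUTH_ROLES_SYNC_AT_LOGIN = True
```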
Synchronizing with LDAP
Superset can be configured to authenticate against LDAP directly (without OIDC), using the same LDAP server configured in ODP. When Ambari deploys Superset, it can inject LDAP parameters from the ODP cluster configuration (read from the LDAP configuration stored in Ambari).
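Direct LDAP authentication uses Flask-AppBuilder's LDAP settings in superset_config.py. A hedged sketch follows; the server, search base, and bind DN are placeholders that Ambari would fill from the cluster's LDAP configuration:

```python
from flask_appbuilder.security.manager import AUTH_LDAP

# Sketch of direct LDAP authentication in superset_config.py.
# All values below are placeholders; Ambari would populate them from
# the LDAP configuration stored in the cluster.
AUTH_TYPE = AUTH_LDAP
AUTH_LDAP_SERVER = "ldaps://ldap.example.com:636"
AUTH_LDAP_SEARCH = "ou=people,dc=example,dc=com"
AUTH_LDAP_BIND_USER = "cn=superset_bind,ou=services,dc=example,dc=com"
AUTH_LDAP_BIND_PASSWORD = "<bind-password>"
AUTH_LDAP_UID_FIELD = "uid"
# Auto-create users on first successful LDAP login, with Gamma by default.
AUTH_USER_REGISTRATION = True
AUTH_USER_REGISTRATION_ROLE = "Gamma"
```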
Creating Dashboards on ODP Data
SQL Lab for Exploration
SQL Lab is Superset's interactive SQL editor. Use it to explore ODP data before building charts:
- Navigate to SQL Lab > SQL Editor.
- Select the ODP - Trino database and the target schema.
- Write and run SQL. Results appear inline.
- Save a query as a Virtual Dataset to use as the basis for a chart.
Example: exploring Iceberg table history via Trino:
SELECT
committed_at,
snapshot_id,
operation,
summary['added-records'] AS added_records,
summary['deleted-records'] AS deleted_records
FROM hive.iceberg_demo."my_table$snapshots"
ORDER BY committed_at DESC
LIMIT 20;
Building Charts
- Navigate to Charts > + Chart.
- Select a dataset (physical table or virtual dataset from SQL Lab).
- Choose a chart type (Bar, Line, Table, Map, etc.).
- Configure dimensions, metrics, and filters.
- Save the chart.
Assembling Dashboards
- Navigate to Dashboards > + Dashboard.
- Drag charts from the chart picker onto the layout grid.
- Configure filters that apply across all charts on the dashboard.
- Set the dashboard refresh interval if you want auto-refresh for near-real-time data.
- Publish the dashboard and share it with the appropriate Superset roles.
Resource Sizing Recommendations
The following sizing recommendations apply to typical ODP deployments. Adjust based on the number of concurrent users and dashboard complexity.
| Scenario | Web Pods | Worker Pods | Web Memory | Worker Memory |
|---|---|---|---|---|
| Development / Evaluation | 1 | 1 | 2Gi | 1Gi |
| Small team (< 20 users) | 1 | 2 | 4Gi | 2Gi |
| Medium team (20–100 users) | 2 | 2 | 4Gi | 2Gi |
| Large team (100+ users) | 3+ | 3+ | 8Gi | 4Gi |
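The tiers above can be restated as a simple lookup by concurrent-user count; the sketch below just codifies the table (thresholds are the table's, not hard limits, and the development/evaluation tier is chosen by hand rather than by user count):

```python
# Restate the sizing table as a lookup by concurrent-user count.
# Returns (web_pods, worker_pods, web_memory, worker_memory).
def superset_sizing(users):
    if users < 20:
        return (1, 2, "4Gi", "2Gi")   # small team
    if users <= 100:
        return (2, 2, "4Gi", "2Gi")   # medium team
    return (3, 3, "8Gi", "4Gi")       # large team (pod counts are a floor)

print(superset_sizing(50))
```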
PostgreSQL: allocate at least 5 GB of persistent storage for the Superset metadata database. For teams with many saved charts and dashboards, 20–50 GB is more appropriate.
Redis: the default subchart configuration (256 MB memory limit) is sufficient for most deployments. Increase if you use Superset's alert and report features heavily.
Superset web pods are stateless once session state lives in Redis. When scaling to multiple web replicas, use either sticky sessions at the Ingress or LoadBalancer, or Redis-backed session storage (Ambari's Helm values configure the latter automatically).
Monitoring Superset from Ambari
The Kubernetes View shows the following for the Superset deployment:
- Pod status: web, worker, beat, Redis, and PostgreSQL pod states
- Helm release status: current revision and Flux reconciliation status
- Recent events: Kubernetes events for the Superset resources
For Superset-level monitoring, use the built-in Superset > Logs section to see query execution history and errors.
Upgrading Superset
To upgrade to a newer Superset version:
- In the Kubernetes View, select the Superset deployment.
- Click Upgrade.
- Review the configuration diff.
- Click Confirm Upgrade.
Ambari runs database migrations automatically as part of the upgrade (the Helm chart includes an init container for migrations). Existing dashboards and charts are preserved.
If the upgrade fails, use Rollback in the Kubernetes View to return to the previous Helm revision.