Version: 1.3.1.0

Deploying Apache Superset on Kubernetes via Ambari

Tech Preview — ODP 1.3.2.0

This feature is planned for ODP 1.3.2.0 as a Tech Preview and is currently in qualification. It is available for early enterprise testing.

Interested in early access? Contact our team to join the enterprise early access program.

Overview

Apache Superset is an open-source business intelligence platform that provides interactive data exploration, visualization, and dashboarding. In ODP's Kubernetes integration, Ambari deploys Superset 4.1.4 on Kubernetes using the official Superset Helm chart, wired to your ODP data services (Trino, Hive, Impala) and secured with OIDC authentication.

Superset on Kubernetes complements the rest of the ODP stack: your data lives in HDFS or Ozone, governed by Ranger and catalogued by Atlas, queryable via Trino (on Kubernetes) or Hive and Impala (on the cluster). Superset provides the visualization layer on top, with no data duplication.

Superset Helm Chart Managed by Ambari

Ambari deploys Superset via the Superset Helm chart. The deployment includes:

  • Superset Web: the main application server (1 or more replicas)
  • Superset Worker: Celery worker for async chart rendering and alerts (1 or more replicas)
  • Superset Beat: Celery scheduler for periodic tasks
  • Redis: in-cluster cache and message broker for Celery (deployed as a subchart)
  • PostgreSQL: Superset's internal metadata database — stores dashboards, charts, users, and connections (deployed as a subchart, or you can point to an external PostgreSQL)

All of these are created and managed by Ambari through the Helm chart lifecycle.

Deploying Superset from Ambari

Step 1: Prerequisite — Deploy Trino First

Superset connects to ODP data through SQL engines. We recommend deploying Trino on Kubernetes first, as it provides the broadest query capability against Iceberg and Hive tables. Superset can also connect directly to Hive and Impala through their SQLAlchemy drivers (the hive:// and impala:// connection strings described under Connecting Superset to ODP Data Sources).

Step 2: Open the Kubernetes View and Select Superset

In Ambari, navigate to Views > Kubernetes Manager. Click Deploy next to Apache Superset.

Step 3: Configure the Deployment

General tab:

Setting | Description | Default
Helm Release Name | Name of the Helm release | superset
Namespace | Kubernetes namespace | odp-apps
Web Replicas | Number of Superset web pods | 1
Worker Replicas | Number of Celery worker pods | 1
Web CPU Request | CPU request per web pod | 0.5
Web CPU Limit | CPU limit per web pod | 2
Web Memory Request | Memory request per web pod | 2Gi
Web Memory Limit | Memory limit per web pod | 4Gi
Secret Key | Flask secret key (auto-generated if blank) | auto

Database tab:

Setting | Description
Use Embedded PostgreSQL | Deploy PostgreSQL as a subchart (recommended for evaluation)
External PostgreSQL Host | Host of an external PostgreSQL instance
External PostgreSQL Port | Port (default: 5432)
External PostgreSQL Database | Database name (e.g., superset)
External PostgreSQL User | Database user
External PostgreSQL Password | Database password (stored encrypted)

For production, use an external PostgreSQL instance with proper backup and HA configuration rather than the embedded subchart.

Authentication tab (OIDC):

Setting | Description
Authentication Method | DATABASE (local users) or OIDC
OIDC Provider URL | e.g., https://keycloak.example.com/realms/myrealm
Client ID | OIDC client ID registered for Superset
Client Secret | OIDC client secret (stored encrypted)
Allowed Roles | Roles/groups that grant access to Superset
Admin Roles | Roles/groups that grant Superset admin

Ingress tab:

Setting | Description
Enable Ingress | Expose Superset via Kubernetes Ingress
Hostname | e.g., superset.example.com
TLS Secret | Name of the Kubernetes TLS Secret for HTTPS
Ingress Class | Ingress controller class (e.g., nginx)
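
The form fields above ultimately become Helm values. As a rough illustration only, the sketch below collects the General, Database, and Ingress settings into a values-style dictionary. The key names (supersetNode, supersetWorker, postgresql.enabled, ingress) are an assumption based on the upstream Superset chart's values layout; the values Ambari actually generates may differ, so verify them against the chart version shipped with ODP. The function name and the superset-tls secret name are hypothetical.

```python
import json


def build_values(web_replicas=1, worker_replicas=1, embedded_postgres=True,
                 hostname=None, tls_secret=None, ingress_class="nginx"):
    """Assemble a Helm values dict from the form fields (assumed key names)."""
    values = {
        "supersetNode": {
            "replicas": {"replicaCount": web_replicas},
            # Defaults from the General tab: 0.5 CPU / 2Gi requested, 2 CPU / 4Gi limit.
            "resources": {
                "requests": {"cpu": "500m", "memory": "2Gi"},
                "limits": {"cpu": "2", "memory": "4Gi"},
            },
        },
        "supersetWorker": {"replicas": {"replicaCount": worker_replicas}},
        # False means an external PostgreSQL must be configured instead.
        "postgresql": {"enabled": embedded_postgres},
        "ingress": {"enabled": hostname is not None},
    }
    if hostname is not None:
        tls = [{"secretName": tls_secret, "hosts": [hostname]}] if tls_secret else []
        values["ingress"].update(
            {"ingressClassName": ingress_class, "hosts": [hostname], "tls": tls}
        )
    return values


if __name__ == "__main__":
    print(json.dumps(build_values(hostname="superset.example.com",
                                  tls_secret="superset-tls"), indent=2))
```

Printing the dictionary as JSON makes it easy to compare against the configuration diff that Ambari shows before a deployment or upgrade.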

Step 4: Submit

Click Deploy. Monitor the deployment in Background Operations. Superset initialization (including database migrations) typically takes 3–8 minutes on first install.
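
Because first-time initialization can take several minutes, it can help to poll Superset until it answers. The sketch below assumes Superset's /health endpoint, which returns the text OK once the web server is up. The base URL is a placeholder, and the fetch and sleep parameters are hypothetical hooks added so the loop can be exercised without a live cluster.

```python
import time
import urllib.error
import urllib.request


def http_fetch(url):
    """Return (status, body) for a GET request, or (None, "") if it fails."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except (urllib.error.URLError, OSError):
        # Covers connection errors and non-2xx responses (HTTPError).
        return None, ""


def wait_until_healthy(base_url, attempts=60, delay=10.0,
                       fetch=http_fetch, sleep=time.sleep):
    """Poll <base_url>/health until it returns 200 'OK'; True on success."""
    url = base_url.rstrip("/") + "/health"
    for attempt in range(attempts):
        status, body = fetch(url)
        if status == 200 and body.strip() == "OK":
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

With the defaults (60 attempts, 10 seconds apart) the loop waits up to roughly 10 minutes, which covers the 3–8 minute first-install window mentioned above.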

Connecting Superset to ODP Data Sources

After deployment, configure database connections in Superset so users can explore ODP data.

Trino provides the broadest SQL coverage for ODP data, including Iceberg tables, and is the recommended connection for Superset.

In Superset, navigate to Settings > Database Connections > + Database.

Select Trino as the database type and enter the SQLAlchemy connection string:

trino://<user>@trino-coordinator.odp-apps.svc.cluster.local:8080/hive

When Superset and Trino run in the same Kubernetes namespace (odp-apps by default), they communicate over Kubernetes internal DNS, so Trino does not need to be exposed externally.

Connection parameters:

Parameter | Value
SQLAlchemy URI | trino://superset_svc@trino-coordinator.odp-apps.svc.cluster.local:8080/hive
Display Name | ODP - Trino (Iceberg/Hive)
Expose in SQL Lab | Enabled
Allow DML | Disabled (recommended; Superset should be read-only)
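
The same connection can also be registered without the UI through Superset's REST API (POST /api/v1/database/). The sketch below only builds the request body from the parameters in the table above; it does not send it. Authentication (a bearer token from /api/v1/security/login, plus a CSRF token where enabled) is left out, and the function names are hypothetical.

```python
from urllib.parse import quote


def trino_uri(user, host, port=8080, catalog="hive"):
    """Build the SQLAlchemy URI for a Trino catalog, escaping the user name."""
    return f"trino://{quote(user, safe='')}@{host}:{port}/{catalog}"


def database_payload(user="superset_svc",
                     host="trino-coordinator.odp-apps.svc.cluster.local"):
    """Request body for POST /api/v1/database/, mirroring the table above."""
    return {
        "database_name": "ODP - Trino (Iceberg/Hive)",
        "sqlalchemy_uri": trino_uri(user, host),
        "expose_in_sqllab": True,
        "allow_dml": False,  # keep Superset read-only
    }


if __name__ == "__main__":
    print(database_payload())
```

Escaping the user name matters when it contains characters such as @, which would otherwise be read as the separator between user and host.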

When Trino is secured with Kerberos, the Superset service account needs a Kerberos principal. Ambari provisions the keytab for that principal and configures the Superset-to-Trino connection to use it automatically, provided OIDC is not the sole authentication method.

Connecting to Hive via HiveServer2

For direct Hive connectivity (for Hive-native tables not covered by Trino):

hive://<hiveserver2-host>:10000/default

With Kerberos:

# In Superset config (injected by Ambari Helm values)
SQLALCHEMY_CUSTOM_PASSWORD_STORE = ...
# Connection string with Kerberos params
hive://hiveserver2.example.com:10000/default?auth=KERBEROS&kerberos_service_name=hive

Connecting to Impala

impala://<impala-host>:21050/default

With Kerberos:

impala://impala.example.com:21050/default?auth_mechanism=GSSAPI&kerberos_service_name=impala
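
The Kerberos variants differ from the plain URIs only in their query string, so they are easy to generate. The helper below is a small sketch (the function name is hypothetical) that reproduces the two Kerberos connection strings shown above; the parameter names are taken directly from those examples.

```python
from urllib.parse import urlencode


def engine_uri(scheme, host, port, database="default", **params):
    """Build a SQLAlchemy URI, passing driver options in the query string."""
    base = f"{scheme}://{host}:{port}/{database}"
    return f"{base}?{urlencode(params)}" if params else base


hive_uri = engine_uri("hive", "hiveserver2.example.com", 10000,
                      auth="KERBEROS", kerberos_service_name="hive")
impala_uri = engine_uri("impala", "impala.example.com", 21050,
                        auth_mechanism="GSSAPI", kerberos_service_name="impala")

if __name__ == "__main__":
    print(hive_uri)
    print(impala_uri)
```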

Authentication and User Management

Local Database Authentication

By default (when OIDC is not configured), Superset uses its own internal user database. Users are created in Settings > List Users.

Roles in Superset:

  • Admin: full platform access, can manage connections and users
  • Alpha: can create and edit charts and dashboards
  • Gamma: read-only access to granted dashboards
  • sql_lab: access to SQL Lab for ad-hoc queries
  • Public: anonymous access (if enabled — not recommended for secured environments)

OIDC Authentication

When Ambari deploys Superset with OIDC configured, it injects the following into the Superset superset_config.py:

from flask_appbuilder.security.manager import AUTH_OAUTH

AUTH_TYPE = AUTH_OAUTH
OAUTH_PROVIDERS = [
    {
        "name": "oidc",
        "icon": "fa-openid",
        "token_key": "access_token",
        "remote_app": {
            "client_id": "<client-id>",
            "client_secret": "<client-secret>",
            "api_base_url": "<provider-url>",
            "server_metadata_url": "<provider-url>/.well-known/openid-configuration",
            "client_kwargs": {"scope": "openid email profile groups"},
        },
    }
]
AUTH_USER_REGISTRATION = True
AUTH_USER_REGISTRATION_ROLE = "Gamma"

Group-to-role mapping is also configurable, so that LDAP/AD groups are automatically mapped to Superset roles at login time.
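
In Flask-AppBuilder, which Superset uses for authentication, this mapping is expressed with AUTH_ROLES_MAPPING and AUTH_ROLES_SYNC_AT_LOGIN. The fragment below is a sketch: the group names are invented for illustration, and roles_for_groups is a hypothetical helper that only demonstrates how such a mapping resolves; it is not Flask-AppBuilder's own code. For OAuth logins, Flask-AppBuilder reads the user's groups from a role_keys entry in the user info, which for a generic OIDC provider typically has to come from a custom security manager.

```python
# Keys are group names from the identity provider (hypothetical examples);
# values are lists of Superset role names.
AUTH_ROLES_MAPPING = {
    "superset-admins": ["Admin"],
    "data-analysts": ["Alpha", "sql_lab"],
    "data-viewers": ["Gamma"],
}
# Re-evaluate the user's roles on every login, not only at registration.
AUTH_ROLES_SYNC_AT_LOGIN = True


def roles_for_groups(groups, mapping=AUTH_ROLES_MAPPING):
    """Illustrative helper: union of the roles mapped from each group."""
    roles = set()
    for group in groups:
        roles.update(mapping.get(group, []))
    return roles


if __name__ == "__main__":
    print(sorted(roles_for_groups(["data-analysts", "data-viewers"])))
```

Groups without an entry in the mapping contribute no roles in this sketch.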

Synchronizing with LDAP

Superset can also authenticate against LDAP directly, without OIDC, using the same LDAP server that is configured for ODP. In that case, Ambari can inject the LDAP parameters it already stores for the cluster into the Superset deployment.

Creating Dashboards on ODP Data

SQL Lab for Exploration

SQL Lab is Superset's interactive SQL editor. Use it to explore ODP data before building charts:

  1. Navigate to SQL Lab > SQL Editor.
  2. Select the ODP - Trino database and the target schema.
  3. Write and run SQL. Results appear inline.
  4. Save a query as a Virtual Dataset to use as the basis for a chart.

Example: exploring Iceberg table history via Trino:

SELECT
    committed_at,
    snapshot_id,
    operation,
    summary['added-records'] AS added_records,
    summary['deleted-records'] AS deleted_records
FROM hive.iceberg_demo."my_table$snapshots"
ORDER BY committed_at DESC
LIMIT 20;

Building Charts

  1. Navigate to Charts > + Chart.
  2. Select a dataset (physical table or virtual dataset from SQL Lab).
  3. Choose a chart type (Bar, Line, Table, Map, etc.).
  4. Configure dimensions, metrics, and filters.
  5. Save the chart.

Assembling Dashboards

  1. Navigate to Dashboards > + Dashboard.
  2. Drag charts from the chart picker onto the layout grid.
  3. Configure filters that apply across all charts on the dashboard.
  4. Set the dashboard refresh interval if you want auto-refresh for near-real-time data.
  5. Publish the dashboard and share it with the appropriate Superset roles.

Resource Sizing Recommendations

The following sizing recommendations apply to typical ODP deployments. Adjust based on the number of concurrent users and dashboard complexity.

Scenario | Web Pods | Worker Pods | Web Memory | Worker Memory
Development / Evaluation | 1 | 1 | 2Gi | 1Gi
Small team (< 20 users) | 1 | 2 | 4Gi | 2Gi
Medium team (20–100 users) | 2 | 2 | 4Gi | 2Gi
Large team (100+ users) | 3+ | 3+ | 8Gi | 4Gi

PostgreSQL: allocate at least 5 GB of persistent storage for the Superset metadata database. For teams with many saved charts and dashboards, 20–50 GB is more appropriate.

Redis: the default subchart configuration (256 MB memory limit) is sufficient for most deployments. Increase if you use Superset's alert and report features heavily.
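
The sizing table can be encoded as a simple lookup, for example to pre-fill the deployment form from an expected user count. This is a sketch: the names are hypothetical, the Development / Evaluation row is left out because it is not tied to a user count, and because the table's ranges meet at 100 users the code treats exactly 100 as a medium team.

```python
from typing import NamedTuple


class Sizing(NamedTuple):
    web_pods: int
    worker_pods: int
    web_memory: str
    worker_memory: str


def recommend_sizing(users: int) -> Sizing:
    """Map an expected user count to the matching row of the sizing table."""
    if users < 0:
        raise ValueError("users must be non-negative")
    if users < 20:
        return Sizing(1, 2, "4Gi", "2Gi")  # small team
    if users <= 100:
        return Sizing(2, 2, "4Gi", "2Gi")  # medium team
    return Sizing(3, 3, "8Gi", "4Gi")      # large team: pod counts are a floor ("3+")


if __name__ == "__main__":
    print(recommend_sizing(50))
```

The returned values are starting points only; dashboard complexity can justify more workers or memory than the user count alone suggests.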

Horizontal Scaling

Superset web pods are stateless when session state is kept in Redis. Running more than one web replica therefore requires either Redis-backed session storage, which Ambari's Helm values configure automatically, or a Kubernetes Ingress or LoadBalancer with sticky sessions. With the Ambari defaults, no sticky-session configuration is needed.

Monitoring Superset from Ambari

The Kubernetes View shows the following for the Superset deployment:

  • Pod status: web, worker, beat, Redis, and PostgreSQL pod states
  • Helm release status: current revision and Flux reconciliation status
  • Recent events: Kubernetes events for the Superset resources

For Superset-level monitoring, use the built-in Superset > Logs section to see query execution history and errors.

Upgrading Superset

To upgrade to a newer Superset version:

  1. In the Kubernetes View, select the Superset deployment.
  2. Click Upgrade.
  3. Review the configuration diff.
  4. Click Confirm Upgrade.

Ambari runs database migrations automatically as part of the upgrade (the Helm chart includes an init container for migrations). Existing dashboards and charts are preserved.

If the upgrade fails, use Rollback in the Kubernetes View to return to the previous Helm revision.