Deploying Trino on Kubernetes via Ambari
This feature will ship in ODP 1.3.2.0 as a Tech Preview; it is currently in qualification and available for early enterprise testing.
Interested in early access? Contact our team to join the enterprise early access program.
Why Trino on Kubernetes
Trino is a distributed SQL query engine designed for interactive analytics at scale. While ODP includes Hive and Impala for SQL workloads on the cluster, Trino on Kubernetes addresses a distinct set of requirements:
Elastic scaling: Kubernetes makes it straightforward to scale Trino workers horizontally based on query load. You can run 2 workers during off-peak hours and 20 during peak analytics workloads, without provisioning dedicated cluster nodes.
Workload isolation: Trino runs in containers separate from the Hadoop cluster nodes, preventing heavy analytical queries from competing for resources with YARN jobs on the same machines.
Federation: Trino can query data from multiple sources simultaneously — Iceberg tables in HDFS, PostgreSQL databases, Kafka topics — in a single query. This federation capability makes it a natural hub for exploratory analytics across heterogeneous data.
Superset integration: Apache Superset, also deployable through Ambari, connects to Trino via JDBC. Running both Trino and Superset on Kubernetes and connecting them to ODP data creates a complete, container-native BI stack backed by governed Hadoop storage.
Trino Helm Chart Managed by Ambari
Ambari deploys Trino using the official Trino Helm chart, with values generated from the ODP cluster configuration. The chart creates:
- Trino Coordinator: 1 pod (configurable) — receives queries, plans execution, manages workers
- Trino Workers: N pods (configurable) — execute query fragments in parallel
- ConfigMaps: config.properties, jvm.config, log.properties, node.properties for coordinator and workers
- Secrets: Kerberos keytab, Ranger plugin configuration, TLS certificates (if configured)
- Service: ClusterIP service for internal access; optionally a LoadBalancer or Ingress for external JDBC access
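Ambari does not expose the generated values file, but an override along these lines would produce the defaults used in the configuration wizard. This is an illustrative sketch only: the field names follow the official Trino chart, and the exact structure Ambari emits may differ.

```yaml
# Illustrative values override for the official Trino Helm chart;
# the exact file Ambari generates may differ
server:
  workers: 3
coordinator:
  resources:
    requests:
      memory: 4Gi
worker:
  resources:
    requests:
      cpu: 2
      memory: 8Gi
    limits:
      cpu: 4
      memory: 16Gi
```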
Deploying Trino from Ambari
Step 1: Open the Kubernetes View
In Ambari, navigate to Views > Kubernetes Manager (or the name you gave your view instance). The application catalog appears.
Step 2: Select Trino and Click Deploy
Click Deploy next to Trino. The configuration wizard opens.
Step 3: Configure the Deployment
General tab:
| Setting | Description | Default |
|---|---|---|
| Helm Release Name | Name for the Helm release | trino |
| Namespace | Kubernetes namespace | odp-apps |
| Coordinator Replicas | Number of coordinator pods | 1 |
| Worker Replicas | Number of worker pods | 3 |
| Worker CPU Request | CPU request per worker | 2 |
| Worker CPU Limit | CPU limit per worker | 4 |
| Worker Memory Request | Memory request per worker | 8Gi |
| Worker Memory Limit | Memory limit per worker | 16Gi |
| Coordinator Memory | Memory for coordinator pod | 4Gi |
Connectivity tab (pre-populated from ODP cluster):
| Setting | Source | Example |
|---|---|---|
| Hive Metastore URI | Hive config in Ambari | thrift://master02.example.com:9083,thrift://master03.example.com:9083 |
| HDFS Default FS | HDFS config in Ambari | hdfs://mycluster |
| Ranger REST URL | Ranger config in Ambari | https://master01.example.com:6182 |
Security tab:
| Setting | Description |
|---|---|
| Kerberos Principal | Service principal for Trino (e.g., trino/k8s-worker.example.com@REALM) |
| Keytab | Generated by Ambari from the cluster KDC |
| Kerberos Realm | Pre-populated from cluster Kerberos config |
| KDC Address | Pre-populated from cluster Kerberos config |
Step 4: Submit
Click Deploy. Ambari creates a background operation. Monitor progress in Background Operations. A typical Trino deployment takes 2–5 minutes.
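Once the background operation completes, the result can be checked directly with kubectl and Helm. The namespace and release name below are the wizard defaults; the pod label selector is an assumption about how the chart labels its pods.

```shell
# List the Trino pods created by the Helm release (default namespace from the wizard)
kubectl -n odp-apps get pods -l app.kubernetes.io/name=trino

# Check the Helm release itself
helm -n odp-apps status trino
```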
Trino Configuration Details
Coordinator config.properties
Ambari generates the following coordinator configuration (abbreviated):
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=1GB
discovery.uri=http://trino-coordinator:8080
Worker config.properties
coordinator=false
http-server.http.port=8080
discovery.uri=http://trino-coordinator:8080
Connecting Trino to Iceberg via Hive Metastore
Ambari configures the Hive catalog for Trino with Iceberg support:
# /etc/trino/catalog/hive.properties (inside container)
connector.name=iceberg
hive.metastore.uri=thrift://master02.example.com:9083,thrift://master03.example.com:9083
hive.metastore.authentication.type=KERBEROS
hive.metastore.service.principal=hive/_HOST@REALM.EXAMPLE.COM
hive.metastore.client.principal=trino/k8s-worker.example.com@REALM.EXAMPLE.COM
hive.metastore.client.keytab=/etc/trino/keytabs/trino.keytab
hive.config.resources=/etc/trino/conf/core-site.xml,/etc/trino/conf/hdfs-site.xml
iceberg.file-format=PARQUET
The core-site.xml and hdfs-site.xml files are generated by Ambari from the current HDFS configuration and mounted as ConfigMaps.
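The mounted files can be inspected in a running pod to confirm they track the current cluster configuration. The path matches hive.config.resources above; the deployment name is an assumption.

```shell
# Verify the Hadoop client config inside the coordinator pod
kubectl -n odp-apps exec deploy/trino-coordinator -- \
  cat /etc/trino/conf/core-site.xml
```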
Kerberos Authentication for Trino
Trino uses the service keytab provisioned by Ambari to authenticate to:
- Hive Metastore: to resolve table locations and schema
- HDFS: to read data files directly (for tables stored in HDFS/Ozone)
- Ranger REST API: to fetch authorization policies
The keytab is stored as a Kubernetes Secret:
apiVersion: v1
kind: Secret
metadata:
  name: trino-kerberos-keytab
  namespace: odp-apps
type: Opaque
data:
  trino.keytab: <base64-encoded keytab>
The Secret is mounted into all Trino pods at /etc/trino/keytabs/trino.keytab.
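Outside of Ambari, an equivalent Secret could be created, and the mounted keytab verified, like this. The source keytab path and the deployment name are illustrative.

```shell
# Manual equivalent of the Secret Ambari provisions
kubectl -n odp-apps create secret generic trino-kerberos-keytab \
  --from-file=trino.keytab=/etc/security/keytabs/trino.keytab

# Confirm the keytab entries inside a running pod (requires klist in the image)
kubectl -n odp-apps exec deploy/trino-coordinator -- \
  klist -kt /etc/trino/keytabs/trino.keytab
```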
Ranger Authorization for Trino
Ambari configures the Trino Ranger plugin at deployment time. The plugin intercepts every Trino query and evaluates it against Ranger policies before execution.
How It Works
- A user submits a query via JDBC or Trino CLI.
- The Trino Coordinator forwards the authorization request to the Ranger plugin.
- The plugin queries the Ranger REST API with the user identity, resource (catalog/schema/table/column), and action (SELECT, INSERT, etc.).
- Ranger evaluates the applicable policies (resource-based and tag-based).
- If access is denied, Trino returns an authorization error. If permitted, the query proceeds.
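For example, a query against a table with no matching allow policy fails before any data is read. The schema, table, and column names here are illustrative, and the exact error text depends on the Trino version.

```sql
-- Submitted by a user with no SELECT policy on this table
SELECT ssn FROM hive.secure.customers;
-- Fails with an Access Denied error naming the column and table
```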
Ranger Service Definition for Trino
In Ranger, Trino appears as a service of type Trino (or Presto). Create a Trino service in Ranger pointing to the Trino coordinator:
Service Name: trino_k8s
Trino URL: http://trino-coordinator.odp-apps.svc.cluster.local:8080
Username: ranger_trino_lookup
Password: <password>
Policies are then created in this service to control access to Trino catalogs, schemas, and tables.
Relationship with Hive Ranger Policies
Users who can access a Hive table through Ranger do not automatically get access to the same table through Trino. Trino and Hive are separate Ranger services. You need to grant access in both services, or use Ranger tag-based policies to grant access via Atlas tags, which apply across services.
To avoid maintaining duplicate policies for Hive and Trino, use Atlas to tag your sensitive tables (e.g., PII, RESTRICTED) and create tag-based policies in Ranger that apply to all services. This ensures consistent access control regardless of which engine a user queries through.
Accessing Trino
JDBC
The Trino JDBC driver connects to the coordinator service. The connection URL format:
jdbc:trino://<trino-coordinator-host>:<port>/<catalog>/<schema>
If Trino is exposed via a Kubernetes LoadBalancer or NodePort:
jdbc:trino://trino.example.com:8080/hive/default
With Kerberos authentication (from a client with a valid Kerberos ticket):
jdbc:trino://trino.example.com:8080/hive/default?KerberosRemoteServiceName=trino&KerberosPrincipal=user@REALM.EXAMPLE.COM&KerberosConfigPath=/etc/krb5.conf&KerberosKeytabPath=/home/user/user.keytab&SSL=true&SSLTrustStorePath=/etc/ssl/certs/truststore.jks
Trino CLI
trino \
--server https://trino.example.com:8080 \
--krb5-remote-service-name trino \
--krb5-principal user@REALM.EXAMPLE.COM \
--krb5-config-path /etc/krb5.conf \
--catalog hive \
--schema default
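The CLI can also run a one-shot query over the same Kerberized connection, which is convenient for smoke tests and scripting. The server, principal, and table below are the same examples used above.

```shell
# One-shot query; prints the result and exits
trino \
  --server https://trino.example.com:8080 \
  --krb5-remote-service-name trino \
  --krb5-principal user@REALM.EXAMPLE.COM \
  --krb5-config-path /etc/krb5.conf \
  --execute "SELECT count(*) FROM hive.iceberg_demo.my_table"
```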
Verifying Access to ODP Data
Once connected, verify that Iceberg tables on ODP are accessible:
-- List available schemas
SHOW SCHEMAS FROM hive;
-- Query an Iceberg table
SELECT * FROM hive.iceberg_demo.my_table LIMIT 10;
-- Check Trino's view of Iceberg table history
SELECT * FROM hive.iceberg_demo."my_table$snapshots";
Monitoring Trino from Ambari
The Kubernetes View displays the following for the Trino deployment:
- Pod status: coordinator and worker pod states (Running, Pending, CrashLoopBackOff)
- Replica count: actual vs. desired worker count
- Helm release status: current revision and Flux reconciliation status
- Recent events: Kubernetes events for the Trino namespace
For deeper Trino monitoring, access the Trino Web UI at http://<trino-coordinator>:8080. It shows:
- Active and queued queries
- Worker node status and resource usage
- Query execution plans and stage details
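If no LoadBalancer or Ingress is configured, the Web UI can still be reached by tunnelling the coordinator service locally. The service name matches the discovery URI shown earlier, but verify it against your release.

```shell
# Forward the coordinator service to localhost
kubectl -n odp-apps port-forward svc/trino-coordinator 8080:8080
# The Web UI is then available at http://localhost:8080
```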
Scaling Workers
To adjust the number of Trino workers after initial deployment:
- In the Kubernetes View, select the Trino deployment.
- Click Configure.
- Update the Worker Replicas field.
- Click Upgrade. Ambari performs a helm upgrade with the new value.
Kubernetes scales the worker Deployment to the new replica count. Existing queries continue running on current workers; new workers become available for new queries within a minute or two.
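The rollout can be followed from the command line while Kubernetes converges on the new replica count. The Deployment name and the worker label selector are assumptions about how the chart names its objects.

```shell
# Watch the worker Deployment reach the new replica count
kubectl -n odp-apps rollout status deployment/trino-worker

# Confirm the individual worker pods
kubectl -n odp-apps get pods -l app.kubernetes.io/component=worker
```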