Deploying Trino on Kubernetes via Ambari
This feature will ship in ODP 1.3.2.0 as a Tech Preview; it is currently in qualification and available for early enterprise testing.
Interested in early access? Contact our team to join the enterprise early access program.
Why Trino on Kubernetes
Trino is a distributed SQL query engine designed for interactive analytics at scale. While ODP includes Hive and Impala for SQL workloads on the cluster, Trino on Kubernetes addresses a distinct set of requirements:
Elastic scaling: Kubernetes makes it straightforward to scale Trino workers horizontally based on query load. You can run 2 workers during off-peak hours and 20 during peak analytics workloads, without provisioning dedicated cluster nodes.
Workload isolation: Trino runs in containers separate from the Hadoop cluster nodes, preventing heavy analytical queries from competing for resources with YARN jobs on the same machines.
Federation: Trino can query data from multiple sources simultaneously — Iceberg tables in HDFS, PostgreSQL databases, Kafka topics — in a single query. This federation capability makes it a natural hub for exploratory analytics across heterogeneous data.
Superset integration: Apache Superset, also deployable through Ambari, connects to Trino via JDBC. Running both Trino and Superset on Kubernetes and connecting them to ODP data creates a complete, container-native BI stack backed by governed Hadoop storage.
Trino Helm Chart Managed by Ambari
Ambari deploys Trino using the official Trino Helm chart, with values generated from the ODP cluster configuration. The chart creates:
- Trino Coordinator: 1 pod (configurable) — receives queries, plans execution, manages workers
- Trino Workers: N pods (configurable) — execute query fragments in parallel
- ConfigMaps: config.properties, jvm.config, log.properties, node.properties for coordinator and workers
- Secrets: Kerberos keytab, Ranger plugin configuration, TLS certificates (if configured)
- Service: ClusterIP service for internal access; optionally a LoadBalancer or Ingress for external JDBC access
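Ambari does not expose the generated values file, but an override along these lines would produce the defaults used in the configuration wizard. This is an illustrative sketch only: the field names follow the official Trino chart, and the exact structure Ambari emits may differ.

```yaml
# Illustrative values override for the official Trino Helm chart;
# the exact file Ambari generates may differ
server:
  workers: 3
coordinator:
  resources:
    requests:
      memory: 4Gi
worker:
  resources:
    requests:
      cpu: 2
      memory: 8Gi
    limits:
      cpu: 4
      memory: 16Gi
```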
Deploying Trino from Ambari
Step 1: Open the Kubernetes View
In Ambari, navigate to Views > Kubernetes Manager (or the name you gave your view instance). The application catalog appears.
Step 2: Select Trino and Click Deploy
Click Deploy next to Trino. The configuration wizard opens.
Step 3: Configure the Deployment
General tab:
| Setting | Description | Default |
|---|---|---|
| Helm Release Name | Name for the Helm release | trino |
| Namespace | Kubernetes namespace | odp-apps |
| Coordinator Replicas | Number of coordinator pods | 1 |
| Worker Replicas | Number of worker pods | 3 |
| Worker CPU Request | CPU request per worker | 2 |
| Worker CPU Limit | CPU limit per worker | 4 |
| Worker Memory Request | Memory request per worker | 8Gi |
| Worker Memory Limit | Memory limit per worker | 16Gi |
| Coordinator Memory | Memory for coordinator pod | 4Gi |
Connectivity tab (pre-populated from ODP cluster):
| Setting | Source | Example |
|---|---|---|
| Hive Metastore URI | Hive config in Ambari | thrift://master02.example.com:9083,thrift://master03.example.com:9083 |
| HDFS Default FS | HDFS config in Ambari | hdfs://mycluster |
| Ranger REST URL | Ranger config in Ambari | https://master01.example.com:6182 |
Security tab:
| Setting | Description |
|---|---|
| Kerberos Principal | Service principal for Trino (e.g., trino/k8s-worker.example.com@REALM) |
| Keytab | Generated by Ambari from the cluster KDC |
| Kerberos Realm | Pre-populated from cluster Kerberos config |
| KDC Address | Pre-populated from cluster Kerberos config |
Step 4: Submit
Click Deploy. Ambari creates a background operation. Monitor progress in Background Operations. A typical Trino deployment takes 2–5 minutes.
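Once the background operation completes, the result can be checked directly with kubectl and Helm. The namespace and release name below are the wizard defaults; the pod label selector is an assumption about how the chart labels its pods.

```shell
# List the Trino pods created by the Helm release (default namespace from the wizard)
kubectl -n odp-apps get pods -l app.kubernetes.io/name=trino

# Check the Helm release itself
helm -n odp-apps status trino
```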
Trino Configuration Details
Coordinator config.properties
Ambari generates the following coordinator configuration (abbreviated):
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=1GB
discovery.uri=http://trino-coordinator:8080
Worker config.properties
coordinator=false
http-server.http.port=8080
discovery.uri=http://trino-coordinator:8080
Connecting Trino to Iceberg via Hive Metastore
Ambari configures the Hive catalog for Trino with Iceberg support:
# /etc/trino/catalog/hive.properties (inside container)
connector.name=iceberg
hive.metastore.uri=thrift://master02.example.com:9083,thrift://master03.example.com:9083
hive.metastore.authentication.type=KERBEROS
hive.metastore.service.principal=hive/_HOST@REALM.EXAMPLE.COM
hive.metastore.client.principal=trino/k8s-worker.example.com@REALM.EXAMPLE.COM
hive.metastore.client.keytab=/etc/trino/keytabs/trino.keytab
hive.config.resources=/etc/trino/conf/core-site.xml,/etc/trino/conf/hdfs-site.xml
iceberg.file-format=PARQUET
The core-site.xml and hdfs-site.xml files are generated by Ambari from the current HDFS configuration and mounted as ConfigMaps.
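The mounted files can be inspected in a running pod to confirm they track the current cluster configuration. The path matches hive.config.resources above; the deployment name is an assumption.

```shell
# Verify the Hadoop client config inside the coordinator pod
kubectl -n odp-apps exec deploy/trino-coordinator -- \
  cat /etc/trino/conf/core-site.xml
```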
Kerberos Authentication for Trino
Trino uses the service keytab provisioned by Ambari to authenticate to:
- Hive Metastore: to resolve table locations and schema
- HDFS: to read data files directly (for tables stored in HDFS/Ozone)
- Ranger REST API: to fetch authorization policies
The keytab is stored as a Kubernetes Secret:
apiVersion: v1
kind: Secret
metadata:
  name: trino-kerberos-keytab
  namespace: odp-apps
type: Opaque
data:
  trino.keytab: <base64-encoded keytab>
The Secret is mounted into all Trino pods at /etc/trino/keytabs/trino.keytab.
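Outside of Ambari, an equivalent Secret could be created, and the mounted keytab verified, like this. The source keytab path and the deployment name are illustrative.

```shell
# Manual equivalent of the Secret Ambari provisions
kubectl -n odp-apps create secret generic trino-kerberos-keytab \
  --from-file=trino.keytab=/etc/security/keytabs/trino.keytab

# Confirm the keytab entries inside a running pod (requires klist in the image)
kubectl -n odp-apps exec deploy/trino-coordinator -- \
  klist -kt /etc/trino/keytabs/trino.keytab
```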
Ranger Authorization for Trino
Ambari configures the Trino Ranger plugin at deployment time. The plugin intercepts every Trino query and evaluates it against Ranger policies before execution.
How It Works
- A user submits a query via JDBC or Trino CLI.
- The Trino Coordinator forwards the authorization request to the Ranger plugin.
- The plugin queries the Ranger REST API with the user identity, resource (catalog/schema/table/column), and action (SELECT, INSERT, etc.).
- Ranger evaluates the applicable policies (resource-based and tag-based).
- If access is denied, Trino returns an authorization error. If permitted, the query proceeds.
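For example, a query against a table with no matching allow policy fails before any data is read. The schema, table, and column names here are illustrative, and the exact error text depends on the Trino version.

```sql
-- Submitted by a user with no SELECT policy on this table
SELECT ssn FROM hive.secure.customers;
-- Fails with an Access Denied error naming the column and table
```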
Ranger Service Definition for Trino
In Ranger, Trino appears as a service of type Trino (or Presto). Create a Trino service in Ranger pointing to the Trino coordinator:
Service Name: trino_k8s
Trino URL: http://trino-coordinator.odp-apps.svc.cluster.local:8080
Username: ranger_trino_lookup
Password: <password>
Policies are then created in this service to control access to Trino catalogs, schemas, and tables.
Relationship with Hive Ranger Policies
Users who can access a Hive table through Ranger do not automatically get access to the same table through Trino. Trino and Hive are separate Ranger services. You need to grant access in both services, or use Ranger tag-based policies to grant access via Atlas tags, which apply across services.
To avoid maintaining duplicate policies for Hive and Trino, use Atlas to tag your sensitive tables (e.g., PII, RESTRICTED) and create tag-based policies in Ranger that apply to all services. This ensures consistent access control regardless of which engine a user queries through.
Accessing Trino
JDBC
The Trino JDBC driver connects to the coordinator service. The connection URL format:
jdbc:trino://<trino-coordinator-host>:<port>/<catalog>/<schema>
If Trino is exposed via a Kubernetes LoadBalancer or NodePort:
jdbc:trino://trino.example.com:8080/hive/default
With Kerberos authentication (from a client with a valid Kerberos ticket):
jdbc:trino://trino.example.com:8080/hive/default?KerberosRemoteServiceName=trino&KerberosPrincipal=user@REALM.EXAMPLE.COM&KerberosConfigPath=/etc/krb5.conf&KerberosKeytabPath=/home/user/user.keytab&SSL=true&SSLTrustStorePath=/etc/ssl/certs/truststore.jks
Trino CLI
trino \
--server https://trino.example.com:8080 \
--krb5-remote-service-name trino \
--krb5-principal user@REALM.EXAMPLE.COM \
--krb5-config-path /etc/krb5.conf \
--catalog hive \
--schema default
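The CLI can also run a one-shot query over the same Kerberized connection, which is convenient for smoke tests and scripting. The server, principal, and table below are the same examples used above.

```shell
# One-shot query; prints the result and exits
trino \
  --server https://trino.example.com:8080 \
  --krb5-remote-service-name trino \
  --krb5-principal user@REALM.EXAMPLE.COM \
  --krb5-config-path /etc/krb5.conf \
  --execute "SELECT count(*) FROM hive.iceberg_demo.my_table"
```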
Verifying Access to ODP Data
Once connected, verify that Iceberg tables on ODP are accessible:
-- List available schemas
SHOW SCHEMAS FROM hive;
-- Query an Iceberg table
SELECT * FROM hive.iceberg_demo.my_table LIMIT 10;
-- Check Trino's view of Iceberg table history
SELECT * FROM hive.iceberg_demo."my_table$snapshots";
Monitoring Trino from Ambari
The Kubernetes View displays the following for the Trino deployment:
- Pod status: coordinator and worker pod states (Running, Pending, CrashLoopBackOff)
- Replica count: actual vs. desired worker count
- Helm release status: current revision and Flux reconciliation status
- Recent events: Kubernetes events for the Trino namespace
For deeper Trino monitoring, access the Trino Web UI at http://<trino-coordinator>:8080. It shows:
- Active and queued queries
- Worker node status and resource usage
- Query execution plans and stage details
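If no LoadBalancer or Ingress is configured, the Web UI can still be reached by tunnelling the coordinator service locally. The service name matches the discovery URI shown earlier, but verify it against your release.

```shell
# Forward the coordinator service to localhost
kubectl -n odp-apps port-forward svc/trino-coordinator 8080:8080
# The Web UI is then available at http://localhost:8080
```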
Scaling Workers
To adjust the number of Trino workers after initial deployment:
- In the Kubernetes View, select the Trino deployment.
- Click Configure.
- Update the Worker Replicas field.
- Click Upgrade. Ambari performs a helm upgrade with the new value.
Kubernetes scales the worker Deployment to the new replica count. Existing queries continue running on current workers; new workers become available for new queries within a minute or two.
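The rollout can be followed from the command line while Kubernetes converges on the new replica count. The Deployment name and the worker label selector are assumptions about how the chart names its objects.

```shell
# Watch the worker Deployment reach the new replica count
kubectl -n odp-apps rollout status deployment/trino-worker

# Confirm the individual worker pods
kubectl -n odp-apps get pods -l app.kubernetes.io/component=worker
```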