Version: 1.3.1.0

Iceberg + Atlas integration

Apache Atlas integration for Iceberg is actively being integrated into the ODP distribution. This integration provides comprehensive metadata management, lineage tracking, and tag-based governance for Iceberg tables across the data platform.

Official Atlas support is planned for a future ODP release, with full coverage for Apache Hive, Spark, and Impala.

What is being integrated

Governance, Security, Metadata & Lineage

  • Ranger: Iceberg tables are being mapped to the Hive service model so resource-based and tag‑based policies apply consistently.
  • Atlas integration: Apache Atlas integration is being implemented to provide comprehensive metadata management and lineage tracking for Iceberg tables. See Iceberg + Atlas integration for details.
  • Hive: Iceberg table and column entities, plus lineage through Hive queries.
  • Spark: Spark SQL lineage for Iceberg tables.
  • Impala: Iceberg table and column entities and lineage in the Impala hook.
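Once the hooks publish metadata, the captured entities can be inspected through the Atlas v2 REST basic-search API. The sketch below is illustrative only: the host, credentials, and the `iceberg_table` type name are assumptions to verify against the typedefs registered in your Atlas instance.

```shell
# Search Atlas for Iceberg table entities captured by the hooks
# (host, credentials, and typeName are placeholders / assumptions)
curl -u admin:admin \
  "http://atlas-host.example.com:21000/api/atlas/v2/search/basic?typeName=iceberg_table&limit=10"
```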

Tag-based Governance

As part of the Atlas integration, tag-policy wiring is also being implemented. This enables:

  • Classification propagation: Tags applied in Atlas (e.g., PII, Sensitive, Confidential) will be automatically propagated to Iceberg tables and columns.
  • Tag-based access control: Ranger tag-based policies will apply to Iceberg tables based on their Atlas classifications.
  • Dynamic policy enforcement: As metadata evolves in Atlas, access policies update automatically without manual Ranger policy changes.
  • Unified governance: Consistent tag-based security across Hive, Spark, and Impala when accessing Iceberg tables.
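Classifications can be applied through the Atlas UI or through the Atlas v2 REST API. A minimal sketch of tagging an entity as PII over REST is shown below; the host, credentials, and entity GUID are placeholders.

```shell
# Apply the PII classification to an entity identified by its GUID,
# with propagation enabled so derived entities inherit the tag
# (host, credentials, and <entity-guid> are placeholders)
curl -u admin:admin -X POST \
  -H "Content-Type: application/json" \
  "http://atlas-host.example.com:21000/api/atlas/v2/entity/guid/<entity-guid>/classifications" \
  -d '[{"typeName": "PII", "propagate": true}]'
```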

Example use case

  1. Create an Iceberg table in Hive or Spark
  2. Atlas automatically captures the table metadata
  3. Data steward applies a "PII" tag in Atlas UI to sensitive columns
  4. Ranger tag-based policy enforces masking or access restrictions
  5. Policy applies consistently across all engines (Hive, Spark, Impala)
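Step 4 corresponds to a policy in the Ranger tag service. The fragment below is a simplified sketch of a tag-based masking policy following the Ranger policy model; the service name, group, and mask type are illustrative values, not a definitive configuration.

```json
{
  "service": "tag_service",
  "name": "PII-mask",
  "policyType": 1,
  "resources": { "tag": { "values": ["PII"] } },
  "dataMaskPolicyItems": [
    {
      "accesses": [ { "type": "hive:select", "isAllowed": true } ],
      "groups": [ "analysts" ],
      "dataMaskInfo": { "dataMaskType": "hive:MASK" }
    }
  ]
}
```

Because the policy matches the tag rather than a table path, it applies to any Iceberg table or column that carries the PII classification in Atlas, across all engines.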

Example of Iceberg to Atlas integration

The example below and the screenshots illustrate the ongoing integration work.

Spark (spark-shell, Scala)

Use spark-shell and configure an Iceberg catalog backed by the Hive Metastore, then run a simple create/insert/select workflow.

// spark-shell
// Configure Iceberg catalog backed by Hive Metastore
spark.conf.set("spark.sql.catalog.hive_catalog", "org.apache.iceberg.spark.SparkCatalog")
spark.conf.set("spark.sql.catalog.hive_catalog.type", "hive")
spark.conf.set("spark.sql.catalog.hive_catalog.uri", "thrift://master02.dev01.hadoop.clemlab.com:9083")
spark.conf.set("spark.sql.catalog.hive_catalog.warehouse", "hdfs://clemlabtest/warehouse/iceberg")

// Enable extra debugging for the Atlas hook during the integration work
spark.conf.set("atlas.hook.spark.iceberg.debug", "true")

spark.sql("CREATE DATABASE IF NOT EXISTS hive_catalog.iceberg_demo")

spark.sql(
  """
    |CREATE TABLE hive_catalog.iceberg_demo.ice_table (
    |  id BIGINT,
    |  v STRING
    |) USING iceberg
    |""".stripMargin)

spark.sql("INSERT INTO hive_catalog.iceberg_demo.ice_table VALUES (10,'x'),(20,'y')")
spark.sql("SELECT * FROM hive_catalog.iceberg_demo.ice_table").show()

Spark Atlas bridge hook (Atlas UI)

Hive bridge (Beeline)
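The Hive bridge can be exercised from Beeline with a workflow analogous to the Spark one. A minimal sketch is shown below; the JDBC URL is an example, and `STORED BY ICEBERG` assumes Hive 4 with the Iceberg integration enabled.

```sql
-- beeline -u "jdbc:hive2://<hiveserver2-host>:10000/default"
CREATE DATABASE IF NOT EXISTS iceberg_demo;

-- Create an Iceberg table through Hive (Hive 4 syntax)
CREATE TABLE iceberg_demo.ice_hive_table (
  id BIGINT,
  v STRING
) STORED BY ICEBERG;

INSERT INTO iceberg_demo.ice_hive_table VALUES (1, 'a'), (2, 'b');
SELECT * FROM iceberg_demo.ice_hive_table;
```

The Hive hook reports the table and column entities plus the query lineage to Atlas, where they appear alongside the entities produced by the Spark bridge.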