Question 1

What is ODP?

Accepted Answer

ODP (Open Data Platform) is an open-source Big Data solution that offers a set of tools to store, analyze, and visualize big data. It aims to simplify data management using big data technologies based on the Apache Hadoop ecosystem.

Question 2

What is CLEMLAB?

Accepted Answer

CLEMLAB is a company created to distribute and promote ODP. It allows users to easily access scalable and 100% open-source big data technologies.

Question 3

What are the main steps to install ODP, the open-source Big Data distribution based on Apache Hadoop components?

Accepted Answer

1. Download ODP from the official repositories.
2. Configure the required settings (system, storage, network).
3. Install Apache Ambari on the nodes where you want to install ODP.
4. Install the necessary components (such as Hadoop, Spark, and Ranger) via Ambari's UI or REST API.
5. Start the services and check the status of the components.
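
As a rough illustration of steps 4 and 5 done through the REST API rather than the UI, here is a minimal Python sketch. It assumes the Ambari shipped with ODP exposes the standard Apache Ambari v1 REST API, and that the services have already been added to the cluster (for example through the install wizard or a blueprint). The host name, cluster name, and credentials are hypothetical placeholders. The functions only build the requests; sending them with urllib.request.urlopen is left to the caller.

```python
import base64
import json
import urllib.request

# Hypothetical values: replace with your own Ambari server and cluster name.
AMBARI_URL = "http://ambari.example.com:8080"
CLUSTER = "odp_cluster"


def build_request(method, path, user, password, body=None):
    """Build (but do not send) an authenticated Ambari REST request."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(AMBARI_URL + path, data=data, method=method)
    req.add_header("Authorization", "Basic " + token)
    # Ambari rejects modifying requests that lack this header.
    req.add_header("X-Requested-By", "ambari")
    return req


def set_services_state_request(state, user, password):
    """Ask Ambari to move all services of the cluster to the given state.

    "INSTALLED" installs (or stops) the services, "STARTED" starts them.
    """
    body = {
        "RequestInfo": {"context": f"Set services to {state} (sketch)"},
        "Body": {"ServiceInfo": {"state": state}},
    }
    path = f"/api/v1/clusters/{CLUSTER}/services"
    return build_request("PUT", path, user, password, body)


def services_status_request(user, password):
    """Query the current state of every service in the cluster."""
    path = f"/api/v1/clusters/{CLUSTER}/services?fields=ServiceInfo/state"
    return build_request("GET", path, user, password)
```

A typical sequence would be to send set_services_state_request("INSTALLED", ...), wait for the returned Ambari request to complete, send set_services_state_request("STARTED", ...), and then poll services_status_request(...) until every service reports STARTED.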

Question 4

What is the release cycle of ODP?

Accepted Answer

ODP follows a semi-annual release cycle. Each new version brings performance improvements, bug fixes, and new features based on community feedback.

Question 5

Is ODP Open Source?

Accepted Answer

Yes, ODP is 100% open source. Its code is freely available and can be used, modified, and redistributed under the terms of the Apache 2.0 license.

Question 6

How to contribute?

Accepted Answer

Submit issues or proposals via GitHub.
Share feedback or ideas on community forums.
Create pull requests to improve the code or documentation.

Question 7

Where to find the source code?

Accepted Answer

The source code of ODP components is available on GitHub. For example, the Hive release repository is at the following address: https://github.com/clemlabprojects/hive-odp-release.

Question 8

What Apache Big Data components does ODP provide?

Accepted Answer

Hadoop for distributed storage.
Spark for fast data processing.
Kafka for real-time streaming.
Hive for SQL queries on large datasets.
NiFi for data flow automation.
Flink for stream processing.
Ambari for cluster management.
Atlas for data governance.
Ozone for scalable object storage.

Question 9

Does ODP support an open data lakehouse?

Accepted Answer

Yes, ODP supports creating an open data lakehouse by combining Hadoop's data storage capabilities with Spark's data processing capabilities and Hive's data management capabilities. This allows for efficient and flexible management and analysis of large amounts of data. In ODP 1.2.40, Spark, Kafka, and Flink are compiled with Iceberg support. ODP 1.2 comes with Apache Hive 3.1.3, but Hive is not yet compatible with Iceberg.
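
To make the Spark side of this more concrete, here is a small Python sketch that assembles the Spark properties documented by Apache Iceberg for a Hadoop-type catalog and turns them into spark-submit arguments. The catalog name and warehouse path are hypothetical, and whether an ODP cluster needs extra jars or different defaults is an assumption to verify against your own installation. A Hadoop catalog is used here rather than a Hive catalog because, as noted above, the bundled Hive is not yet compatible with Iceberg.

```python
def iceberg_spark_conf(catalog, warehouse):
    """Return Spark properties that register an Iceberg Hadoop catalog.

    The property names follow the Apache Iceberg Spark documentation;
    `catalog` and `warehouse` are caller-supplied, hypothetical values.
    """
    return {
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        f"spark.sql.catalog.{catalog}": "org.apache.iceberg.spark.SparkCatalog",
        f"spark.sql.catalog.{catalog}.type": "hadoop",
        f"spark.sql.catalog.{catalog}.warehouse": warehouse,
    }


def spark_submit_args(conf):
    """Flatten a property dict into `--conf key=value` argument pairs."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Hypothetical catalog name and HDFS warehouse location.
    conf = iceberg_spark_conf("lake", "hdfs:///warehouse/iceberg")
    print(" ".join(["spark-submit"] + spark_submit_args(conf) + ["job.py"]))
```

With such a configuration, tables would be addressed in Spark SQL as lake.<database>.<table>, where lake is the hypothetical catalog name chosen above.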