Frequently Asked Questions
What is ODP?
ODP (Open Data Platform) is an open-source Big Data solution that offers a set of tools to store, analyze, and visualize big data. It aims to simplify data management using big data technologies based on the Apache Hadoop ecosystem.
What is CLEMLAB?
CLEMLAB is a company created to distribute and promote ODP. It allows users to easily access scalable and 100% open-source big data technologies.
What are the main steps to install ODP, the open-source Big Data distribution based on Apache Hadoop components?
1. Download ODP from the official repositories.
2. Configure the required system, storage, and network settings.
3. Install Apache Ambari on the nodes where you want to install ODP.
4. Install the components you need (for example Hadoop, Spark, and Ranger) through Ambari's UI or REST API.
5. Start the services and check the status of the components.
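The final status check can also be scripted against Ambari's REST API. The following is a minimal sketch, not an official ODP tool: it builds the URL that lists services with their state and filters a response for services that are not `STARTED`. The host `ambari.example.com`, the cluster name `odp_demo`, and the sample payload are hypothetical; the payload only mimics the shape Ambari returns for `services?fields=ServiceInfo/state`.

```python
import json
from typing import List
from urllib.parse import quote


def services_url(base_url: str, cluster: str) -> str:
    """Build the Ambari REST URL that lists every service with its state."""
    return (
        f"{base_url.rstrip('/')}/api/v1/clusters/{quote(cluster)}"
        "/services?fields=ServiceInfo/state"
    )


def not_started(payload: str) -> List[str]:
    """Return the sorted names of services whose state is not STARTED."""
    items = json.loads(payload).get("items", [])
    return sorted(
        item["ServiceInfo"]["service_name"]
        for item in items
        if item["ServiceInfo"].get("state") != "STARTED"
    )


# Illustrative payload (assumption): not taken from a real cluster.
SAMPLE = json.dumps(
    {
        "items": [
            {"ServiceInfo": {"service_name": "HDFS", "state": "STARTED"}},
            {"ServiceInfo": {"service_name": "SPARK", "state": "INSTALLED"}},
            {"ServiceInfo": {"service_name": "RANGER", "state": "STARTED"}},
        ]
    }
)

if __name__ == "__main__":
    # Hypothetical host and cluster name; replace with your own.
    print(services_url("http://ambari.example.com:8080", "odp_demo"))
    print(not_started(SAMPLE))
```

In practice you would fetch the URL with an authenticated HTTP client and pass the response body to `not_started`. Some services, such as client-only ones, may legitimately report a state other than `STARTED`, so treat the result as a list to review rather than a list of failures.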
What is the release cycle of ODP?
ODP follows a semi-annual release cycle. Each new version brings performance improvements, bug fixes, and new features based on community feedback.
Is ODP Open Source?
Yes, ODP is 100% open source. Its code is freely available and can be used, modified, and redistributed under the terms of the Apache 2.0 license.
How to contribute?
To contribute to the Open Source Big Data distribution, you can:
- Submit issues or proposals via GitHub.
- Share feedback or ideas on community forums.
- Create pull requests to improve the code or documentation.
Where to find the source code?
The source code of ODP components is available on GitHub. You can access it at the following address: https://github.com/clemlabprojects/hive-odp-release.
What Apache Big Data components does ODP provide?
ODP provides Big Data components based on the Apache Hadoop ecosystem such as:
- Hadoop for distributed storage.
- Spark for fast data processing.
- Kafka for real-time streaming.
- Hive for SQL queries on large datasets.
- NiFi for data flow automation.
- Flink for stream processing.
- Ambari for cluster management.
- Atlas for data governance.
- Ozone for scalable object storage.
Does ODP support an open data lakehouse?
Yes, ODP supports building an open data lakehouse by combining Hadoop's storage, Spark's processing, and Hive's data management capabilities. This makes it possible to manage and analyze large amounts of data efficiently and flexibly. ODP 1.2.40 is compiled with Iceberg in Spark, Kafka, and Flink. ODP 1.2 ships with Apache Hive 3.1.3, but that Hive version is not yet compatible with Iceberg.
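As an illustration of the Spark side, the sketch below assembles the standard Apache Iceberg settings for a Spark catalog and renders them as `spark-submit` arguments. It assumes a Hadoop-type Iceberg catalog, chosen here only because Hive is not yet Iceberg-compatible in ODP 1.2; the catalog name `lake` and the warehouse path are hypothetical, and whether your ODP Spark build needs additional settings is not covered here.

```python
from typing import Dict, List


def iceberg_spark_conf(catalog: str, warehouse: str) -> Dict[str, str]:
    """Return Spark properties that register an Iceberg catalog.

    Uses a Hadoop-type catalog (table metadata kept under the warehouse
    path), so no Hive Metastore is involved.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        ),
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hadoop",
        f"{prefix}.warehouse": warehouse,
    }


def as_submit_args(conf: Dict[str, str]) -> List[str]:
    """Flatten the properties into repeated `--conf key=value` arguments."""
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Hypothetical catalog name and HDFS path; adjust to your cluster.
    conf = iceberg_spark_conf("lake", "hdfs:///warehouse/iceberg")
    print(" ".join(["spark-submit"] + as_submit_args(conf) + ["job.py"]))
```

With such a catalog registered, Spark SQL statements can address Iceberg tables as `lake.<database>.<table>`.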