# Designing and Implementing Data Warehouse for Agricultural Big Data

**Authors:** Vuong M. Ngo, Nhien-An Le-Khac, M-Tahar Kechadi

arXiv: 1905.12411 · 2019-05-30

## TL;DR

This paper presents the design and implementation of a continental-level agricultural data warehouse that combines Hive, MongoDB, and Cassandra to support resource-efficient agronomy decision making in precision agriculture.

## Contribution

It introduces a data warehouse architecture that combines Hive, MongoDB, and Cassandra and is tailored to large, complex, heterogeneous agricultural datasets, covering data integration, analysis, and management.

## Key findings

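- The data warehouse pairs a flexible schema with high performance and high storage capacity.
- It integrates multiple real agricultural datasets and supports data science and business intelligence workloads.
- It provides replication and recovery and supports distributed and cloud deployment; the authors also evaluate its performance.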

## Abstract

In recent years, precision agriculture, which uses modern information and communication technologies, has become very popular. Raw and semi-processed agricultural data are usually collected from various sources, such as the Internet of Things (IoT), sensors, satellites, weather stations, robots, farm equipment, farmers, and agribusinesses. Moreover, agricultural datasets are very large, complex, unstructured, heterogeneous, non-standardized, and inconsistent. Hence, agricultural data mining is considered a Big Data application in terms of volume, variety, velocity, and veracity. It is a key foundation for establishing a crop intelligence platform, which will enable resource-efficient agronomy decision making and recommendations. In this paper, we designed and implemented a continental-level agricultural data warehouse by combining Hive, MongoDB, and Cassandra. Our data warehouse has the following capabilities: (1) flexible schema; (2) data integration from multiple real agricultural datasets; (3) data science and business intelligence support; (4) high performance; (5) high storage; (6) security; (7) governance and monitoring; (8) replication and recovery; (9) consistency, availability, and partition tolerance; (10) distributed and cloud deployment. We also evaluate the performance of our data warehouse.
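
To make capabilities (1) and (2) more concrete, here is a minimal sketch of one way heterogeneous records could be brought into a common shape while keeping a flexible schema. It is an illustration only, not the authors' implementation: it uses plain Python rather than Hive, MongoDB, or Cassandra, and every name in it (`CORE_KEYS`, `integrate`, and all field names and values) is a hypothetical assumption.

```python
from __future__ import annotations

from typing import Any

# Keys that every integrated record is assumed to share. Anything else is
# kept as free-form attributes. These names are hypothetical, not from the paper.
CORE_KEYS = ("source", "field_id", "timestamp")


def integrate(record: dict[str, Any]) -> dict[str, Any]:
    """Split a raw record into fixed core keys and a flexible attribute map."""
    missing = [key for key in CORE_KEYS if key not in record]
    if missing:
        raise ValueError(f"record lacks core keys: {missing}")
    core = {key: record[key] for key in CORE_KEYS}
    attributes = {k: v for k, v in record.items() if k not in CORE_KEYS}
    return {**core, "attributes": attributes}


# Two made-up records from different sources with different measurement fields.
raw_records = [
    {
        "source": "soil_sensor",
        "field_id": "F1",
        "timestamp": "2019-05-01T06:00:00Z",
        "moisture_pct": 23.5,
    },
    {
        "source": "weather_station",
        "field_id": "F1",
        "timestamp": "2019-05-01T06:00:00Z",
        "rain_mm": 1.2,
        "wind_kmh": 14.0,
    },
]

integrated = [integrate(record) for record in raw_records]
```

In this sketch, a new source can add measurement fields without any change to the shared core keys, which is the general idea behind a flexible schema; how the paper's system actually models its data is described in the full text.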

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.12411/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/1905.12411/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/1905.12411/full.md

---
Source: https://tomesphere.com/paper/1905.12411