Past, Present and Future of Hadoop: A Survey

Ameneh Zarei; Shahla Safari; Mahmood Ahmadi; Farhad Mardukhi

arXiv:2202.13293 · cs.NI · March 1, 2022 · 6 cites

Open Access

TL;DR

This survey reviews Hadoop, a framework for large-scale data storage and processing, highlighting its architecture, components, and capabilities in handling massive datasets efficiently on commodity hardware.

Contribution

It provides a comprehensive overview of Hadoop's architecture, components, and features, summarizing its evolution and current state in big data processing.

Findings

01. Hadoop enables scalable and fault-tolerant data processing.

02. HDFS efficiently manages large datasets on commodity hardware.

03. Hadoop's MapReduce simplifies parallel data processing.

Abstract

This paper surveys Hadoop, a technology for massive data storage and computing. Hadoop runs on clusters of heterogeneous computing devices such as regular PCs and abstracts away the details of parallel processing, so developers can concentrate on their computational problem. A Hadoop cluster is made of two parts: HDFS and MapReduce. The cluster uses HDFS for data management: HDFS provides storage for the input and output data of MapReduce jobs and is designed for high fault tolerance, highly distributed operation, and high throughput. It is also suitable for storing terabyte-scale data on clusters, and it runs on flexible hardware such as commodity devices.
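
To make the division of labor described in the abstract concrete, the following is a minimal single-process sketch of the MapReduce programming model, using word counting as the example. It is not Hadoop's API and does not appear in the paper: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and a plain list of strings stands in for input that a real job would read from HDFS. The point it illustrates is that the developer writes only the map and reduce functions, while grouping by key (done here by `shuffle`) is the framework's job.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key.

    In a real cluster the framework does this across machines;
    here it is an in-memory dictionary (an assumption of this sketch).
    """
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input; a real job would read its lines from files in HDFS.
    sample = ["hadoop stores data in hdfs", "mapreduce reads data from hdfs"]
    print(word_count(sample))
```

Because each call to `map_phase` depends only on its own line, and each call to `reduce_phase` only on its own key, both phases could run in parallel on different machines, which is the property a framework like Hadoop exploits.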

Taxonomy

Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Distributed and Parallel Computing Systems