Past, Present and Future of Hadoop: A Survey
Ameneh Zarei, Shahla Safari, Mahmood Ahmadi, Farhad Mardukhi

TL;DR
This survey reviews Hadoop, a framework for large-scale data storage and processing, highlighting its architecture, components, and capabilities in handling massive datasets efficiently on commodity hardware.
Contribution
It provides a comprehensive overview of Hadoop's architecture, components, and features, summarizing its evolution and current state in big data processing.
Findings
Hadoop enables scalable and fault-tolerant data processing.
HDFS efficiently manages large datasets on commodity hardware.
Hadoop's MapReduce simplifies parallel data processing.
Abstract
In this paper, Hadoop, a technology for massive data storage and computing, is surveyed. A Hadoop cluster is built from heterogeneous computing devices such as regular PCs; the framework abstracts away the details of parallel processing so that developers can concentrate on their computational problem. A Hadoop cluster is made of two parts: HDFS and MapReduce. The cluster uses HDFS for data management. HDFS provides storage for the input and output data of MapReduce jobs and is designed for high fault tolerance, high distribution capacity, and high throughput. It is also suitable for storing terabyte-scale data on clusters, and it runs on flexible hardware such as commodity devices.
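As a rough illustration of the MapReduce model the abstract refers to, the following is a minimal single-process sketch of a word count in Python. It does not use Hadoop or HDFS; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical and chosen only for this example, and the input lines stand in for records that a real job would read from HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework would between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single count."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence inside one process (illustrative only)."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["big data", "big cluster"]))
```

In this sketch the developer writes only `map_phase` and `reduce_phase`; in an actual Hadoop cluster the framework would distribute those two functions across nodes and handle grouping, scheduling, and failure recovery.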
Taxonomy
Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Distributed and Parallel Computing Systems
