A Survey on Geographically Distributed Big-Data Processing using MapReduce

Shlomi Dolev; Patricia Florissi; Ehud Gudes; Shantanu Sharma; Ido Singer

arXiv:1707.01869·cs.DB·July 7, 2017

TL;DR

This survey reviews the challenges and advancements in geographically distributed big-data processing frameworks, focusing on MapReduce, Spark, and SQL-style systems, highlighting their limitations and future directions.

Contribution

It provides a comprehensive classification and analysis of geo-distributed big-data processing frameworks, discussing their challenges, requirements, and overhead issues.

Findings

01

Identifies key challenges in geo-distributed data processing.

02

Classifies existing frameworks into batch, stream, and SQL-style systems.

03

Highlights the need for new architectures to process data locally without moving raw datasets.

Abstract

Hadoop and Spark are widely used distributed processing frameworks for large-scale data processing in an efficient and fault-tolerant manner on private or public clouds. These big-data processing systems are extensively used across industry, e.g., by Google, Facebook, and Amazon, for solving a large class of problems, e.g., search, clustering, log analysis, different types of join operations, matrix multiplication, pattern matching, and social network analysis. However, all these popular systems share a major drawback: they are designed for locally distributed computations, which prevents them from implementing geographically distributed data processing. The increasing amount of geographically distributed massive data is pushing industry and academia to rethink the current big-data processing systems. The novel frameworks, which will be beyond state-of-the-art architectures and technologies involved…
