Early Accurate Results for Advanced Analytics on MapReduce
Nikolay Laptev, Kai Zeng, Carlo Zaniolo

TL;DR
This paper introduces EARL, a non-parametric extension to Hadoop that enables incremental computation of early, approximate results with reliable accuracy estimates, significantly improving speed for large-scale data analysis.
Contribution
The paper presents EARL, a novel library that allows Hadoop to compute early results with accuracy estimates, addressing a gap in MapReduce systems for big data analytics.
Findings
EARL provides major speed-ups in typical Hadoop workflows.
The system offers reliable online accuracy estimates using bootstrapping.
EARL requires minimal modifications to existing Hadoop infrastructure.
Abstract
Approximate results based on samples often provide the only way in which advanced analytical applications on very massive data sets can satisfy their time and resource constraints. Unfortunately, methods and tools for the computation of accurate early results are currently not supported in MapReduce-oriented systems, although these are intended for "big data". Therefore, we proposed and implemented a non-parametric extension of Hadoop which allows the incremental computation of early results for arbitrary workflows, along with reliable online estimates of the degree of accuracy achieved so far in the computation. These estimates are based on a technique called bootstrapping that has been widely employed in statistics and can be applied to arbitrary functions and data distributions. In this paper, we describe our Early Accurate Result Library (EARL) for Hadoop that was designed to…
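The idea in the abstract, growing a sample incrementally and using bootstrapping to judge when the early result is accurate enough, can be illustrated with a minimal single-process sketch. This is not EARL's API or its Hadoop integration, neither of which is shown on this page. The function names `bootstrap_cv` and `early_result`, the use of the coefficient of variation as the error measure, the doubling schedule, and all default parameters are assumptions made for illustration only.

```python
import random
import statistics


def bootstrap_cv(sample, statistic, n_resamples=200, rng=None):
    """Estimate the relative error (coefficient of variation) of `statistic`
    on `sample` by resampling with replacement. Hypothetical helper."""
    rng = rng or random.Random(0)
    n = len(sample)
    estimates = []
    for _ in range(n_resamples):
        # One bootstrap resample: n draws with replacement from the sample.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    mean_est = statistics.fmean(estimates)
    if mean_est == 0:
        return float("inf")
    return statistics.stdev(estimates) / abs(mean_est)


def early_result(data, statistic, target_cv=0.01, initial_size=100,
                 growth=2, rng=None):
    """Grow a random sample until the bootstrap error estimate falls below
    `target_cv` (an assumed stopping rule) or all data has been consumed.
    Returns (estimate, estimated_cv, sample_size)."""
    rng = rng or random.Random(0)
    shuffled = list(data)
    rng.shuffle(shuffled)  # a prefix of a shuffled list is a uniform sample
    size = min(initial_size, len(shuffled))
    while True:
        sample = shuffled[:size]
        estimate = statistic(sample)
        cv = bootstrap_cv(sample, statistic, rng=rng)
        if cv <= target_cv or size == len(shuffled):
            return estimate, cv, size
        size = min(size * growth, len(shuffled))
```

Calling `early_result(data, statistics.fmean, target_cv=0.01)` returns an early estimate of the mean, the bootstrap error estimate, and the number of records actually read. Because the bootstrap only needs to re-evaluate the statistic on resamples, the same loop works for a median or any other function without a closed-form error formula, which is the non-parametric property the abstract emphasizes.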
Taxonomy
Topics: Advanced Database Systems and Queries · Data Stream Mining Techniques · Scientific Computing and Data Management
