i2MapReduce: Incremental MapReduce for Mining Evolving Big Data
Yanfeng Zhang, Shimin Chen, Qiang Wang, Ge Yu

TL;DR
i2MapReduce is an incremental extension to MapReduce that efficiently refreshes data mining results by reusing previously saved computation states. It also supports iterative algorithms and reduces I/O overhead, which leads to significant performance gains.
Contribution
The paper introduces i2MapReduce, a novel incremental processing framework that enhances MapReduce with fine-grain updates, iterative computation support, and I/O optimizations.
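One reason preserved state matters for iterative mining is that an iterative job can resume from its previous result instead of restarting. The Python sketch below illustrates this general warm-start idea with PageRank power iteration. It is a generic illustration under our own assumptions, not the paper's iterative mechanism. The function name pagerank, its parameters, and the toy graph are hypothetical. A cold run starts from uniform ranks. A warm run starts from the ranks that converged on the old graph, and new nodes receive 1/n.

```python
def pagerank(graph, damping=0.85, tol=1e-10, init=None, max_iter=1000):
    """Power-iteration PageRank (illustrative sketch, not i2MapReduce's code).

    graph: {node: [out-neighbours]}, every neighbour must also be a key.
    init:  optional rank dict from an earlier run, used as a warm start.
    Returns (ranks, iterations_used).
    """
    nodes = list(graph)
    n = len(nodes)
    if init is None:
        ranks = {v: 1.0 / n for v in nodes}  # cold start: uniform ranks
    else:
        # Warm start: previous ranks for surviving nodes, 1/n for new nodes.
        ranks = {v: init.get(v, 1.0 / n) for v in nodes}
    for it in range(1, max_iter + 1):
        new = {v: (1.0 - damping) / n for v in nodes}
        for u, outs in graph.items():
            targets = outs if outs else nodes  # dangling node: spread uniformly
            share = damping * ranks[u] / len(targets)
            for v in targets:
                new[v] += share
        diff = sum(abs(new[v] - ranks[v]) for v in nodes)  # L1 change this step
        ranks = new
        if diff < tol:
            return ranks, it
    return ranks, max_iter


if __name__ == "__main__":
    g = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    old, _ = pagerank(g)
    g2 = dict(g, d=["a", "c"])  # small change: add edge d -> a
    cold, cold_iters = pagerank(g2)
    warm, warm_iters = pagerank(g2, init=old)
    print(cold_iters, warm_iters)
```

Both runs converge to the same fixed point. Whether the warm start saves iterations depends on how far the change moves the result, so it is not guaranteed to win on every input.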
Findings
Significant performance improvements over standard MapReduce.
Effective support for iterative data mining algorithms.
Reduced I/O overhead in incremental processing.
Abstract
As new data and updates are constantly arriving, the results of data mining applications become stale and obsolete over time. Incremental processing is a promising approach to refreshing mining results. It utilizes previously saved states to avoid the expense of re-computation from scratch. In this paper, we propose i2MapReduce, a novel incremental processing extension to MapReduce, the most widely used framework for mining big data. Compared with the state-of-the-art work on Incoop, i2MapReduce (i) performs key-value pair level incremental processing rather than task level re-computation, (ii) supports not only one-step computation but also more sophisticated iterative computation, which is widely used in data mining applications, and (iii) incorporates a set of novel techniques to reduce I/O overhead for accessing preserved fine-grain computation states. We evaluate i2MapReduce…
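The abstract contrasts key-value pair level incremental processing with Incoop's task-level re-computation. The Python sketch below illustrates the general idea on a word-count job under our own assumptions. The names IncrementalJob, map_fn, reduce_fn, and apply_delta are hypothetical. The preserved state is an in-memory dictionary rather than the paper's on-disk storage, and the job is one-step, not iterative. Map outputs are kept per reduce key and per source record. As a result, a delta of inserted and deleted records re-runs Map only on the changed records and re-runs Reduce only on the keys those records touch.

```python
from collections import defaultdict


def map_fn(record_id, text):
    """Hypothetical Map function: emit (word, 1) for every word in a record."""
    for word in text.split():
        yield word, 1


def reduce_fn(key, values):
    """Hypothetical Reduce function: sum the counts emitted for one key."""
    return sum(values)


class IncrementalJob:
    """One-step job that preserves fine-grain (per key, per record) map outputs."""

    def __init__(self):
        # reduce key -> {record_id: values that record emitted for this key}
        self.preserved = defaultdict(dict)
        # record_id -> reduce keys that record contributed to (reverse index)
        self.keys_of = defaultdict(set)
        self.results = {}

    def run_full(self, records):
        """Initial run from scratch over {record_id: text}."""
        for rid, text in records.items():
            self._apply_map(rid, text)
        for key in list(self.preserved):
            self._reduce(key)

    def apply_delta(self, inserted, deleted):
        """Refresh results for inserted {record_id: text} and deleted {record_id}.

        An updated record can be expressed as a deletion plus an insertion.
        Returns the set of reduce keys that were re-reduced.
        """
        affected = set()
        for rid in deleted:
            # Drop only this record's contributions; all other state is reused.
            for key in self.keys_of.pop(rid, set()):
                self.preserved[key].pop(rid, None)
                affected.add(key)
        for rid, text in inserted.items():
            affected |= self._apply_map(rid, text)
        for key in affected:
            self._reduce(key)
        return affected

    def _apply_map(self, rid, text):
        touched = set()
        for key, value in map_fn(rid, text):
            self.preserved[key].setdefault(rid, []).append(value)
            self.keys_of[rid].add(key)
            touched.add(key)
        return touched

    def _reduce(self, key):
        # Re-reduce one key from its preserved values; other keys are untouched.
        values = [v for vals in self.preserved[key].values() for v in vals]
        if values:
            self.results[key] = reduce_fn(key, values)
        else:
            # No record contributes to this key any more.
            self.results.pop(key, None)
            del self.preserved[key]


if __name__ == "__main__":
    job = IncrementalJob()
    job.run_full({1: "big data", 2: "data mining"})
    print(job.results)  # {'big': 1, 'data': 2, 'mining': 1}
    job.apply_delta(inserted={3: "big graph"}, deleted={2})
    print(job.results)  # {'big': 2, 'data': 1, 'graph': 1}
```

The reverse index keys_of lets a deletion find its affected keys without scanning all preserved state. This mirrors, in miniature, why fine-grain state can avoid re-running whole tasks.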
Taxonomy
Topics: Data Management and Algorithms · Data Stream Mining Techniques · Caching and Content Delivery
