AccurateML: Information-aggregation-based Approximate Processing for Fast and Accurate Machine Learning on MapReduce
Rui Han, Fan Zhang, Zhentao Wang

TL;DR
AccurateML is an information-aggregation approach to approximate machine learning on MapReduce. It substantially reduces execution time while losing less accuracy than existing approximate-processing methods.
Contribution
The paper proposes an aggregation-based technique that makes approximate processing both faster and more accurate for large-scale machine learning jobs on MapReduce.
Findings
Reduces execution time by a factor of 30 with minimal accuracy loss.
Achieves 2.71 times lower accuracy loss than existing methods at equal execution time.
Effectively identifies data parts most relevant to accuracy improvements.
Abstract
The growing demand for processing massive datasets has driven a strong trend toward running machine learning applications on MapReduce. When processing large input data, it is often more valuable to produce fast, sufficiently accurate approximate results than slow exact ones. Existing techniques produce approximate results by processing only part of the input data, and therefore incur large accuracy losses when job execution times are short, because all of the skipped input data potentially contributes to result accuracy. We address this limitation by proposing AccurateML, which aggregates information from the input data in each map task to create small aggregated data points. These aggregated points enable all map tasks to produce initial outputs quickly, saving computation time, and shrink the outputs to reduce communication time. Our approach further identifies the parts of input data…
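The aggregation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the grid-cell grouping rule and the names `aggregate_split`, `weighted_mean`, and `cell_width` are assumptions made purely for illustration, since the abstract does not say how points are grouped. The sketch shows one map task collapsing its input split into a few (centroid, count) pairs, which a downstream stage can then process in place of the raw points.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]
Aggregated = Tuple[List[float], int]  # (centroid, number of original points)


def aggregate_split(points: List[Point], cell_width: float) -> List[Aggregated]:
    """Collapse one map task's input split into (centroid, count) pairs.

    Assumed grouping rule (hypothetical, not from the paper): points that
    fall into the same grid cell of side `cell_width` are merged.
    """
    if cell_width <= 0:
        raise ValueError("cell_width must be positive")
    sums: Dict[Tuple[int, ...], List[float]] = {}
    counts: Dict[Tuple[int, ...], int] = defaultdict(int)
    for p in points:
        # Floor division maps each coordinate to its grid-cell index.
        key = tuple(int(x // cell_width) for x in p)
        if key not in sums:
            sums[key] = [0.0] * len(p)
        for i, x in enumerate(p):
            sums[key][i] += x
        counts[key] += 1
    # One aggregated point per non-empty cell: the mean of its members.
    return [([s / counts[k] for s in sums[k]], counts[k]) for k in sums]


def weighted_mean(aggregated: List[Aggregated]) -> List[float]:
    """Mean of the original points, recovered from the aggregated points."""
    total = sum(count for _, count in aggregated)
    dim = len(aggregated[0][0])
    return [
        sum(centroid[i] * count for centroid, count in aggregated) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    split = [(0.1, 0.2), (0.3, 0.4), (5.0, 5.0), (5.2, 5.4)]
    agg = aggregate_split(split, cell_width=1.0)
    print(len(agg), "aggregated points from", len(split), "inputs")
    print("mean recovered from aggregates:", weighted_mean(agg))
```

Keeping a count alongside each centroid lets a statistic such as the mean be recovered from the aggregated points without revisiting the raw data. For other models the aggregated points only approximate the full input, which is consistent with the approximate results the abstract describes.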
Taxonomy
Topics: Data Stream Mining Techniques · Machine Learning and Data Classification · Data Management and Algorithms
