Parallel-and-stream accelerator for computationally fast supervised learning
Emily C. Hector, Lan Luo, Peter X.-K. Song

TL;DR
This paper introduces PASA, a hybrid parallel-and-stream processing framework that combines MapReduce and online streaming to enhance the speed and efficiency of supervised learning on large datasets.
Contribution
The paper proposes a novel hybrid paradigm, PASA, integrating online streaming into MapReduce to improve computational speed and statistical efficiency in supervised learning.
Findings
PASA improves computational speed over traditional methods.
PASA maintains statistical efficiency in large-scale data.
Simulation and real data show PASA's practical advantages.
Abstract
Two dominant distributed computing strategies have emerged to overcome the computational bottleneck of supervised learning with big data: parallel data processing in the MapReduce paradigm and serial data processing in the online streaming paradigm. Despite the two strategies' common divide-and-combine approach, they differ in how they aggregate information, leading to different trade-offs between statistical and computational performance. In this paper, we propose a new hybrid paradigm, termed a Parallel-and-Stream Accelerator (PASA), that uses the strengths of both strategies for computationally fast and statistically efficient supervised learning. PASA's architecture nests online streaming processing into each distributed and parallelized data process in a MapReduce framework. PASA leverages the advantages and mitigates the disadvantages of both the MapReduce and online streaming…
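The nesting described in the abstract, serial streaming inside each parallel MapReduce worker, can be illustrated with a minimal sketch. The code below is not the authors' estimator: it assumes ordinary least squares as a stand-in model, and the names `stream_shard` and `pasa_like_ols`, the shard count, and the batch size are all hypothetical. Each shard is read batch by batch while only running summaries are kept (the streaming step), and the per-shard summaries are then summed (the reduce step). Shards are processed in a plain loop here; in practice each would run on its own worker.

```python
import numpy as np


def stream_shard(X, y, batch_size):
    """Streaming step: read one shard batch by batch, keeping only running summaries."""
    p = X.shape[1]
    xtx = np.zeros((p, p))  # running X'X
    xty = np.zeros(p)       # running X'y
    for start in range(0, X.shape[0], batch_size):
        Xb = X[start:start + batch_size]
        yb = y[start:start + batch_size]
        xtx += Xb.T @ Xb
        xty += Xb.T @ yb
    return xtx, xty


def pasa_like_ols(X, y, n_shards=4, batch_size=50):
    """Map: stream each shard independently. Reduce: sum summaries and solve once."""
    shards = zip(np.array_split(X, n_shards), np.array_split(y, n_shards))
    # Sequential here for simplicity; each call is independent and could run in parallel.
    summaries = [stream_shard(Xs, ys, batch_size) for Xs, ys in shards]
    xtx = sum(s[0] for s in summaries)
    xty = sum(s[1] for s in summaries)
    return np.linalg.solve(xtx, xty)


# Synthetic data, used only to exercise the sketch.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=2000)

beta_hat = pasa_like_ols(X, y)
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]  # full-data fit for comparison
```

For least squares the summed summaries reproduce the full-data fit up to floating-point error, so `beta_hat` and `beta_full` agree. For models without additive sufficient statistics the aggregation rule matters, which is where the statistical and computational trade-offs mentioned in the abstract arise.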
Taxonomy
Topics: Data Stream Mining Techniques · Machine Learning and Data Classification · Advanced Bandit Algorithms Research
