Parallel-and-stream accelerator for computationally fast supervised learning

Emily C. Hector; Lan Luo; Peter X.-K. Song

arXiv:2111.00032·stat.CO·November 2, 2021·Comput. Stat. Data Anal.

Open Access

TL;DR

This paper introduces PASA, a hybrid parallel-and-stream processing framework that combines MapReduce and online streaming to enhance the speed and efficiency of supervised learning on large datasets.

Contribution

The paper proposes a novel hybrid paradigm, PASA, integrating online streaming into MapReduce to improve computational speed and statistical efficiency in supervised learning.

Findings

01 PASA improves computational speed over traditional methods.

02 PASA maintains statistical efficiency in large-scale data.

03 Simulation and real data show PASA's practical advantages.

Abstract

Two dominant distributed computing strategies have emerged to overcome the computational bottleneck of supervised learning with big data: parallel data processing in the MapReduce paradigm and serial data processing in the online streaming paradigm. Despite the two strategies' common divide-and-combine approach, they differ in how they aggregate information, leading to different trade-offs between statistical and computational performance. In this paper, we propose a new hybrid paradigm, termed a Parallel-and-Stream Accelerator (PASA), that uses the strengths of both strategies for computationally fast and statistically efficient supervised learning. PASA's architecture nests online streaming processing into each distributed and parallelized data process in a MapReduce framework. PASA leverages the advantages and mitigates the disadvantages of both the MapReduce and online streaming…
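The nested architecture described in the abstract can be illustrated with a minimal sketch. This is not the authors' estimator: the excerpt does not give PASA's aggregation rules, so the example assumes a toy simple linear regression whose sufficient statistics can be updated batch by batch. The function names (`stream_partition`, `combine`, `solve`, `pasa_like_fit`) and the data layout are hypothetical. Each partition is processed serially as a stream of batches, partitions run in parallel, and the per-partition summaries are then combined.

```python
from concurrent.futures import ThreadPoolExecutor


def stream_partition(batches):
    """Online-streaming step: update running sums serially over one partition's batches."""
    n = sx = sy = sxx = sxy = 0.0
    for batch in batches:          # batches arrive one after another
        for x, y in batch:
            n += 1
            sx += x
            sy += y
            sxx += x * x
            sxy += x * y
    return (n, sx, sy, sxx, sxy)


def combine(stats_list):
    """Reduce step: add the per-partition summaries component-wise."""
    return tuple(sum(s[i] for s in stats_list) for i in range(5))


def solve(stats):
    """Closed-form least squares for y = intercept + slope * x from the summed statistics."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


def pasa_like_fit(partitions):
    """Map step: stream each partition in its own worker, then combine (illustrative only)."""
    with ThreadPoolExecutor() as pool:
        stats = list(pool.map(stream_partition, partitions))
    return solve(combine(stats))


# Toy data generated from y = 1 + 2x, split into 2 partitions of 2 batches each.
data = [(float(x), 1.0 + 2.0 * x) for x in range(12)]
partitions = [
    [data[0:3], data[3:6]],
    [data[6:9], data[9:12]],
]
intercept, slope = pasa_like_fit(partitions)
```

For linear regression the combined summaries reproduce the full-data fit exactly; for the general supervised learning setting the paper targets, the within-partition update and the cross-partition aggregation would differ from this toy case.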

Taxonomy

Topics: Data Stream Mining Techniques · Machine Learning and Data Classification · Advanced Bandit Algorithms Research