Asynchronous and Distributed Data Augmentation for Massive Data Settings
Jiayuan Zhou, Kshitij Khare, and Sanvesh Srivastava

TL;DR
This paper introduces an asynchronous and distributed data augmentation framework that speeds up Bayesian inference in massive data settings by updating the augmented data for only a fraction of the data subsets in each iteration, while maintaining accuracy and ergodicity.
Contribution
It develops the Asynchronous and Distributed Data Augmentation (ADDA) framework, which extends traditional DA algorithms to distributed, asynchronous environments and comes with theoretical guarantees and practical speed improvements.
Findings
ADDA achieves significant speed-up over traditional DA in massive data scenarios.
The Markov chain of ADDA is Harris ergodic, ensuring convergence to the correct distribution.
ADDA is proven to be geometrically ergodic, allowing for valid asymptotic inference.
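For readers less familiar with the term, geometric ergodicity is usually stated as the bound below. The notation is standard textbook notation assumed here, not taken from the paper: P is the Markov transition kernel, π is the target posterior, and M and ρ are constants.

```latex
\| P^n(x, \cdot) - \pi(\cdot) \|_{\mathrm{TV}} \le M(x)\, \rho^n,
\qquad n \ge 1, \quad 0 \le \rho < 1, \quad M(x) < \infty .
```

Under suitable moment conditions, a bound of this form is what typically justifies a Markov chain central limit theorem for posterior averages, which is the sense of "valid asymptotic inference" above.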
Abstract
Data augmentation (DA) algorithms are widely used for Bayesian inference due to their simplicity. In massive data settings, however, DA algorithms are prohibitively slow because they pass through the full data in any iteration, imposing serious restrictions on their usage despite the advantages. Addressing this problem, we develop a framework for extending any DA that exploits asynchronous and distributed computing. The extended DA algorithm is indexed by a parameter r ∈ (0, 1) and is called Asynchronous and Distributed (AD) DA with the original DA as its parent. Any ADDA starts by dividing the full data into smaller disjoint subsets and storing them on processes, which could be machines or processors. Every iteration of ADDA augments only an r-fraction of the data subsets with some positive probability and leaves the remaining (1 − r)-fraction of the augmented data…
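The partial-augmentation idea in the abstract can be sketched in a few lines of Python. This is a minimal, sequential simulation under stated assumptions, not the authors' implementation: the parent DA is a toy Student-t location model chosen for illustration, the fraction parameter is written `r`, and asynchrony is imitated by refreshing a random `r`-fraction of the subsets in each iteration. The function names, the parameter names, and the selection rule are all hypothetical.

```python
import math
import random


def split_into_subsets(data, k):
    """Partition the data into k disjoint, roughly equal subsets."""
    return [data[j::k] for j in range(k)]


def augment_subset(subset, mu, nu, rng):
    """Parent DA imputation step for one subset (toy Student-t model).

    Draws latent weights w_i | y_i, mu ~ Gamma((nu + 1) / 2,
    rate = (nu + (y_i - mu)^2) / 2). random.gammavariate takes a
    scale parameter, so the rate is inverted.
    """
    return [rng.gammavariate((nu + 1) / 2, 2 / (nu + (y - mu) ** 2))
            for y in subset]


def adda_sketch(data, k=8, r=0.25, nu=4.0, n_iter=1000, seed=0):
    """Sequential simulation of partial augmentation (illustrative only)."""
    rng = random.Random(seed)
    subsets = split_into_subsets(data, k)
    mu = 0.0
    # Augment every subset once so all latent weights are initialised.
    weights = [augment_subset(s, mu, nu, rng) for s in subsets]
    m = max(1, math.ceil(r * k))  # number of subsets refreshed per iteration
    draws, refreshed_counts = [], []
    for _ in range(n_iter):
        # Assumption: a uniformly random r-fraction of subsets "reports back".
        chosen = rng.sample(range(k), m)
        for j in chosen:
            weights[j] = augment_subset(subsets[j], mu, nu, rng)
        # The remaining subsets keep their stale augmented data.
        sw = sum(w for ws in weights for w in ws)
        swy = sum(w * y for ws, s in zip(weights, subsets)
                  for w, y in zip(ws, s))
        # Parameter draw under a flat prior: mu | w, y ~ N(swy / sw, 1 / sw).
        mu = rng.gauss(swy / sw, 1 / math.sqrt(sw))
        draws.append(mu)
        refreshed_counts.append(len(chosen))
    return draws, refreshed_counts
```

Calling `adda_sketch(data, k=8, r=0.25)` on a list of floats returns the draws of the location parameter and the number of subsets refreshed in each iteration. In a real deployment the per-subset augmentation would run on separate worker processes, and the cost saving comes from not waiting for all of them.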
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Bayesian Methods and Mixture Models · Statistical Methods and Bayesian Inference
