Robust and Parallel Bayesian Model Selection
Michael Minyi Zhang, Henry Lam, Lizhen Lin

TL;DR
This paper introduces a parallel divide-and-conquer method for Bayesian model selection. It aggregates subset-level inferences using the geometric median, which improves computational efficiency and robustness to outliers on large or contaminated data sets.
Contribution
It proposes a parallel Bayesian model selection framework that aggregates subset-level results using the geometric median, improving accuracy and robustness while addressing the computational and contamination challenges of large data sets.
Findings
Improved concentration in identifying the correct model.
Enhanced robustness to outliers and data contamination.
Scalable to large data sets with parallel processing.
Abstract
Effective and accurate model selection is an important problem in modern data analysis. One of the major challenges is the computational burden required to handle large data sets that cannot be stored or processed on one machine. Another challenge is the presence of outliers and contamination that damage the quality of inference. The parallel "divide and conquer" model selection strategy divides the observations of the full data set into roughly equal subsets and performs inference and model selection independently on each subset. After local subset inference, the method aggregates the posterior model probabilities, or other model/variable selection criteria, using the geometric median to obtain a final model. This approach leads to improved concentration in finding the "correct" model and model parameters, and is also provably robust to outliers and data…
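The aggregation step described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: each subset's posterior model probabilities are treated as a plain vector, the geometric median is taken under the Euclidean norm (the point minimizing the sum of distances to the subset vectors), and it is computed with Weiszfeld's iteration. The function name `geometric_median` and the numbers in `subset_probs` are hypothetical and chosen only for illustration; the paper's own construction may use a different space and distance.

```python
import numpy as np


def geometric_median(points, tol=1e-9, max_iter=1000):
    """Approximate the Euclidean geometric median of the rows of `points`
    with Weiszfeld's iteration (an illustrative choice, not the paper's code)."""
    points = np.asarray(points, dtype=float)
    estimate = points.mean(axis=0)  # start from the coordinate-wise mean
    for _ in range(max_iter):
        dists = np.linalg.norm(points - estimate, axis=1)
        dists = np.maximum(dists, 1e-12)  # avoid division by zero
        weights = 1.0 / dists
        new_estimate = (weights[:, None] * points).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_estimate - estimate) < tol:
            return new_estimate
        estimate = new_estimate
    return estimate


# Hypothetical posterior model probabilities over three candidate models,
# one row per data subset. The last row mimics a contaminated subset.
subset_probs = np.array([
    [0.80, 0.15, 0.05],
    [0.75, 0.20, 0.05],
    [0.85, 0.10, 0.05],
    [0.78, 0.17, 0.05],
    [0.05, 0.05, 0.90],
])

aggregated = geometric_median(subset_probs)
selected = int(np.argmax(aggregated))  # index of the selected model

print("geometric median:", aggregated)
print("plain average:   ", subset_probs.mean(axis=0))
print("selected model index:", selected)
```

Because each Weiszfeld update is a weighted average of the subset vectors, the aggregate remains a probability vector. In this toy setting the contaminated subset shifts the plain average noticeably, while the geometric median stays close to the four uncontaminated subsets.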
