Bayesian data selection
Eli N. Weinstein, Jeffrey W. Miller

TL;DR
This paper introduces the Stein volume criterion, a Bayesian method for data and model selection that efficiently identifies relevant data features without complex nonparametric modeling, validated through simulations and biological data analysis.
Contribution
It proposes the Stein volume criterion, a novel Bayesian score for data and model selection that avoids fitting complex background models, with proven consistency and asymptotic properties.
Findings
The Stein volume criterion is computationally straightforward because it avoids fitting a nonparametric background model.
It is consistent for data and model selection tasks.
It is validated in simulations and on single-cell RNA sequencing datasets.
Abstract
Insights into complex, high-dimensional data can be obtained by discovering features of the data that match or do not match a model of interest. To formalize this task, we introduce the "data selection" problem: finding a lower-dimensional statistic - such as a subset of variables - that is well fit by a given parametric model of interest. A fully Bayesian approach to data selection would be to parametrically model the value of the statistic, nonparametrically model the remaining "background" components of the data, and perform standard Bayesian model selection for the choice of statistic. However, fitting a nonparametric model to high-dimensional data tends to be highly inefficient, statistically and computationally. We propose a novel score for performing both data selection and model selection, the "Stein volume criterion", that takes the form of a generalized marginal likelihood…
Taxonomy
Topics: Statistical Methods and Inference · Bayesian Methods and Mixture Models · Machine Learning and Algorithms
