Bayesian data selection

Eli N. Weinstein; Jeffrey W. Miller

arXiv:2109.02712 · stat.ME · September 10, 2021 · J. Mach. Learn. Res.

TL;DR

This paper introduces the Stein volume criterion, a Bayesian score for data and model selection that identifies features of the data well fit by a parametric model of interest without requiring a nonparametric model of the remaining data. The method is validated through simulations and analyses of biological data.

Contribution

It proposes the Stein volume criterion, a novel Bayesian score for data and model selection that avoids fitting complex background models, and establishes its consistency and asymptotic properties.

Findings

1. The Stein volume criterion is computationally straightforward and effective.
2. It is consistent for data selection and model selection tasks.
3. It is validated on single-cell RNA sequencing datasets.

Abstract

Insights into complex, high-dimensional data can be obtained by discovering features of the data that match or do not match a model of interest. To formalize this task, we introduce the "data selection" problem: finding a lower-dimensional statistic (such as a subset of variables) that is well fit by a given parametric model of interest. A fully Bayesian approach to data selection would be to parametrically model the value of the statistic, nonparametrically model the remaining "background" components of the data, and perform standard Bayesian model selection for the choice of statistic. However, fitting a nonparametric model to high-dimensional data tends to be highly inefficient, statistically and computationally. We propose a novel score for performing both data selection and model selection, the "Stein volume criterion", that takes the form of a generalized marginal likelihood…
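The abstract describes the Stein volume criterion as taking "the form of a generalized marginal likelihood". As a hedged illustration of that general form only, the Python sketch below estimates log ∫ exp(-T·n·L_n(θ)) π(dθ) by averaging over draws from the prior π, and checks that with the mean negative log-likelihood as the loss L_n and T = 1 it recovers an ordinary marginal likelihood, the quantity compared in standard Bayesian model selection. The loss, the temperature T, the function names, and the Gaussian toy model are all assumptions made for this example; the paper's actual loss and any further terms of the Stein volume criterion are not given in this section and are not implemented here.

```python
import numpy as np


def log_generalized_marginal(loss_fn, prior_samples, n, temperature=1.0):
    """Monte Carlo estimate of log of the integral of exp(-temperature * n * loss(theta)) d prior.

    Hypothetical interface for this sketch: `loss_fn` maps an array of prior
    draws to an array of average per-observation losses. With the mean
    negative log-likelihood and temperature=1, this is the ordinary log
    marginal likelihood. NOT the Stein volume criterion itself.
    """
    log_w = -temperature * n * loss_fn(prior_samples)
    m = np.max(log_w)
    # Stable log-mean-exp: subtract the max before exponentiating.
    return m + np.log(np.mean(np.exp(log_w - m)))


def gaussian_mean_nll(x, mus, sigma=1.0):
    """Average negative log-likelihood of x under N(mu, sigma^2), for each mu in mus."""
    resid = (x[None, :] - mus[:, None]) / sigma
    return np.mean(0.5 * np.log(2 * np.pi * sigma**2) + 0.5 * resid**2, axis=1)


def exact_log_marginal(x, sigma=1.0, tau=2.0):
    """Closed form for x_i = mu + noise, mu ~ N(0, tau^2), noise ~ N(0, sigma^2)."""
    n = len(x)
    cov = sigma**2 * np.eye(n) + tau**2 * np.ones((n, n))
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)


rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=1.0, size=20)          # toy data (assumed)
mus = rng.normal(loc=0.0, scale=2.0, size=200_000)   # draws from the prior N(0, 2^2)

loss = lambda mu_draws: gaussian_mean_nll(x, mu_draws)
est = log_generalized_marginal(loss, mus, n=len(x))
est_tempered = log_generalized_marginal(loss, mus, n=len(x), temperature=0.5)
print(est, exact_log_marginal(x))
```

Replacing the negative log-likelihood with a different loss, or lowering the temperature, turns this into a generalized (Gibbs-type) marginal likelihood. Plain prior-sampling Monte Carlo is only practical in very low dimensions, so it serves here purely to make the formula concrete.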

Code & Models

Repositories

EWeinstein/data-selection (PyTorch, official)

Taxonomy

Topics: Statistical Methods and Inference · Bayesian Methods and Mixture Models · Machine Learning and Algorithms