Data fission: splitting a single data point
James Leiner, Boyan Duan, Larry Wasserman, Aaditya Ramdas

TL;DR
This paper introduces data fission, a method inspired by Bayesian ideas that splits data into parts which together fully recover the original data, enabling tractable inference without sacrificing information.
Contribution
It proposes a general, finite-sample methodology for splitting data, called data fission, which extends previous approaches and can be viewed as a Bayesian-inspired continuous analog of data splitting.
Findings
Data fission enables full data recovery from split parts in finite samples.
The method applies to post-selection inference in trend filtering and regression.
It offers a tractable alternative to traditional data splitting and p-value masking.
Abstract
Suppose we observe a random vector $X$ from some distribution $P$ in a known family with unknown parameters. We ask the following question: when is it possible to split $X$ into two parts $f(X)$ and $g(X)$ such that neither part is sufficient to reconstruct $X$ by itself, but both together can recover $X$ fully, and the joint distribution of $(f(X), g(X))$ is tractable? As one example, if $X = (X_1, \dots, X_n)$ and $P$ is a product distribution, then for any $m < n$, we can split the sample to define $f(X) = (X_1, \dots, X_m)$ and $g(X) = (X_{m+1}, \dots, X_n)$. Rasines and Young (2022) offer an alternative approach that uses additive Gaussian noise; this enables post-selection inference in finite samples for Gaussian distributed data and asymptotically when errors are non-Gaussian. In this paper, we offer a more general methodology for achieving such a split in finite samples by borrowing ideas…
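To make the additive-Gaussian-noise idea concrete, here is a minimal sketch in Python of one way such a split can be built. It assumes each observation is Gaussian with a known standard deviation `sigma`, draws independent noise `Z` with the same standard deviation, and forms the two parts `X + tau * Z` and `X - Z / tau`. The specific form of the two parts, the tuning parameter `tau`, and the function names `gaussian_fission`, `recombine`, and `sample_correlation` are illustrative assumptions, not taken from the abstract. Under these assumptions the two parts are uncorrelated (and hence independent, being jointly Gaussian), yet together they reproduce the original data exactly.

```python
import random


def gaussian_fission(x, sigma, tau=1.0, rng=None):
    """Split each x_i ~ N(mu_i, sigma^2) into two parts using extra noise.

    Assumes sigma is known. Returns (f, g) with
    f_i = x_i + tau * z_i and g_i = x_i - z_i / tau, where z_i ~ N(0, sigma^2).
    """
    if rng is None:
        rng = random.Random()
    z = [rng.gauss(0.0, sigma) for _ in x]
    f = [xi + tau * zi for xi, zi in zip(x, z)]
    g = [xi - zi / tau for xi, zi in zip(x, z)]
    return f, g


def recombine(f, g, tau=1.0):
    """Recover x from both parts: f + tau^2 * g = (1 + tau^2) * x."""
    return [(fi + tau**2 * gi) / (1 + tau**2) for fi, gi in zip(f, g)]


def sample_correlation(a, b):
    """Plain Pearson correlation, used only to check the split empirically."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


# Simulated data (hypothetical values chosen for illustration only).
rng = random.Random(0)
sigma, tau = 2.0, 0.5
x = [rng.gauss(1.0, sigma) for _ in range(20000)]

f, g = gaussian_fission(x, sigma, tau, rng)
x_back = recombine(f, g, tau)

max_err = max(abs(a - b) for a, b in zip(x, x_back))  # reconstruction error
corr_fg = sample_correlation(f, g)  # should be close to zero
corr_fx = sample_correlation(f, x)  # f alone is only a noisy view of x
```

In this sketch, a smaller `tau` leaves more information in the first part and less in the second, which is the sense in which the split behaves like a continuous version of choosing how many observations go into each half of a sample split.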
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Statistical Methods and Inference · Bayesian Modeling and Causal Inference
