Conformal prediction after data-dependent model selection
Ruiting Liang, Wanrong Zhu, Rina Foygel Barber

TL;DR
This paper develops new conformal prediction methods that maintain validity after data-dependent model selection without extra data splitting, achieving near-optimal prediction set widths.
Contribution
The authors introduce efficient, finite-sample valid conformal prediction techniques that handle model selection bias without additional data splitting.
Findings
Methods provide finite-sample validity guarantees.
Prediction sets are asymptotically optimal in width.
Demonstrated effectiveness on synthetic and real datasets.
Abstract
Given a family of pretrained models and a hold-out set, how can we construct a valid conformal prediction set while selecting a model that minimizes the width of the set? If we use the same hold-out data set both to select a model (the model that yields the smallest conformal prediction sets) and then to construct a conformal prediction set based on that selected model, we suffer a loss of coverage due to selection bias. Alternatively, we could further split the data to perform selection and calibration separately, but this comes at a steep cost if the size of the dataset is limited. In this paper, we address the challenge of constructing a valid prediction set after data-dependent model selection -- commonly, selecting the model that minimizes the width of the resulting prediction sets. Our novel methods can be implemented efficiently and admit finite-sample validity guarantees without…
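The two baseline strategies the abstract contrasts can be sketched in Python. This is not the authors' proposed method; it only shows why reusing one hold-out set for both selection and calibration loses coverage, and what the extra split costs. Several parts are assumptions made for illustration: absolute-residual scores, the standard split-conformal quantile, an idealized setting where all K models are equally accurate, and the helper names `conformal_quantile`, `naive_select`, `split_select` and `simulate_coverage`, which are all hypothetical.

```python
import math

import numpy as np


def conformal_quantile(scores, alpha):
    """Standard split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest score."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: the set is the whole line
    return scores[k - 1]


def naive_select(scores_by_model, alpha):
    """Select the narrowest model and calibrate it on the SAME hold-out set (biased)."""
    q = [conformal_quantile(s, alpha) for s in scores_by_model]
    best = int(np.argmin(q))
    return best, q[best]


def split_select(scores_by_model, alpha, n_select):
    """Select on the first n_select points, then calibrate on the remaining points."""
    q_sel = [conformal_quantile(s[:n_select], alpha) for s in scores_by_model]
    best = int(np.argmin(q_sel))
    return best, conformal_quantile(scores_by_model[best][n_select:], alpha)


def simulate_coverage(n_models=20, n_holdout=50, alpha=0.1, n_trials=2000, seed=0):
    """Hypothetical setting: every model's score is |N(0,1)|, so all are equally good."""
    rng = np.random.default_rng(seed)
    hits_naive = hits_split = 0
    for _ in range(n_trials):
        holdout = np.abs(rng.standard_normal((n_models, n_holdout)))
        test = np.abs(rng.standard_normal(n_models))  # one test score per model
        k, q = naive_select(holdout, alpha)
        hits_naive += int(test[k] <= q)
        k, q = split_select(holdout, alpha, n_holdout // 2)
        hits_split += int(test[k] <= q)
    return hits_naive / n_trials, hits_split / n_trials


if __name__ == "__main__":
    naive_cov, split_cov = simulate_coverage()
    print(f"naive: {naive_cov:.3f}, split: {split_cov:.3f} (target 0.900)")
```

In this idealized setting, picking the model with the smallest threshold on the same data tends to pick one whose hold-out scores happened to be small by chance. As a result, the naive procedure should cover noticeably less often than the 90% target. The split procedure should stay at or above the target, but it calibrates on only half of the hold-out points, which is the cost the paper aims to avoid.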
