Conformal prediction after data-dependent model selection

Ruiting Liang; Wanrong Zhu; Rina Foygel Barber

arXiv:2408.07066·stat.ME·April 20, 2026

TL;DR

This paper develops new conformal prediction methods that maintain validity after data-dependent model selection without extra data splitting, achieving near-optimal prediction set widths.

Contribution

The authors introduce efficient, finite-sample valid conformal prediction techniques that handle model selection bias without additional data splitting.

Findings

01. Methods provide finite-sample validity guarantees.

02. Prediction sets are asymptotically optimal in width.

03. Demonstrated effectiveness on synthetic and real datasets.

Abstract

Given a family of pretrained models and a hold-out set, how can we construct a valid conformal prediction set while selecting a model that minimizes the width of the set? If we use the same hold-out data set both to select a model (the model that yields the smallest conformal prediction sets) and then to construct a conformal prediction set based on that selected model, we suffer a loss of coverage due to selection bias. Alternatively, we could further split the data to perform selection and calibration separately, but this comes at a steep cost if the size of the dataset is limited. In this paper, we address the challenge of constructing a valid prediction set after data-dependent model selection -- commonly, selecting the model that minimizes the width of the resulting prediction sets. Our novel methods can be implemented efficiently and admit finite-sample validity guarantees without…
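The selection-bias problem described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' method: it implements only standard split conformal prediction with absolute-residual scores, plus the naive "pick the narrowest set on the same hold-out data" rule that the abstract warns against. The function names, the toy setup (every model's residuals drawn i.i.d. as |N(0,1)|, independently across models), and all parameter values are assumptions made for illustration.

```python
import math
import numpy as np


def conformal_radius(scores, alpha):
    """Split-conformal radius: the k-th smallest hold-out score,
    with k = ceil((n + 1) * (1 - alpha)); infinite if k exceeds n."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return float(np.sort(scores)[k - 1])


def naive_select(residuals_by_model, alpha):
    """Naive rule: compute each model's radius on the SAME hold-out
    residuals and keep the model with the smallest one."""
    radii = [conformal_radius(r, alpha) for r in residuals_by_model]
    best = int(np.argmin(radii))
    return best, radii[best]


def simulate_naive_coverage(n=20, n_models=50, alpha=0.1,
                            n_trials=2000, seed=0):
    """Estimate test coverage of the naive rule in a toy setting
    (hypothetical: all models have i.i.d. |N(0,1)| residuals)."""
    rng = np.random.default_rng(seed)
    covered = 0
    for _ in range(n_trials):
        # Rows are models, columns are hold-out points.
        holdout = np.abs(rng.standard_normal((n_models, n)))
        # One fresh test residual per model.
        test = np.abs(rng.standard_normal(n_models))
        best, radius = naive_select(holdout, alpha)
        covered += int(test[best] <= radius)
    return covered / n_trials


if __name__ == "__main__":
    # Target coverage is 1 - alpha = 0.9; the naive rule falls short.
    print(simulate_naive_coverage())
```

With a single fixed model, the radius from `conformal_radius` gives at least 1 - alpha coverage under exchangeability. Taking the minimum radius over many models on the same hold-out data systematically favors models whose hold-out residuals happened to be small, which is why the estimated coverage in this toy setup lands below the target.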
