Split Conformal Prediction under Data Contamination

Jase Clarkson; Wenkai Xu; Mihai Cucuringu; Yvik Swan; Gesine Reinert

arXiv:2407.07700·stat.ML·December 1, 2025

TL;DR

This paper investigates the robustness of split conformal prediction under data contamination, quantifies its impact on coverage and efficiency, and proposes an adjustment method for contamination robustness.

Contribution

The paper analyzes how robust split conformal prediction is to contamination of the calibration data and proposes a contamination-robust adjustment for the classification setting.

Findings

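01. Contaminating a small fraction of the calibration scores changes both the coverage and the efficiency of split conformal prediction sets evaluated on clean test points.

02. The proposed adjustment for the classification setting improves robustness when the calibration data is contaminated.

03. Numerical experiments validate the effectiveness of the new method.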

Abstract

Conformal prediction is a non-parametric technique for constructing prediction intervals or sets from arbitrary predictive models under the assumption that the data is exchangeable. It is popular because it comes with theoretical guarantees on the marginal coverage of the prediction sets, and the split conformal prediction variant has a very low computational cost compared to model training. We study the robustness of split conformal prediction in a data contamination setting, where we assume a small fraction of the calibration scores are drawn from a different distribution than the bulk. We quantify the impact of the corrupted data on the coverage and efficiency of the constructed sets when evaluated on "clean" test points, and verify our results with numerical experiments. Moreover, we propose an adjustment in the classification setting which we call Contamination Robust Conformal…
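
The contamination setting described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code and does not implement their proposed adjustment; it only shows the standard split conformal threshold and measures coverage on clean test scores when a fraction `eps` of the calibration scores comes from a different distribution. The score distributions (absolute standard normal for clean scores, the same plus a constant `shift` for contaminated ones), the function names, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np


def conformal_quantile(cal_scores, alpha):
    """Split conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: the set is unbounded.
        return np.inf
    return np.sort(cal_scores)[k - 1]


def simulate_coverage(n_cal=500, n_test=2000, alpha=0.1, eps=0.1,
                      shift=2.0, n_trials=200, seed=0):
    """Average coverage on clean test scores with contaminated calibration scores.

    Assumed toy model (not taken from the paper): clean scores are |N(0, 1)|,
    contaminated scores are |N(0, 1)| + shift, and each calibration score is
    contaminated independently with probability eps.
    """
    rng = np.random.default_rng(seed)
    coverages = []
    for _ in range(n_trials):
        is_contaminated = rng.random(n_cal) < eps
        cal = np.abs(rng.standard_normal(n_cal)) + shift * is_contaminated
        threshold = conformal_quantile(cal, alpha)
        clean_test = np.abs(rng.standard_normal(n_test))
        coverages.append(np.mean(clean_test <= threshold))
    return float(np.mean(coverages))


if __name__ == "__main__":
    print("clean calibration:       ", simulate_coverage(eps=0.0))
    print("contaminated calibration:", simulate_coverage(eps=0.1))
```

In this toy model the contaminating scores are larger than the clean ones, so they inflate the threshold and the clean test points are covered more often than the nominal 1 - alpha, at the price of larger sets. A contaminating distribution with smaller scores would push the threshold, and hence the coverage, in the opposite direction.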

Code & Models

Repositories

jase-clarkson/cp_under_data_contamination (PyTorch, official)

Taxonomy

Topics: Image Processing and 3D Reconstruction · Neural Networks and Applications