Investigating Data Usage for Inductive Conformal Predictors

Yizirui Fang; Anthony Bellotti

arXiv:2406.12262·cs.LG·June 19, 2024

TL;DR

This paper explores how to best divide data for inductive conformal predictors, examining the effects of overlapping training and calibration sets to optimize prediction validity with limited data.

Contribution

It provides experimental insights into data partitioning strategies for ICPs, including the impact of overlapping datasets, which was previously underexplored.

Findings

01. Overlapping training and calibration sets can improve data efficiency.

02. Optimal data division depends on dataset size and application context.

03. The paper gives practitioners recommendations on how to split data for ICPs.

Abstract

Inductive conformal predictors (ICPs) are algorithms that generate prediction sets, instead of point predictions, which are valid at a user-defined confidence level under the sole assumption of exchangeability. These algorithms are useful for reliable machine learning and are increasing in popularity. The ICP development process involves dividing development data into three parts: training, calibration and test. When development data is limited or expensive, the most efficient way to divide it remains an open question. This study provides several experiments to explore this question and considers the case for allowing overlap of examples between training and calibration sets. Conclusions are drawn that will be of value to academics and practitioners planning to use ICPs.
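
The three-way division described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the synthetic data, the 500/300/200 split, the least-squares underlying model, the absolute-residual nonconformity score, and the hypothetical helper name `icp_halfwidth`. The training and calibration sets here are disjoint; no overlap is used.

```python
import math
import numpy as np


def icp_halfwidth(x_train, y_train, x_cal, y_cal, confidence=0.9):
    """Fit a line on the training set, then calibrate an interval half-width.

    Returns (predict, halfwidth). The prediction set for a new x is
    [predict(x) - halfwidth, predict(x) + halfwidth].
    """
    # Underlying model (illustrative assumption): ordinary least squares.
    design = np.column_stack([x_train, np.ones_like(x_train)])
    coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)

    def predict(x):
        return coef[0] * x + coef[1]

    # Nonconformity scores (illustrative assumption): absolute residuals
    # on the calibration set, which the model has not been fitted on.
    scores = np.sort(np.abs(y_cal - predict(x_cal)))
    n = len(scores)

    # Conformal quantile: the ceil((n + 1) * confidence)-th smallest score.
    k = math.ceil((n + 1) * confidence)
    if k > n:
        # Too few calibration examples for this confidence level.
        return predict, math.inf
    return predict, float(scores[k - 1])


# Synthetic development data (assumed for illustration only).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=1000)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, size=1000)

# Divide the development data into training, calibration and test parts.
x_train, y_train = x[:500], y[:500]
x_cal, y_cal = x[500:800], y[500:800]
x_test, y_test = x[800:], y[800:]

predict, halfwidth = icp_halfwidth(x_train, y_train, x_cal, y_cal, confidence=0.9)

# Empirical coverage on the test part: fraction of intervals containing y.
coverage = float(np.mean(np.abs(y_test - predict(x_test)) <= halfwidth))
print(f"half-width: {halfwidth:.3f}, test coverage: {coverage:.3f}")
```

Under exchangeability, the empirical coverage should fall close to the chosen confidence level, with some variation due to the finite calibration and test sets. Changing the slice boundaries is one simple way to see how the division affects interval width and coverage.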

Taxonomy

Topics: Neural Networks and Applications