Investigating Data Usage for Inductive Conformal Predictors
Yizirui Fang, Anthony Bellotti

TL;DR
This paper explores how best to divide data for inductive conformal predictors, examining the effects of overlapping training and calibration sets when development data are limited.
Contribution
It provides experimental insights into data partitioning strategies for ICPs, including the impact of overlapping datasets, which was previously underexplored.
Findings
Overlapping training and calibration sets can improve data efficiency.
Optimal data division depends on dataset size and application context.
The paper offers recommendations for practitioners on how to split data for ICPs.
Abstract
Inductive conformal predictors (ICPs) are algorithms that generate prediction sets, rather than point predictions, that are valid at a user-defined confidence level, assuming only exchangeability. These algorithms are useful for reliable machine learning and are increasing in popularity. The ICP development process involves dividing the development data into three parts: training, calibration and test. When development data are limited or expensive, the most efficient way to divide them is an open question. This study presents several experiments that explore this question and considers the case for allowing examples to overlap between the training and calibration sets. Conclusions are drawn that will be of value to academics and practitioners planning to use ICPs.
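The three-way division described in the abstract can be illustrated with a minimal sketch of an ICP for regression. This is not the authors' code or experimental setup: the function names split_indices and icp_intervals, the n_overlap parameter, the least-squares line used as the underlying model, and the absolute-residual nonconformity score are all assumptions chosen for illustration. Setting n_overlap above zero reuses some training examples for calibration, which is the kind of overlap the paper investigates; the usual validity argument assumes the two sets are disjoint.

```python
import numpy as np

def split_indices(n, n_train, n_cal, n_overlap=0, seed=0):
    """Split n examples into training, calibration and test indices.

    n_overlap (hypothetical parameter) is the number of examples shared
    by the training and calibration sets; 0 gives the standard disjoint split.
    """
    n_dev = n_train + n_cal - n_overlap  # distinct development examples
    if not 0 <= n_overlap <= min(n_train, n_cal) or n_dev > n:
        raise ValueError("inconsistent split sizes")
    perm = np.random.default_rng(seed).permutation(n)
    train_idx = perm[:n_train]
    cal_idx = perm[n_train - n_overlap:n_dev]  # shares n_overlap items with train
    test_idx = perm[n_dev:]                    # never used for fitting or calibration
    return train_idx, cal_idx, test_idx

def icp_intervals(x_train, y_train, x_cal, y_cal, x_test, epsilon=0.1):
    """Prediction intervals at confidence level 1 - epsilon."""
    # Underlying model (an assumption): a least-squares straight line.
    slope, intercept = np.polyfit(x_train, y_train, deg=1)

    def predict(x):
        return slope * x + intercept

    # Nonconformity scores on the calibration set: absolute residuals.
    scores = np.sort(np.abs(y_cal - predict(x_cal)))
    n_cal = len(scores)
    # Use the ceil((1 - epsilon) * (n_cal + 1))-th smallest score as the
    # half-width; if that rank exceeds n_cal the interval is unbounded.
    k = int(np.ceil((1 - epsilon) * (n_cal + 1)))
    q = scores[k - 1] if k <= n_cal else np.inf
    centre = predict(x_test)
    return centre - q, centre + q

# Usage on synthetic data (illustrative only).
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=1000)
y = 2.0 * x + 1.0 + rng.normal(0, 1, size=1000)
train, cal, test = split_indices(len(x), n_train=300, n_cal=300)
lower, upper = icp_intervals(x[train], y[train], x[cal], y[cal], x[test])
coverage = np.mean((y[test] >= lower) & (y[test] <= upper))
print(f"empirical coverage on the test set: {coverage:.3f}")
```

With a disjoint split, the empirical coverage on the test set should be close to 1 - epsilon. Rerunning with n_overlap greater than zero, and comparing coverage and interval width, gives a rough feel for the trade-off the paper studies, though the result will depend on the model and the data.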
Taxonomy
Topics: Neural Networks and Applications
