Empirically Validating Conformal Prediction on Modern Vision Architectures Under Distribution Shift and Long-tailed Data
Kevin Kasa, Graham W. Taylor

TL;DR
This paper empirically evaluates the reliability of conformal prediction methods on modern vision models under challenging conditions like distribution shift and long-tailed data, revealing significant performance degradation.
Contribution
It provides the first large-scale empirical assessment of conformal prediction's robustness on contemporary vision architectures under real-world data challenges.
Findings
Performance degrades under distribution shift, so the stated safety guarantees no longer hold.
Guarantees are often violated in long-tailed class distributions.
Performance issues are consistent across various conformal methods and neural networks.
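To make the long-tailed finding concrete, here is a minimal sketch of how per-class coverage can be measured separately from overall (marginal) coverage. The function name `classwise_coverage` and the toy data are illustrative assumptions, not the paper's code or results. The point is only that a healthy marginal coverage can hide classes whose true label is rarely or never included in the prediction set.

```python
import numpy as np

def classwise_coverage(sets, labels, n_classes):
    """Per-class coverage: for each class, the fraction of its samples whose
    true label is inside the prediction set. NaN for classes with no samples.

    sets   : bool array (n_samples, n_classes), True if class is in the set
    labels : int array (n_samples,), true class indices
    """
    covered = sets[np.arange(len(labels)), labels]
    out = np.full(n_classes, np.nan)
    for k in range(n_classes):
        mask = labels == k
        if mask.any():
            out[k] = covered[mask].mean()
    return out

# Toy long-tailed labels (hypothetical): class 0 is the head, class 2 the tail.
labels = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2])
sets = np.zeros((10, 3), dtype=bool)
sets[:, 0] = True      # every prediction set contains the head class
sets[6:8, 1] = True    # two of the three class-1 samples are covered
# the single class-2 sample is never covered

cov = classwise_coverage(sets, labels, 3)
marginal = sets[np.arange(len(labels)), labels].mean()
print("marginal coverage:", marginal)
print("per-class coverage:", cov)
```

In this toy case the marginal coverage is 0.8 while the tail class has coverage 0, which is the kind of gap a class-wise evaluation is meant to expose.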
Abstract
Conformal prediction has emerged as a rigorous means of providing deep learning models with reliable uncertainty estimates and safety guarantees. Yet its performance is known to degrade under distribution shift and long-tailed class distributions, which are often present in real-world applications. Here, we characterize the performance of several post-hoc and training-based conformal prediction methods under these settings, providing the first empirical evaluation on large-scale datasets and models. We show that, across numerous conformal methods and neural network families, performance degrades greatly under distribution shift, violating safety guarantees. Similarly, we show that in long-tailed settings the guarantees are frequently violated on many classes. Understanding the limitations of these methods is necessary for deployment in real-world and safety-critical applications.
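As an illustration of why the guarantee depends on calibration and test data coming from the same distribution, the sketch below implements split conformal prediction with a simple threshold score (one minus the softmax probability of the true class). Everything here is an assumption for illustration: the synthetic "classifier" in `synthetic_softmax`, the weaker-signal stand-in for distribution shift, and all function names. It is not the paper's experimental setup and reproduces none of its numbers. It requires NumPy 1.22 or later for the `method` argument of `np.quantile`.

```python
import numpy as np

def conformal_quantile(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(cal_scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(cal_scores, level, method="higher")

def prediction_sets(probs, qhat):
    """Class k enters the set when its score 1 - p_k is at most qhat."""
    return (1.0 - probs) <= qhat

def marginal_coverage(sets, labels):
    """Fraction of samples whose true label is in the prediction set."""
    return sets[np.arange(len(labels)), labels].mean()

def synthetic_softmax(rng, n, n_classes, signal):
    """Hypothetical stand-in for a model: Gaussian logits plus `signal`
    added to the true class, passed through a softmax."""
    labels = rng.integers(0, n_classes, size=n)
    logits = rng.normal(size=(n, n_classes))
    logits[np.arange(n), labels] += signal
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    return probs, labels

rng = np.random.default_rng(0)
alpha = 0.1  # target coverage is 1 - alpha = 90%

# Calibration and in-distribution test data share the same signal strength;
# the "shifted" test data uses a weaker signal (an assumed proxy for shift).
cal_probs, cal_labels = synthetic_softmax(rng, 5000, 10, signal=3.0)
iid_probs, iid_labels = synthetic_softmax(rng, 5000, 10, signal=3.0)
shift_probs, shift_labels = synthetic_softmax(rng, 5000, 10, signal=1.0)

cal_scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
qhat = conformal_quantile(cal_scores, alpha)

cov_iid = marginal_coverage(prediction_sets(iid_probs, qhat), iid_labels)
cov_shift = marginal_coverage(prediction_sets(shift_probs, qhat), shift_labels)
print(f"coverage, same distribution: {cov_iid:.3f}")
print(f"coverage, shifted:           {cov_shift:.3f}")
```

On exchangeable data the measured coverage should sit near the 90% target, while on the shifted data the threshold calibrated earlier is too tight and coverage falls well below it.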
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Adversarial Robustness in Machine Learning · Fault Detection and Control Systems
