TL;DR
This paper studies split conformal prediction for large-scale vision-language models such as CLIP and proposes Conf-OT, a transfer learning method that improves the efficiency of the resulting conformal prediction sets across multiple datasets.
Contribution
It introduces Conf-OT, a novel transductive conformal prediction method using optimal transport to handle domain drift in foundation models, with theoretical guarantees and broad empirical validation.
Findings
Conf-OT improves set efficiency by up to 20%.
Conf-OT is 15 times faster than existing transductive methods.
The approach maintains coverage guarantees across diverse datasets.
Abstract
Vision-language models pre-trained at large scale have shown unprecedented adaptability and generalization to downstream tasks. Although their discriminative potential has been widely explored, their reliability and uncertainty are still overlooked. In this work, we investigate the capabilities of CLIP models under the split conformal prediction paradigm, which provides theoretical guarantees to black-box models based on a small, labeled calibration set. In contrast to the main body of literature on conformal predictors in vision classifiers, foundation models exhibit a particular characteristic: they are pre-trained on a one-time basis on an inaccessible source domain, different from the transferred task. This domain drift negatively affects the efficiency of the conformal sets and poses additional challenges. To alleviate this issue, we propose Conf-OT, a transfer learning setting that…
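For readers unfamiliar with the split conformal paradigm mentioned in the abstract, the following is a minimal generic sketch, not the Conf-OT procedure itself. It assumes the black-box model already outputs class probabilities (for CLIP, a softmax over image-text similarities) and uses the simple "one minus true-class probability" non-conformity score; the function names `calibrate_threshold` and `predict_sets` are hypothetical and chosen only for illustration.

```python
import numpy as np


def calibrate_threshold(cal_probs, cal_labels, alpha=0.1):
    """Compute the conformal threshold from a small labeled calibration set.

    cal_probs: array (n, n_classes) of predicted class probabilities.
    cal_labels: integer array (n,) of true labels.
    alpha: target error rate; the sets aim for coverage >= 1 - alpha.
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    n = len(cal_labels)
    # Non-conformity score (an assumption of this sketch):
    # one minus the probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample correction: take the k-th smallest score,
    # with k = ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: include every class.
        return np.inf
    return np.sort(scores)[k - 1]


def predict_sets(test_probs, threshold):
    """Return a boolean mask (n_test, n_classes); True marks classes in the set."""
    test_probs = np.asarray(test_probs, dtype=float)
    return (1.0 - test_probs) <= threshold


# Synthetic demo: random probabilities stand in for a real model's outputs.
rng = np.random.default_rng(0)
logits = rng.normal(size=(200, 5))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
labels = np.array([rng.choice(5, p=p) for p in probs])

threshold = calibrate_threshold(probs[:100], labels[:100], alpha=0.1)
sets = predict_sets(probs[100:], threshold)
average_set_size = sets.sum(axis=1).mean()  # smaller sets are more "efficient"
```

The coverage guarantee of this recipe relies on the calibration and test data being exchangeable. The average set size computed at the end is the efficiency measure that the summary above refers to: at a fixed coverage level, smaller sets are more informative.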
Taxonomy
Methods: Contrastive Language-Image Pre-training · Sparse Evolutionary Training
