CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD Generalization and Open-Set OOD Detection
Lin Zhu, Yifeng Yang, Qinying Gu, Xinbing Wang, Chenghu Zhou, Nanyang Ye

TL;DR
This paper introduces CRoFT, a fine-tuning framework for vision-language models that enhances out-of-distribution generalization and open-set detection by optimizing gradient behaviors, supported by theoretical insights and extensive experiments.
Contribution
We propose a novel objective for fine-tuning VL-PTMs that improves OOD generalization and open-set detection through concurrent optimization based on gradient magnitude minimization.
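To make the idea of "gradient magnitude minimization" concrete, here is a minimal sketch of a generic gradient-norm penalty added to a classification loss. It is not the paper's objective, which is not spelled out on this page. The linear classifier, the log-sum-exp score, the weight `lam`, and all function names are assumptions chosen for illustration only.

```python
import math

def logits_fn(W, x):
    # Linear classifier: one logit per row of W (hypothetical toy model).
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def score(W, x):
    # Assumed scalar score s(x) = -logsumexp(W x); an illustrative choice,
    # not taken from the source.
    z = logits_fn(W, x)
    m = max(z)
    return -(m + math.log(sum(math.exp(v - m) for v in z)))

def score_grad_wrt_input(W, x):
    # Analytic gradient of the score: ds/dx_j = -sum_k softmax_k * W[k][j].
    p = softmax(logits_fn(W, x))
    return [-sum(p[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]

def cross_entropy(W, x, label):
    return -math.log(softmax(logits_fn(W, x))[label])

def penalized_loss(W, x, label, lam=0.1):
    # Task loss plus lam * squared gradient magnitude of the score.
    g = score_grad_wrt_input(W, x)
    return cross_entropy(W, x, label) + lam * sum(v * v for v in g)
```

With `lam=0.0` the penalized loss reduces to plain cross-entropy. Larger values of `lam` trade some task fit for a score that varies less sharply around the training inputs.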
Findings
Our method outperforms existing approaches in OOD detection accuracy.
Theoretical analysis links gradient magnitude minimization to domain-consistent Hessians.
Extensive experiments validate the effectiveness of CRoFT across benchmarks.
Abstract
Recent vision-language pre-trained models (VL-PTMs) have shown remarkable success in open-vocabulary tasks. However, downstream use cases often involve further fine-tuning of VL-PTMs, which may distort their general knowledge and impair their ability to handle distribution shifts. In real-world scenarios, machine learning systems inevitably encounter both covariate shifts (e.g., changes in image styles) and semantic shifts (e.g., test-time unseen classes). This highlights the importance of enhancing out-of-distribution (OOD) generalization on covariate shifts and simultaneously detecting semantic-shifted unseen classes. Thus a critical but underexplored question arises: How to improve VL-PTMs' generalization ability to closed-set OOD data, while effectively detecting open-set unseen classes during fine-tuning? In this paper, we propose a novel objective function of OOD detection that…
Taxonomy
Topics: Neural Networks and Applications · Image and Signal Denoising Methods · Digital Filter Design and Implementation
