Robust Fine-tuning of Zero-shot Models via Variance Reduction
Beier Zhu, Jiequan Cui, Hanwang Zhang

TL;DR
This paper introduces VRF, a sample-wise ensembling method for fine-tuned zero-shot models such as CLIP. It aims to reduce prediction variance and to reach the best in-distribution (ID) and out-of-distribution (OOD) accuracy at the same time, avoiding the ID-OOD trade-off seen with fixed mixing coefficients.
Contribution
The paper proposes Variance Reduction Fine-tuning (VRF), a sample-wise ensembling technique that aims to improve the robustness of fine-tuned zero-shot models by reducing prediction variance.
Findings
VRF improves OOD accuracy by 1.5-2.0 percentage points over ensemble baselines.
VRF maintains or increases ID accuracy while boosting robustness.
VRF achieves significant gains across multiple distribution shift benchmarks.
Abstract
When fine-tuning zero-shot models like CLIP, our desideratum is for the fine-tuned model to excel in both in-distribution (ID) and out-of-distribution (OOD). Recently, ensemble-based models (ESM) have been shown to offer significant robustness improvement, while preserving high ID accuracy. However, our study finds that ESMs do not solve the ID-OOD trade-offs: they achieve peak performance for ID and OOD accuracy at different mixing coefficients. When optimized for OOD accuracy, the ensemble model exhibits a noticeable decline in ID accuracy, and vice versa. In contrast, we propose a sample-wise ensembling technique that can simultaneously attain the best ID and OOD accuracy without the trade-offs. Specifically, we construct a Zero-Shot Failure (ZSF) set containing training samples incorrectly predicted by the zero-shot model. For each test sample, we calculate its distance to the ZSF…
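The abstract is cut off before it says how the distance to the ZSF set is used, so the following is only a minimal sketch of the general idea, not the authors' implementation. It contrasts a fixed-coefficient ensemble with a per-sample one. Everything beyond "build a ZSF set and measure each test sample's distance to it" is an assumption made for illustration: the Euclidean k-nearest-neighbour distance, the sigmoid mapping from distance to weight, its direction (closer to the ZSF set means more weight on the fine-tuned model), and the hypothetical parameters `k`, `a`, and `b`.

```python
import numpy as np


def build_zsf_set(train_feats, train_labels, zs_probs):
    """Keep the features of training samples the zero-shot model misclassifies."""
    wrong = zs_probs.argmax(axis=1) != train_labels
    return train_feats[wrong]


def knn_distance(test_feats, zsf_feats, k=1):
    """Mean Euclidean distance from each test feature to its k nearest ZSF features.

    Assumes the ZSF set is non-empty.
    """
    diffs = test_feats[:, None, :] - zsf_feats[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)          # shape: (n_test, n_zsf)
    k = min(k, dists.shape[1])
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)


def fixed_ensemble(zs_probs, ft_probs, alpha):
    """Output-space ensemble with a single mixing coefficient for all samples."""
    return (1.0 - alpha) * zs_probs + alpha * ft_probs


def sample_wise_ensemble(zs_probs, ft_probs, dist, a=1.0, b=1.0):
    """Per-sample mixing weight derived from the distance to the ZSF set.

    ASSUMPTION: the weight on the fine-tuned model is a decreasing sigmoid of
    the distance; a and b are hypothetical scale and offset parameters.
    """
    w = 1.0 / (1.0 + np.exp(a * (dist - b)))
    mixed = (1.0 - w[:, None]) * zs_probs + w[:, None] * ft_probs
    return mixed, w


# Toy usage with made-up features and probabilities.
train_feats = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
train_labels = np.array([0, 1, 0])
train_zs_probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]])
zsf = build_zsf_set(train_feats, train_labels, train_zs_probs)

test_feats = np.array([[1.0, 0.0], [10.0, 10.0]])
test_zs = np.array([[0.6, 0.4], [0.7, 0.3]])
test_ft = np.array([[0.2, 0.8], [0.4, 0.6]])

dist = knn_distance(test_feats, zsf, k=1)
mixed, w = sample_wise_ensemble(test_zs, test_ft, dist)
```

In this toy setup the first test point coincides with a ZSF sample, so it leans on the fine-tuned model, while the distant second point leans on the zero-shot model. With a fixed `alpha`, both points would be mixed identically, which is the limitation the abstract attributes to ensemble-based models.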
Taxonomy
Topics: Model Reduction and Neural Networks · Medical Imaging Techniques and Applications · Nuclear Physics and Applications
Methods: Sparse Evolutionary Training · Contrastive Language-Image Pre-training
