Bayesian Natural Gradient Fine-Tuning of CLIP Models via Kalman Filtering
Hossein Abdi, Mingfei Sun, Wei Pan

TL;DR
This paper introduces a Bayesian natural gradient fine-tuning method for CLIP models using Kalman filtering, improving convergence, generalization, and out-of-distribution robustness in vision-language tasks.
Contribution
It presents the first application of Kalman filtering to fine-tune CLIP models, combining second-order optimization with Bayesian inference for enhanced performance.
Findings
Achieves superior in-distribution accuracy
Improves out-of-distribution robustness
Demonstrates efficient and robust fine-tuning
Abstract
Vision-language pre-trained models, such as CLIP, have established new benchmarks in multimodal data mining. For such models, few-shot fine-tuning that achieves optimal performance on both in-distribution (ID) and out-of-distribution (OOD) datasets remains a major challenge, especially when labeled data is scarce. Most existing fine-tuning approaches rely on first-order gradient-based optimizers, which typically suffer from slow convergence, sensitivity to step-size hyperparameters, and poor generalization in OOD settings. In contrast, second-order methods use local curvature information of the loss landscape to adjust the update step size. This is particularly beneficial for CLIP models, whose non-convex loss functions often contain sharp critical points. In such cases, the natural gradient direction can offer more substantial and efficient per-iteration updates when fine-tuning with limited…
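To make the link between Kalman filtering and curvature-aware updates concrete, here is a minimal sketch on a toy problem. It is not the authors' algorithm, which the abstract excerpt does not detail, and it does not involve CLIP. It assumes a two-parameter linear model with Gaussian observation noise, for which the Kalman update coincides with recursive least squares. The names `kalman_step` and `fit`, the prior covariance, and the noise level are all illustrative choices.

```python
import random


def kalman_step(w, P, x, y, obs_noise=0.01):
    """One Kalman-filter update for the toy model y = x . w + noise.

    w is the current weight estimate (list of floats) and P its covariance
    (list of rows). The gain K rescales the error by the current uncertainty,
    so no hand-tuned learning rate is needed.
    """
    n = len(w)
    # P x (P is symmetric, so this also gives x^T P)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    # Innovation variance S = x^T P x + R
    S = sum(x[i] * Px[i] for i in range(n)) + obs_noise
    # Kalman gain K = P x / S
    K = [Px[i] / S for i in range(n)]
    # Prediction error (innovation)
    err = y - sum(x[i] * w[i] for i in range(n))
    w_new = [w[i] + K[i] * err for i in range(n)]
    # Covariance update P <- P - K x^T P
    P_new = [[P[i][j] - K[i] * Px[j] for j in range(n)] for i in range(n)]
    return w_new, P_new


def fit(true_w=(2.0, -3.0), steps=200, seed=0):
    """Estimate true_w from noisy samples (hypothetical toy data)."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    P = [[10.0, 0.0], [0.0, 10.0]]  # broad prior: large initial uncertainty
    for _ in range(steps):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        # Noise std 0.1 matches the default obs_noise variance of 0.01
        y = sum(a * b for a, b in zip(true_w, x)) + rng.gauss(0, 0.1)
        w, P = kalman_step(w, P, x, y)
    return w, P


if __name__ == "__main__":
    w, P = fit()
    print("estimated weights:", w)
    print("remaining variances:", P[0][0], P[1][1])
```

In this toy setting the covariance `P` acts like an inverse-curvature estimate: the steps are large while uncertainty is high and shrink automatically as evidence accumulates. A deep model would need a linearized and approximated version of this update, which the sketch does not attempt.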
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
