Branch, or Layer? Zeroth-Order Optimization for Continual Learning of Vision-Language Models
Ziwei Liu, Borui Kang, Wei Li, Hangjie Yuan, Yanbing Yang, Wenbin Li, Yifan Zhu, Tao Feng, Jun Luo

TL;DR
This paper explores the use of Zeroth-Order optimization in Parameter-Efficient Fine-Tuning for Vision-Language Continual Learning, addressing optimization challenges and achieving state-of-the-art results.
Contribution
It introduces a modality-aware Zeroth-Order optimization strategy for PEFT-based VLCL, improving optimization stability and performance.
Findings
ZO optimization helps escape local minima in VLCL.
Modality-aware ZO improves training stability and accuracy.
State-of-the-art results on four benchmarks.
Abstract
Vision-Language Continual Learning (VLCL) has attracted significant research attention for its robust capabilities, and the adoption of Parameter-Efficient Fine-Tuning (PEFT) strategies is enabling these models to achieve competitive performance with substantially reduced resource consumption. However, the dominant First-Order (FO) optimization is prone to trapping models in suboptimal local minima, especially within the limited exploration subspace of PEFT. To overcome this challenge, this paper pioneers a systematic exploration of adopting Zeroth-Order (ZO) optimization for PEFT-based VLCL. We first identify that naive full-ZO adoption is incompatible with VLCL because it destabilizes the optimization process. We then investigate applying ZO optimization across various training units, at granularities ranging from modality branch-wise to fine-grained layer-wise, to identify an optimal strategy. Besides, a key…
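The abstract contrasts FO updates with ZO updates and describes applying ZO at branch-wise or layer-wise granularity. The following is a minimal sketch, assuming a generic two-point SPSA-style estimator rather than whatever exact estimator the authors use. The function names `zo_gradient` and `hybrid_step`, the group names `vision` and `text`, and the toy quadratic loss are all hypothetical. Parameter groups listed in `zo_groups` are updated from forward-pass loss differences only, while the remaining groups use an ordinary first-order gradient.

```python
import random
from typing import Callable, Dict, List

# Hypothetical representation: each named parameter group (e.g. a modality
# branch or a single layer) is a flat list of floats.
Params = Dict[str, List[float]]


def zo_gradient(
    loss_fn: Callable[[Params], float],
    params: Params,
    groups: List[str],
    eps: float = 1e-3,
    seed: int = 0,
) -> Params:
    """Two-point (SPSA-style) zeroth-order gradient estimate for `groups`.

    Only two forward evaluations of `loss_fn` are needed; no backpropagation.
    """
    rng = random.Random(seed)
    # One shared Gaussian direction z spanning all selected groups.
    z = {n: [rng.gauss(0.0, 1.0) for _ in params[n]] for n in groups}
    plus = {n: list(v) for n, v in params.items()}
    minus = {n: list(v) for n, v in params.items()}
    for n in groups:
        plus[n] = [p + eps * zi for p, zi in zip(params[n], z[n])]
        minus[n] = [p - eps * zi for p, zi in zip(params[n], z[n])]
    # Central finite difference: estimated directional derivative along z.
    proj = (loss_fn(plus) - loss_fn(minus)) / (2.0 * eps)
    # The gradient estimate is that scalar projection times the direction z.
    return {n: [proj * zi for zi in z[n]] for n in groups}


def hybrid_step(
    params: Params,
    loss_fn: Callable[[Params], float],
    fo_grad_fn: Callable[[Params], Params],
    zo_groups: List[str],
    lr: float = 0.1,
    eps: float = 1e-3,
    seed: int = 0,
) -> Params:
    """One SGD step: ZO estimate for `zo_groups`, FO gradient for the rest."""
    zo = zo_gradient(loss_fn, params, zo_groups, eps, seed) if zo_groups else {}
    fo = fo_grad_fn(params)  # computed for all groups here for simplicity
    updated: Params = {}
    for n, values in params.items():
        grad = zo[n] if n in zo else fo[n]
        updated[n] = [p - lr * g for p, g in zip(values, grad)]
    return updated


if __name__ == "__main__":
    # Toy quadratic loss over two hypothetical "branches".
    def loss(p: Params) -> float:
        return sum(x * x for v in p.values() for x in v)

    def grad(p: Params) -> Params:
        return {n: [2.0 * x for x in v] for n, v in p.items()}

    params: Params = {"vision": [1.0, -1.0], "text": [0.5, 2.0]}
    for step in range(200):
        # Branch-wise choice: ZO on "vision", FO on "text".
        params = hybrid_step(params, loss, grad, zo_groups=["vision"], seed=step)
    print(f"final loss: {loss(params):.2e}")
```

The split between `zo_groups` and the remaining groups is the knob that a branch-wise or layer-wise strategy would set. Here it is fixed by hand for illustration. Passing individual layer names instead of branch names would give the finer-grained variant.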
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
