Branch, or Layer? Zeroth-Order Optimization for Continual Learning of Vision-Language Models

Ziwei Liu; Borui Kang; Wei Li; Hangjie Yuan; Yanbing Yang; Wenbin Li; Yifan Zhu; Tao Feng; Jun Luo

arXiv:2506.12409 · cs.CV · January 12, 2026

TL;DR

This paper explores the use of Zeroth-Order optimization in Parameter-Efficient Fine-Tuning for Vision-Language Continual Learning, addressing optimization challenges and achieving state-of-the-art results.

Contribution

It introduces a modality-aware Zeroth-Order optimization strategy for PEFT-based VLCL, improving optimization stability and performance.

Findings

1. ZO optimization helps escape local minima in VLCL.
2. Modality-aware ZO improves training stability and accuracy.
3. State-of-the-art results on four benchmarks.

Abstract

Vision-Language Continual Learning (VLCL) has attracted significant research attention for its robust capabilities, and the adoption of Parameter-Efficient Fine-Tuning (PEFT) strategies is enabling these models to achieve competitive performance with substantially reduced resource consumption. However, the dominant First-Order (FO) optimization is prone to trapping models in suboptimal local minima, especially within the limited exploration subspace of PEFT. To overcome this challenge, this paper pioneers a systematic exploration of adopting Zeroth-Order (ZO) optimization for PEFT-based VLCL. We first identify the incompatibility of naive full-ZO adoption in VLCL due to instability in the optimization process. We then investigate the application of ZO optimization at granularities ranging from modality branch-wise to fine-grained layer-wise, across various training units, to identify an optimal strategy. Besides, a key…
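For readers unfamiliar with the term, zeroth-order optimization estimates a descent direction from loss evaluations alone, without backpropagation. The sketch below shows a generic two-point (SPSA-style) estimator on a toy quadratic. It is an illustration only, not the paper's modality-aware strategy, which this page does not detail; the names `zo_gradient`, `zo_sgd`, and `quadratic`, and all hyperparameter values, are assumptions chosen for the example.

```python
import numpy as np


def zo_gradient(loss_fn, params, eps=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate (generic SPSA-style sketch).

    Uses only two loss evaluations along a random direction z;
    no derivatives of loss_fn are required.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(params.shape)  # random perturbation direction
    loss_plus = loss_fn(params + eps * z)
    loss_minus = loss_fn(params - eps * z)
    # Finite-difference slope along z, projected back onto z.
    return (loss_plus - loss_minus) / (2.0 * eps) * z


def zo_sgd(loss_fn, params, lr=0.05, steps=500, eps=1e-3, seed=0):
    """Plain gradient descent driven by the zeroth-order estimate."""
    rng = np.random.default_rng(seed)
    params = np.asarray(params, dtype=float).copy()
    for _ in range(steps):
        params -= lr * zo_gradient(loss_fn, params, eps, rng)
    return params


def quadratic(w):
    """Hypothetical toy loss with its minimum at w = 1 in every coordinate."""
    return float(np.sum((w - 1.0) ** 2))


if __name__ == "__main__":
    w = zo_sgd(quadratic, np.zeros(4))
    print("final loss:", quadratic(w))
```

Each step costs two forward passes and no backward pass, which is what makes ZO attractive for memory-constrained fine-tuning; the price is a noisier update whose variance grows with the number of perturbed parameters.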

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning