BOFA: Bridge-Layer Orthogonal Low-Rank Fusion for CLIP-Based Class-Incremental Learning

Lan Li; Tao Hu; Da-Wei Zhou; Han-Jia Ye; De-Chuan Zhan

arXiv:2511.11421·cs.CV·November 17, 2025

TL;DR

BOFA introduces a parameter-efficient, orthogonal low-rank fusion framework for CLIP-based class-incremental learning, effectively preventing forgetting and enhancing multi-modal integration without additional inference costs.

Contribution

The paper proposes BOFA, a novel method that adapts CLIP solely through its bridge-layer using orthogonal low-rank fusion, avoiding extra parameters and improving CIL performance.

Findings

01. BOFA outperforms existing methods in accuracy on standard benchmarks.

02. It maintains model stability without data replay.

03. BOFA requires no additional inference cost.

Abstract

Class-Incremental Learning (CIL) aims to continually learn new categories without forgetting previously acquired knowledge. Vision-language models such as CLIP offer strong transferable representations via multi-modal supervision, making them promising for CIL. However, applying CLIP to CIL poses two major challenges: (1) adapting to downstream tasks often requires additional learnable modules, increasing model complexity and susceptibility to forgetting; and (2) while multi-modal representations offer complementary strengths, existing methods have yet to fully realize their potential in effectively integrating visual and textual modalities. To address these issues, we propose BOFA (Bridge-layer Orthogonal Fusion for Adaptation), a novel framework for CIL. BOFA confines all model adaptation exclusively to CLIP's existing cross-modal bridge-layer, thereby adding no extra parameters or…
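
The abstract is cut off before it describes the fusion step, so the following is only a generic illustration of the idea named in the title: a low-rank update to an existing projection matrix, constrained to be orthogonal to a previously used subspace and then folded back into the weights. Everything here (the function name `orthogonal_low_rank_update`, the matrices `W`, `U_prev`, `A`, `B`, and the toy dimensions) is a hypothetical stand-in chosen for illustration, not BOFA's actual procedure.

```python
import numpy as np


def orthogonal_low_rank_update(W, U_prev, A, B):
    """Merge a low-rank update into W, keeping it orthogonal to span(U_prev).

    Hypothetical sketch, not the paper's algorithm.
    W:      (d_out, d_in) existing projection weights (stand-in for a bridge layer)
    U_prev: (d_out, k) orthonormal basis of directions reserved for earlier tasks
    A:      (r, d_in) low-rank factor
    B:      (d_out, r) low-rank factor
    """
    # Remove the part of B that lies inside the previously used subspace.
    B_orth = B - U_prev @ (U_prev.T @ B)
    # Fold the update into the existing weights: no new module remains afterwards.
    return W + B_orth @ A, B_orth


# Toy dimensions, assumed for illustration only.
rng = np.random.default_rng(0)
d_out, d_in, rank, k = 8, 16, 2, 3

W = rng.standard_normal((d_out, d_in))
U_prev, _ = np.linalg.qr(rng.standard_normal((d_out, k)))  # orthonormal columns
A = rng.standard_normal((rank, d_in))
B = rng.standard_normal((d_out, rank))

W_new, B_orth = orthogonal_low_rank_update(W, U_prev, A, B)
delta = W_new - W  # the merged update
```

Because the update is merged into `W`, the adapted layer has the same shape and cost as the original, which is the sense in which a scheme of this kind adds no parameters at inference time. The projection step guarantees that `delta` has no component along the columns of `U_prev`.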

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Face recognition and analysis