TL;DR
iGSP introduces an efficient gradient subspace projection method for continual learning in vision-language models, reducing parameters and improving accuracy across tasks.
Contribution
The paper proposes a novel framework, iGSP, that leverages implicit gradient subspace projection for efficient, parameter-saving continual learning.
Findings
Achieves state-of-the-art accuracy on the MTIL benchmark.
Reduces trainable parameters by 42.7% compared to the previous state-of-the-art method.
Decreases total parameters by 86.9% relative to comparable methods.
Abstract
Vision-Language Models require efficient adaptation to continually emerging downstream tasks. While Parameter-Efficient Fine-Tuning mitigates catastrophic forgetting, assigning isolated modules per task leads to parameter explosion. Conversely, recent similarity-driven sharing mechanisms falsely equate superficial visual similarity with underlying alignment consistency. This fundamental mismatch triggers severe negative transfer between visually similar but logically distinct tasks and fails to exploit alignment reuse across visually diverse ones. We argue that alignment sharing is fundamentally a geometric problem of overlapping optimization trajectories within shared low-rank subspaces. Grounded in this insight, we propose iGSP, a novel framework that achieves efficient adaptation via implicit gradient subspace projection. Leveraging the early convergence of MoE routers to establish…
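The geometric framing in the abstract (gradients from different tasks occupying overlapping low-rank subspaces) can be illustrated with a small sketch. This is not the iGSP algorithm, whose details are not given in the text above; it is a generic, explicit version of gradient subspace projection, written under the assumption that each task's gradients are stacked as columns of a matrix. The function names top_k_subspace, project, and subspace_overlap are hypothetical and chosen only for illustration.

```python
import numpy as np


def top_k_subspace(grads: np.ndarray, k: int) -> np.ndarray:
    """Orthonormal basis (d x k) spanning the top-k left singular
    directions of a d x n matrix whose columns are gradient vectors."""
    u, _, _ = np.linalg.svd(grads, full_matrices=False)
    return u[:, :k]


def project(g: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Orthogonal projection of gradient g onto the span of `basis`
    (columns of `basis` are assumed orthonormal)."""
    return basis @ (basis.T @ g)


def subspace_overlap(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared cosine of the principal angles between two subspaces
    given by orthonormal bases a and b. 1.0 means one subspace contains
    the other; 0.0 means they are orthogonal."""
    k = min(a.shape[1], b.shape[1])
    return float(np.linalg.norm(a.T @ b, "fro") ** 2 / k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n, k = 64, 20, 4

    # Illustrative assumption: two tasks whose gradients share a
    # low-rank structure, plus a third task with unrelated gradients.
    shared = rng.standard_normal((d, k))
    task_a = shared @ rng.standard_normal((k, n))
    task_b = shared @ rng.standard_normal((k, n))
    task_c = rng.standard_normal((d, n))

    basis_a = top_k_subspace(task_a, k)
    basis_b = top_k_subspace(task_b, k)
    basis_c = top_k_subspace(task_c, k)

    print("overlap(a, b):", subspace_overlap(basis_a, basis_b))
    print("overlap(a, c):", subspace_overlap(basis_a, basis_c))

    # Part of a new gradient that lies inside task A's subspace.
    g = rng.standard_normal(d)
    print("retained norm fraction:",
          np.linalg.norm(project(g, basis_a)) / np.linalg.norm(g))
```

In this toy setup a high overlap score would suggest that two tasks could share adaptation capacity, while a low score would suggest keeping them apart. How iGSP makes this decision implicitly, and how it uses MoE routers, is not recoverable from the truncated abstract.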
