Gradient Projection For Continual Parameter-Efficient Tuning

Jingyang Qiao; Zhizhong Zhang; Xin Tan; Yanyun Qu; Wensheng Zhang; Zhi Han; Yuan Xie

arXiv:2405.13383·cs.LG·July 18, 2024

Open Access

TL;DR

This paper introduces a unified gradient projection framework called PEGP that enhances parameter-efficient tuning methods by reducing forgetting in continual learning across various models and modalities.

Contribution

The paper reformulates parameter-efficient tuning methods (PETs) from a gradient projection perspective and proposes orthogonal gradient projection to mitigate forgetting with minimal additional resources.

Findings

01. Significantly reduces forgetting in continual learning scenarios.

02. Effective across diverse models like ViT and CLIP.

03. Improves generalization in multi-modal and domain adaptation tasks.

Abstract

Parameter-efficient tuning methods (PETs) have demonstrated impressive performance and show promise for training large models, but they still face a common problem: the trade-off between learning new content and protecting old knowledge, which leads to zero-shot generalization collapse and cross-modal hallucination. In this paper, we reformulate Adapter, LoRA, Prefix-tuning, and Prompt-tuning from the perspective of gradient projection and propose, for the first time, a unified framework called Parameter Efficient Gradient Projection (PEGP). We introduce orthogonal gradient projection into different PET paradigms and theoretically demonstrate that the orthogonal condition for the gradient can effectively resist forgetting, even for large-scale models. PEGP therefore modifies the gradient towards the direction that has less impact on the old feature space, with less extra memory space…
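
The idea of steering the gradient away from the old feature space can be illustrated with a small NumPy sketch. This is a generic orthogonal projection under stated assumptions, not the authors' PEGP implementation: the abstract does not give the projection formula, so the SVD-based basis, the `energy` threshold, and the function names `old_feature_basis` and `project_gradient` are all hypothetical choices made for illustration. It assumes a linear weight `W` applied as `x @ W`.

```python
import numpy as np


def old_feature_basis(features, energy=0.99):
    """Orthonormal basis for the dominant subspace of old-task input features.

    features: array of shape (n_samples, d).
    energy:   fraction of squared singular-value mass to keep (assumed knob).
    Returns an array of shape (d, k) with orthonormal columns.
    """
    # Right singular vectors span the directions occupied by the old inputs.
    _, s, vt = np.linalg.svd(features, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    # Smallest k whose leading singular values reach the energy threshold.
    k = int(np.searchsorted(ratio, energy)) + 1
    return vt[:k].T


def project_gradient(grad, basis):
    """Remove the part of `grad` that lies in span(basis).

    grad:  (d, m) gradient of a weight W that is applied as x @ W.
    basis: (d, k) orthonormal columns spanning the old feature subspace.
    """
    return grad - basis @ (basis.T @ grad)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic rank-3 "old task" features in an 8-dimensional input space.
    old_x = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 8))
    W = rng.normal(size=(8, 4))
    grad = rng.normal(size=(8, 4))  # stand-in for a new-task gradient

    basis = old_feature_basis(old_x, energy=0.9999)
    W_new = W - 0.1 * project_gradient(grad, basis)

    # Largest change in the layer's output on old inputs after the update.
    print(np.abs(old_x @ W_new - old_x @ W).max())
```

Because every old input lies (up to the energy threshold) in the span of `basis`, the projected update leaves `old_x @ W` essentially unchanged while still allowing movement in the remaining directions. How the corresponding projection is derived for Adapter, LoRA, Prefix-tuning, and Prompt-tuning is specific to the paper and is not reproduced here.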

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Geophysical Methods and Applications · Machine Learning and ELM

Methods: Contrastive Language-Image Pre-training · Adapter