GEM-Style Constraints for PEFT with Dual Gradient Projection in LoRA

Brian Tekmen; Jason Yin; Qianqian Tong

arXiv:2601.02500 · cs.LG · January 7, 2026

Open Access

TL;DR

This paper introduces I-GEM, a computationally efficient method that applies GEM-style constraints within the LoRA subspace, enabling stable continual learning for large language models while cutting projection time by a factor of about 1000 relative to GEM.

Contribution

We propose I-GEM, a dual gradient projection method that enforces GEM's non-interference constraints only within the LoRA subspace, achieving GEM-like stability at much lower computational cost.

Findings

01. I-GEM matches GEM's average accuracy to within about 0.04 points.

02. I-GEM outperforms A-GEM by approximately 1.4 points.

03. Projection time is reduced by a factor of about 1000 relative to GEM.

Abstract

Full fine-tuning of Large Language Models (LLMs) is computationally costly, motivating Continual Learning (CL) approaches that utilize parameter-efficient adapters. We revisit Gradient Episodic Memory (GEM) within the Low-Rank Adapter (LoRA) subspace and introduce I-GEM: a fixed-budget, GPU-resident dual projected-gradient approximation to GEM's quadratic projection. By constraining non-interference solely within the adapter parameters, I-GEM preserves GEM-like stability with orders-of-magnitude lower mean projection overhead. On a 3-task AG News split with induced domain drift, using GPT-2 (355M) and LoRA ($r = 8$), I-GEM matches GEM's average accuracy (within $\sim 0.04$ pts) and outperforms A-GEM by $\sim 1.4$ pts. Crucially, it reduces projection time vs. GEM by a factor of $\sim 10^{3}$. These results suggest that applying GEM constraints in the LoRA subspace is a practical pathway…
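
The idea of replacing GEM's quadratic projection with a fixed number of projected-gradient steps on its dual can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `dual_pgd_project`, the `n_iters` budget, and the trace-based step size are all assumptions made for illustration, and the sketch runs on the CPU rather than the GPU. It assumes `g` is the flattened current-task gradient over the adapter parameters and each row of `G` is a flattened memory gradient for one previous task. GEM's primal problem seeks the closest vector to `g` with a non-negative inner product against every row of `G`; the dual is a small non-negative quadratic program in one variable per previous task.

```python
import numpy as np

def dual_pgd_project(g, G, n_iters=50):
    """Approximate GEM's projection with a fixed budget of dual steps.

    g: (d,) current gradient (assumed: flattened adapter gradients).
    G: (k, d) memory gradients, one row per previous task.
    Returns a vector close to g whose inner product with each row of G
    is approximately non-negative.
    """
    K = G @ G.T                      # (k, k) Gram matrix of memory gradients
    b = G @ g                        # (k,) inner products with current gradient
    # Hypothetical step-size choice: trace(K) upper-bounds the largest
    # eigenvalue of K, so 1 / trace(K) is a safe (conservative) step.
    step = 1.0 / (np.trace(K) + 1e-12)
    v = np.zeros(G.shape[0])         # dual variables, constrained to v >= 0
    for _ in range(n_iters):
        grad = K @ v + b             # gradient of 0.5 v^T K v + b^T v
        v = np.maximum(0.0, v - step * grad)  # gradient step, then clip to v >= 0
    return g + G.T @ v               # recover the primal (projected) gradient

# One conflicting memory gradient: the result is orthogonal to it.
g = np.array([1.0, 0.0])
G = np.array([[-1.0, 1.0]])
g_proj = dual_pgd_project(g, G)
```

With a single memory gradient the first dual step already lands on the exact solution, which coincides with the familiar A-GEM-style projection. With several constraints the fixed iteration budget trades exactness for a predictable cost, since the loop only touches the small `k`-by-`k` Gram matrix.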

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications