Learning When to Adapt

Ali Zindari; Xiaowen Jiang; Rotem Mulayoff; Sebastian U. Stich

arXiv:2605.19028·cs.LG·May 20, 2026

TL;DR

DISeL enhances LoRA by adding input-dependent gates, reducing catastrophic forgetting and providing interpretability during fine-tuning of large models.

Contribution

The paper introduces DISeL, a lightweight, input-sensitive extension of LoRA that preserves pre-trained behavior while improving task-specific adaptation.

Findings

01 DISeL reduces forgetting compared to LoRA on multiple tasks.

02 DISeL maintains competitive fine-tuning accuracy.

03 Gate activations offer interpretability of adaptation focus.

Abstract

Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method, yet its learned correction is static: the same low-rank update is applied to every input. This input-agnostic approach creates an inevitable compromise between adapting to the fine-tuning distribution and preserving pre-trained behavior on inputs outside that distribution, contributing to catastrophic forgetting. We introduce DISeL (Dynamic Input-Sensitive LoRA), which augments LoRA modules with lightweight input-dependent gates over individual rank-one components. The gating mechanism is designed to preserve the pre-trained model's behavior by default, while training learns to activate selected components that reduce the fine-tuning loss. DISeL adds only a small number of parameters and preserves the low-rank structure. Across RoBERTa on GLUE, and Llama and Mistral models fine-tuned for mathematical…
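
The contrast between a static LoRA update and an input-gated one can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the gate's functional form, so the sigmoid gate, the gate parameters `G` and `b`, and all dimensions below are assumptions chosen purely for illustration. The only properties taken from the abstract are that there is one gate per rank-one component, that the gate depends on the input, and that closed gates leave the pre-trained output unchanged.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lora_forward(x, W, A, B):
    """Static LoRA: the same low-rank correction B @ A is applied to every input."""
    return W @ x + B @ (A @ x)


def gated_lora_forward(x, W, A, B, G, b):
    """Input-gated LoRA (hypothetical gate form, assumed for this sketch).

    Each of the r rank-one components B[:, i] * A[i, :] is scaled by its own
    gate in (0, 1), computed from the input x.
    """
    gates = sigmoid(G @ x + b)              # shape (r,), one gate per component
    return W @ x + B @ (gates * (A @ x)), gates


rng = np.random.default_rng(0)
d_in, d_out, r = 8, 6, 4                    # illustrative sizes only
W = rng.normal(size=(d_out, d_in))          # frozen pre-trained weight
A = rng.normal(size=(r, d_in))              # LoRA down-projection
B = rng.normal(size=(d_out, r))             # LoRA up-projection
G = np.zeros((r, d_in))                     # assumed gate weights
x = rng.normal(size=d_in)

# Strongly negative bias: gates are near zero, so the output stays
# close to the pre-trained layer's output W @ x.
closed_out, closed_gates = gated_lora_forward(x, W, A, B, G, np.full(r, -20.0))

# Strongly positive bias: gates are near one, recovering static LoRA.
open_out, open_gates = gated_lora_forward(x, W, A, B, G, np.full(r, 20.0))
```

In this sketch the gate bias alone decides whether the correction is applied. In a trained model the gate weights would make that decision depend on the input, so that inputs far from the fine-tuning distribution could keep their gates closed.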

Code & Models

Repositories

alizindari/DISeL (GitHub)
