Soft-TransFormers for Continual Learning

Haeyong Kang; Chang D. Yoo

arXiv:2411.16073·cs.LG·April 29, 2026

TL;DR

Soft-Transformers (Soft-TF) is a parameter-efficient continual learning framework that uses soft, real-valued subnetworks over frozen pre-trained Transformers, achieving state-of-the-art results by learning task-specific masks.

Contribution

Introduces Soft-TF, a novel method leveraging soft subnetworks and a dual-prompt mechanism for effective continual learning with minimal parameters.

Findings

01  Soft-TF outperforms prompt, adapter, and LoRA baselines on multiple benchmarks.

02  Soft-TF effectively mitigates catastrophic forgetting.

03  Soft-TF requires fewer additional parameters than existing methods.

Abstract

Inspired by the Well-initialized Lottery Ticket Hypothesis (WLTH), we introduce Soft-Transformer (Soft-TF), a parameter-efficient framework for continual learning that leverages soft, real-valued subnetworks over a frozen pre-trained Transformer. Instead of relying on manually designed prompts or adapters, Soft-TF learns task-specific multiplicative masks applied to the key, query, value, and output projections in self-attention. These masks enable smooth and stable task adaptation while preserving shared representations. Combined with a lightweight dual-prompt mechanism, Soft-TF maintains strong knowledge retention and mitigates Catastrophic Forgetting (CF). Across multiple continual learning benchmarks, Soft-TF achieves state-of-the-art performance, consistently outperforming prompt-based, adapter-based, and LoRA-style baselines while requiring minimal additional parameters.
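
To make the masking idea in the abstract concrete, the following is a minimal sketch of a single-head self-attention layer whose frozen query, key, value, and output weights are multiplied element-wise by a per-task real-valued mask. It is an illustration under stated assumptions, not the authors' implementation: the class name SoftMaskedAttention, the all-ones mask initialization, the random stand-in for pre-trained weights, and the use of one full-size mask per projection are all hypothetical choices. Mask training and the dual-prompt mechanism are not shown. The code uses only the Python standard library so it runs without extra dependencies.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def hadamard(w, m):
    """Element-wise product of a weight matrix and a same-shaped mask."""
    return [[wi * mi for wi, mi in zip(wr, mr)] for wr, mr in zip(w, m)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax(row):
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


class SoftMaskedAttention:
    """Single-head self-attention with frozen weights and per-task soft masks.

    Hypothetical illustration; names and initialization are assumptions.
    """

    PROJECTIONS = ("q", "k", "v", "o")

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # Random stand-in for pre-trained weights; never modified afterwards.
        self.frozen = {
            name: [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]
            for name in self.PROJECTIONS
        }
        self.masks = {}  # task_id -> {projection name -> real-valued mask}

    def add_task(self, task_id):
        # Assumption: masks start at 1.0, so a new task begins from the
        # unmodified pre-trained behaviour.
        self.masks[task_id] = {
            name: [[1.0] * self.dim for _ in range(self.dim)]
            for name in self.PROJECTIONS
        }

    def forward(self, x, task_id):
        m = self.masks[task_id]
        # Effective weights for this task: frozen weight times soft mask.
        w = {name: hadamard(self.frozen[name], m[name]) for name in self.PROJECTIONS}
        q, k, v = matmul(x, w["q"]), matmul(x, w["k"]), matmul(x, w["v"])
        scale = math.sqrt(self.dim)
        scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
        attn = [softmax(row) for row in scores]
        return matmul(matmul(attn, v), w["o"])


layer = SoftMaskedAttention(dim=4)
layer.add_task("task_a")
layer.add_task("task_b")
# Pretend training on task_b moved one mask entry away from 1.0.
layer.masks["task_b"]["v"][0][0] = 0.5

tokens = [[1.0, 0.0, 0.5, -0.5], [0.2, 0.3, -0.1, 0.9]]
out_a = layer.forward(tokens, "task_a")
out_b = layer.forward(tokens, "task_b")
```

In this sketch only the entries of layer.masks would be trainable, and each task owns its own set, so changing the mask for task_b leaves both the shared frozen weights and the output for task_a untouched. That separation is one simple way to see why per-task masks over a frozen backbone can limit forgetting.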
