Soft-TransFormers for Continual Learning
Haeyong Kang, Chang D. Yoo

TL;DR
Soft-Transformer (Soft-TF) is a parameter-efficient continual learning framework that learns soft, real-valued, task-specific masks over a frozen pre-trained Transformer and achieves state-of-the-art results.
Contribution
Introduces Soft-TF, a novel method leveraging soft subnetworks and a dual-prompt mechanism for effective continual learning with minimal parameters.
Findings
Soft-TF outperforms prompt, adapter, and LoRA baselines on multiple benchmarks.
Soft-TF effectively mitigates catastrophic forgetting.
Soft-TF requires fewer additional parameters than existing methods.
Abstract
Inspired by the Well-initialized Lottery Ticket Hypothesis (WLTH), we introduce Soft-Transformer (Soft-TF), a parameter-efficient framework for continual learning that leverages soft, real-valued subnetworks over a frozen pre-trained Transformer. Instead of relying on manually designed prompts or adapters, Soft-TF learns task-specific multiplicative masks applied to the key, query, value, and output projections in self-attention. These masks enable smooth and stable task adaptation while preserving shared representations. Combined with a lightweight dual-prompt mechanism, Soft-TF maintains strong knowledge retention and mitigates Catastrophic Forgetting (CF). Across multiple continual learning benchmarks, Soft-TF achieves state-of-the-art performance, consistently outperforming prompt-based, adapter-based, and LoRA-style baselines while requiring minimal additional parameters.
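The core idea of a task-specific multiplicative mask over a frozen projection can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the mask parameterization, so the element-wise mask, the all-ones initialization, the squared-error objective, and every name below (`SoftMaskedProjection`, `add_task`, `sgd_step`) are assumptions made for illustration. A single small matrix stands in for one of the key, query, value, or output projections.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def masked_weights(W, M):
    """Element-wise product of W and M; W itself is never modified."""
    return [[w * m for w, m in zip(w_row, m_row)] for w_row, m_row in zip(W, M)]


def sq_err(y, target):
    """Sum of squared errors between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(y, target))


class SoftMaskedProjection:
    """A frozen projection with one real-valued mask per task (hypothetical sketch)."""

    def __init__(self, frozen_W):
        self.W = frozen_W   # pre-trained weights, kept frozen
        self.masks = {}     # task id -> soft mask with the same shape as W

    def add_task(self, task_id):
        # Assumption: all-ones init, so a new task starts from pre-trained behavior.
        self.masks[task_id] = [[1.0] * len(row) for row in self.W]

    def forward(self, x, task_id):
        return matvec(masked_weights(self.W, self.masks[task_id]), x)

    def sgd_step(self, x, target, task_id, lr=0.1):
        """One gradient step on 0.5 * squared error, updating only this task's mask."""
        M = self.masks[task_id]
        y = self.forward(x, task_id)
        for i, row in enumerate(self.W):
            err = y[i] - target[i]
            for j, w in enumerate(row):
                # Gradient of 0.5 * (y_i - t_i)^2 w.r.t. M[i][j] is err * W[i][j] * x[j].
                M[i][j] -= lr * err * w * x[j]


# Toy usage: adapt to "task_a" while leaving W and the "task_b" mask untouched.
W = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.6]]
W_before = [row[:] for row in W]
proj = SoftMaskedProjection(W)
proj.add_task("task_a")
proj.add_task("task_b")

x, target = [1.0, 2.0, -1.0], [0.5, -0.5]
err_before = sq_err(proj.forward(x, "task_a"), target)
for _ in range(50):
    proj.sgd_step(x, target, "task_a")
err_after = sq_err(proj.forward(x, "task_a"), target)
```

In this sketch only the mask for the active task changes, so the shared weights and the masks of other tasks are left as they were. That isolation is the property the abstract ties to knowledge retention.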
