SECURA: Sigmoid-Enhanced CUR Decomposition with Uninterrupted Retention and Low-Rank Adaptation in Large Language Models
Yuxuan Zhang

TL;DR
SECURA introduces a novel parameter-efficient fine-tuning method for large language models that reduces catastrophic forgetting and enhances performance through sigmoid-enhanced CUR decomposition and a new normalization technique.
Contribution
The paper proposes SECURA, a new PEFT approach combining Sigmoid-Enhanced CUR decomposition with a novel normalization to mitigate forgetting and improve fine-tuning in LLMs.
Findings
Achieves 3.59% average improvement on multiple-choice tasks.
Outperforms existing methods like DoRA and EWC in knowledge retention.
Maintains over 70% accuracy on basic knowledge tests.
Abstract
With the rapid development of large language models (LLMs), full fine-tuning (FT) of these models is becoming increasingly infeasible due to high computational demands. Moreover, FT increases the risk of catastrophic forgetting. As an alternative, Low-Rank Adaptation (LoRA) has been proposed. By fine-tuning only a small subset of parameters, LoRA achieves performance similar to FT while significantly reducing resource requirements. However, since LoRA inherits FT's design, the issue of catastrophic forgetting remains. To address these limitations, we propose SECURA: Sigmoid-Enhanced CUR Decomposition LoRA, a novel PEFT variant designed to mitigate catastrophic forgetting while improving fine-tuning performance. Our method introduces a novel normalization technique, Sigmoid-based Magnitude Norm (S-MagNorm), which enhances parameter retention and fine-tuning efficiency. SECURA…
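The abstract names CUR decomposition as a building block, but this excerpt does not describe how SECURA applies it. As background only, here is a minimal sketch of a generic, textbook-style CUR factorization of a weight matrix in NumPy. The function name `cur_decompose`, the norm-proportional sampling rule, and the choice of `k` are illustrative assumptions. This is not SECURA's actual procedure, and it does not include S-MagNorm.

```python
import numpy as np

def cur_decompose(W, k, seed=0):
    """Generic CUR sketch (an assumption, not the paper's method).

    Approximates W (m x n) as C @ U @ R, where C holds k actual columns
    of W, R holds k actual rows of W, and U is a small k x k link matrix.
    """
    rng = np.random.default_rng(seed)

    # Sample columns/rows with probability proportional to their squared
    # Euclidean norm (one common heuristic; others exist).
    col_p = (W ** 2).sum(axis=0)
    col_p = col_p / col_p.sum()
    row_p = (W ** 2).sum(axis=1)
    row_p = row_p / row_p.sum()

    cols = rng.choice(W.shape[1], size=k, replace=False, p=col_p)
    rows = rng.choice(W.shape[0], size=k, replace=False, p=row_p)

    C = W[:, cols]  # m x k, real columns of W
    R = W[rows, :]  # k x n, real rows of W

    # Least-squares optimal link matrix for the chosen C and R.
    U = np.linalg.pinv(C, rcond=1e-10) @ W @ np.linalg.pinv(R, rcond=1e-10)
    return C, U, R
```

In a sketch like this, C and R are copied directly from the original matrix, so only the small k x k matrix U would be a natural candidate for training. Whether SECURA trains U, and how, is not stated in the excerpt above.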
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
Methods: Experience Replay · Elastic Weight Consolidation
