AFA-LoRA: Enabling Non-Linear Adaptations in LoRA with Activation Function Annealing

Jiacheng Li; Jianchao Tan; Zhidong Yang; Feiye Huo; Yerui Sun; Yuchen Xie; Xunliang Cai

arXiv:2512.22455·cs.LG·January 6, 2026

Open Access

TL;DR

AFA-LoRA adds an annealed activation function to LoRA, giving the adapter non-linear expressivity during training while keeping it mergeable, and narrowing the performance gap with full-parameter training across supervised fine-tuning, reinforcement learning, and speculative decoding.

Contribution

The paper proposes an activation annealing strategy that gives LoRA non-linear expressivity during training while preserving the mergeability of the final adapter.

Findings

01. Reduces the performance gap between LoRA and full-parameter training.

02. Effective across supervised fine-tuning, reinforcement learning, and speculative decoding.

03. Maintains mergeability while adding non-linear capabilities during training.

Abstract

Low-Rank Adaptation (LoRA) is a widely adopted parameter-efficient fine-tuning (PEFT) method. However, its linear adaptation process limits its expressive power, leaving a gap between linear and non-linear training. To bridge this gap, we propose AFA-LoRA, a novel training strategy that brings non-linear expressivity to LoRA while maintaining its seamless mergeability. Our key innovation is an annealed activation function that transitions from a non-linear to a linear transformation during training, allowing the adapter to initially adopt stronger representational capabilities before converging to a mergeable linear form. We implement our method on supervised fine-tuning, reinforcement learning, and speculative decoding. The results show that AFA-LoRA reduces the performance gap between LoRA and full-parameter training. This work enables a…
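
The abstract does not give the exact form of the annealed activation, so the following is only a minimal sketch of the idea under stated assumptions. It assumes the activation sits between the two low-rank matrices, that it is a blend of GELU and the identity, and that the blend coefficient follows a linear schedule. The names `anneal_coeff`, `annealed_act`, `adapter_forward`, and `merge` are hypothetical and do not come from the paper.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU; the choice of GELU is an assumption
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def anneal_coeff(step, total_steps):
    # Hypothetical linear schedule: 0.0 (fully non-linear) -> 1.0 (fully linear)
    return min(1.0, max(0.0, step / total_steps))

def annealed_act(h, alpha):
    # Assumed form: interpolate between the non-linear activation and the identity
    return (1.0 - alpha) * gelu(h) + alpha * h

def adapter_forward(x, W, A, B, alpha):
    # x: (batch, d_in), W: (d_out, d_in), A: (r, d_in), B: (d_out, r)
    h = x @ A.T                      # down-projection to rank r
    return x @ W.T + annealed_act(h, alpha) @ B.T

def merge(W, A, B):
    # Standard LoRA merge, valid only once the adapter is purely linear
    return W + B @ A

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 6, 2
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in))
B = rng.normal(size=(d_out, r))
x = rng.normal(size=(4, d_in))

# End of the schedule: the adapter is linear, so it matches the merged weight
alpha_end = anneal_coeff(1000, 1000)
y_merged = x @ merge(W, A, B).T
merged_matches = np.allclose(adapter_forward(x, W, A, B, alpha_end), y_merged)

# Start of the schedule: the adapter is non-linear and cannot be merged exactly
alpha_start = anneal_coeff(0, 1000)
nonlinear_differs = not np.allclose(adapter_forward(x, W, A, B, alpha_start), y_merged)
```

In this sketch, mergeability holds only when the coefficient reaches 1, which is why the schedule has to finish annealing before the adapter is folded into the base weights.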

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis