CeRA: Overcoming the Linear Ceiling of Low-Rank Adaptation via Capacity Expansion
Hung-Hsuan Chen

TL;DR
CeRA introduces a non-linear capacity expansion method for low-rank adaptation in parameter-efficient fine-tuning, significantly improving performance on complex reasoning tasks with fewer parameters.
Contribution
It proposes CeRA, a novel weight-level parallel adapter with SiLU gating and dropout, overcoming the linear capacity ceiling of traditional LoRA methods.
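The summary does not give CeRA's exact formulation, so the following is only a minimal sketch of what a parallel low-rank adapter with SiLU gating and dropout could look like. It assumes the update takes the form `W0 x + scale * B · dropout(SiLU(A x))`; the class name `GatedLowRankAdapter`, the `alpha / rank` scaling, and the zero initialization of `B` are illustrative conventions borrowed from common LoRA practice, not details confirmed by the paper.

```python
import numpy as np

def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

class GatedLowRankAdapter:
    """Hypothetical sketch, not the paper's implementation.

    Assumed form: y = x W0^T + scale * dropout(silu(x A^T)) B^T
    """

    def __init__(self, d_in, d_out, rank, alpha=16.0, p_drop=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        # Down-projection A: (rank, d_in); up-projection B: (d_out, rank).
        self.A = self.rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(rank, d_in))
        self.B = np.zeros((d_out, rank))  # zero init: adapter starts as a no-op
        self.scale = alpha / rank         # assumed LoRA-style scaling
        self.p_drop = p_drop

    def delta(self, x, train=False):
        """Adapter branch only; x has shape (batch, d_in)."""
        h = silu(x @ self.A.T)            # non-linearity in the rank-r bottleneck
        if train and self.p_drop > 0.0:
            # Inverted dropout on the bottleneck activations.
            mask = self.rng.random(h.shape) >= self.p_drop
            h = h * mask / (1.0 - self.p_drop)
        return self.scale * (h @ self.B.T)

    def forward(self, x, W0, train=False):
        """Frozen base layer plus the parallel adapter branch."""
        return x @ W0.T + self.delta(x, train=train)
```

Under these assumptions the branch is no longer linear in `x`, so unlike a plain LoRA update it cannot be folded into a single rank-`r` matrix added to `W0`.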
Findings
CeRA matches linear baselines on simple tasks but excels on complex reasoning datasets.
On MATH, CeRA at rank 64 (pass@1 16.36%) outperforms both LoRA at rank 512 (15.72%) and the state-of-the-art linear variant DoRA at rank 64 (14.44%) in exact-match accuracy.
Spectral analysis shows that CeRA activates the lower-variance tail of the spectrum, preventing rank collapse.
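The summary does not say which spectral measure the paper uses. One common way to quantify how much of a spectrum's tail is in use is the entropy-based effective rank of a matrix, sketched below; the function name `effective_rank` and the choice of this particular metric are assumptions made here for illustration.

```python
import numpy as np

def effective_rank(M, eps=1e-12):
    """Entropy-based effective rank: exp of the Shannon entropy of the
    normalized singular values. Close to 1 when one direction dominates
    (rank collapse); close to min(M.shape) when the spectrum is flat."""
    s = np.linalg.svd(M, compute_uv=False)
    p = s / (s.sum() + eps)   # normalize singular values to a distribution
    p = p[p > eps]            # drop numerically zero entries before the log
    return float(np.exp(-(p * np.log(p)).sum()))
```

A collapsed update concentrates its energy in a few leading singular values and scores low on this measure, while an update that also uses the low-variance tail scores higher.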
Abstract
Low-Rank Adaptation (LoRA) dominates parameter-efficient fine-tuning (PEFT). However, it faces a "linear ceiling": increasing the rank yields diminishing returns in expressive capacity due to intrinsic linear constraints. We introduce CeRA (Capacity-enhanced Rank Adaptation), a weight-level parallel adapter that injects SiLU gating and dropout to induce non-linear capacity expansion. We demonstrate a fundamental relationship between adapter expressivity and task complexity. In basic arithmetic (GSM8K), CeRA matches standard linear baselines, but on the complex MATH dataset, it demonstrates high parameter efficiency in downstream reasoning (Exact Match). CeRA at rank 64 (pass@1 16.36%) outperforms both a high-rank LoRA at rank 512 (15.72%) and the state-of-the-art linear variant, DoRA, at rank 64 (14.44%), achieving higher exact-match accuracy with only 1/8 of the parameter budget.…
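The "1/8 of the parameter budget" figure follows from how low-rank adapters scale with rank. As a rough check, assuming each adapted matrix carries only the two low-rank factors (SiLU and dropout add no trainable parameters) and using a hypothetical hidden size of 4096:

```python
def low_rank_params(d_in, d_out, rank):
    """Trainable parameters in one adapter: A is (rank, d_in), B is (d_out, rank)."""
    return rank * (d_in + d_out)

d = 4096  # hypothetical hidden size, chosen only for illustration
params_r64 = low_rank_params(d, d, 64)
params_r512 = low_rank_params(d, d, 512)
ratio = params_r64 / params_r512  # 64 / 512, independent of d
```

Because the count is linear in the rank, the ratio is 64/512 = 1/8 regardless of the layer dimensions.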
