ABBA-Adapters: Efficient and Expressive Fine-Tuning of Foundation Models
Raghav Singhal, Kaustubh Ponkshe, Rohit Vartak, Praneeth Vepakomma

TL;DR
ABBA-Adapters introduce a novel PEFT architecture that reparameterizes model updates as a Hadamard product of two learnable low-rank matrices, significantly enhancing expressivity and achieving state-of-the-art results on reasoning benchmarks.
Contribution
The paper proposes ABBA, a new PEFT method that fully decouples updates from pre-trained weights, enabling higher expressivity and improved performance over existing methods.
Findings
ABBA achieves state-of-the-art results on commonsense and arithmetic reasoning benchmarks.
ABBA outperforms existing PEFT methods across four foundation models.
Matrix reconstruction experiments validate ABBA's higher expressivity.
Abstract
Large Language Models have demonstrated strong performance across a wide range of tasks, but adapting them efficiently to new domains remains a key challenge. Parameter-Efficient Fine-Tuning (PEFT) methods address this by introducing lightweight, trainable modules while keeping most pre-trained weights fixed. The prevailing approach, LoRA, models updates using a low-rank decomposition, but its expressivity is inherently constrained by the rank. Recent methods like HiRA aim to increase expressivity by incorporating a Hadamard product with the frozen weights, but still rely on the structure of the pre-trained model. We introduce ABBA, a new PEFT architecture that reparameterizes the update as a Hadamard product of two independently learnable low-rank matrices. In contrast to prior work, ABBA fully decouples the update from the pre-trained weights, enabling both components to be optimized…
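The difference between a LoRA update and the Hadamard-product update described in the abstract can be sketched numerically. The snippet below is a minimal illustration, not the authors' implementation: the names `B1`, `A1`, `B2`, `A2`, the shapes, and the random initialization are assumptions chosen for the example. It compares the rank of each update at the same parameter budget and checks one standard identity that rewrites the Hadamard product through row-wise Khatri-Rao factors, which may differ in detail from the reformulation the reviews refer to.

```python
import numpy as np

def lora_update(B, A):
    """LoRA-style update: a single low-rank product, so rank <= r."""
    return B @ A

def abba_update(B1, A1, B2, A2):
    """Hadamard (element-wise) product of two low-rank products.

    The rank of the result is bounded by r1 * r2 rather than r1 + r2.
    """
    return (B1 @ A1) * (B2 @ A2)

# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
m, n = 64, 48
r1 = r2 = 4

B1 = rng.standard_normal((m, r1))
A1 = rng.standard_normal((r1, n))
B2 = rng.standard_normal((m, r2))
A2 = rng.standard_normal((r2, n))
delta_abba = abba_update(B1, A1, B2, A2)

# A LoRA update with the same number of parameters has rank r1 + r2.
B = rng.standard_normal((m, r1 + r2))
A = rng.standard_normal((r1 + r2, n))
delta_lora = lora_update(B, A)

rank_abba = np.linalg.matrix_rank(delta_abba)
rank_lora = np.linalg.matrix_rank(delta_lora)
print("LoRA rank:", rank_lora, "| Hadamard-update rank:", rank_abba)

# Row-wise Khatri-Rao form of the same update:
#   B_kr[p, (i, j)] = B1[p, i] * B2[p, j]
#   A_kr[(i, j), q] = A1[i, q] * A2[j, q]
# so that B_kr @ A_kr equals (B1 @ A1) * (B2 @ A2) entry by entry.
B_kr = np.einsum("pi,pj->pij", B1, B2).reshape(m, r1 * r2)
A_kr = np.einsum("iq,jq->ijq", A1, A2).reshape(r1 * r2, n)
print("Khatri-Rao form matches:", np.allclose(B_kr @ A_kr, delta_abba))
```

With the same parameter count, the LoRA update is limited to rank `r1 + r2`, while the Hadamard update can reach rank up to `r1 * r2`. This is the sense in which the abstract describes the rank as a constraint on LoRA's expressivity.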
Peer Reviews
Decision · ICLR 2026 Poster
1. The idea of composing two learnable low-rank modules via a Hadamard product is conceptually clean and represents a clear generalization of LoRA and HiRA. The method increases effective rank while maintaining strict parameter efficiency. 2. The paper introduces a Khatri–Rao reformulation and a well-motivated rank-stability theorem, which explains how scaling should depend on $r_1, r_2$. 3. Results across four foundation models and multiple reasoning tasks show large and consistent gains. The auth
1. The paper proves a scaling law for stability but does not empirically show how optimization behaves under varying $r_1, r_2$ or initialization errors. Gradient norm or loss-landscape visualizations would strengthen claims about stable training. 2. It remains unclear whether the performance gain comes from the Hadamard structure itself or simply from doubling the number of learnable matrices. An ablation removing the Hadamard product (e.g., summation or concatenation) would clarify this. 3. Wh
+ The proposed method extends HiRA by replacing its fixed modulation with a learnable factor, which decouples the update from $W_0$; this is a technical improvement. The Khatri-Rao factorization is a nice implementation detail. + The proposed method is shown to have higher expressivity than LoRA via matrix-reconstruction experiments, and strong accuracy on commonsense/arithmetic tasks.
- The main concern is that the novelty of this paper is incremental relative to HiRA/MoRA/ReLoRA/KronA. The core architectural change is to learn both factors in the Hadamard product instead of tying one to $W_0$ (HiRA). While useful, this feels like a straightforward extension in the space of multiplicative/structured adapters already explored (HiRA, MoRA, KronA, ReLoRA), and the paper’s Related Work acknowledges much of this trajectory. The new factorization is an implementation convenience rather
1. The method has a clear theoretical motivation. 2. The Khatri-Rao formulation makes the method computationally efficient, and hence practically feasible. 3. The method is easy to implement and can be easily integrated with existing PEFT methods. 4. The method is evaluated on commonsense reasoning and arithmetic reasoning, and outperforms prior PEFT methods across model sizes. 5. The method is ablated well to study its properties (initialization strategies, scaling factors, layer placement, and chainin
1. The paper compares ABBA with total ranks of 16 and 32 against rank-32 LoRA (and variants). However, LoRA can have optimization problems at larger ranks, and smaller ranks often lead to slightly better performance. Hence, LoRA should also be evaluated at rank 16 for a more thorough comparison. 2. I find it hard to believe that full fine-tuning lags behind ABBA by such a large margin. While the authors try to justify this on lines 330-332, better h
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Model-Driven Software Engineering Techniques · Embedded Systems Design Techniques
