TL;DR
Kron-LoRA is a hybrid adapter method combining Kronecker and low-rank techniques, enabling scalable, parameter-efficient fine-tuning of large language models with comparable or better performance than existing methods.
Contribution
It introduces Kron-LoRA, a novel hybrid adapter that reduces parameters significantly while maintaining expressivity, advancing scalable fine-tuning of large models.
Findings
Achieves up to 4x fewer parameters than standard LoRA.
Matches or exceeds LoRA performance across multiple benchmarks.
Maintains competitive transferability with only a quarter of the parameters.
Abstract
Fine-tuning massive pre-trained language models across many tasks demands adapters that are both parameter-efficient and expressive. We introduce Kron-LoRA, a hybrid adapter that combines Kronecker-structured factorization with low-rank LoRA compression, an integration that, to our knowledge, has not been explored in parameter-efficient fine-tuning or in the matrix approximation literature. Kron-LoRA achieves up to 4× fewer parameters than standard LoRA while retaining similar expressivity. Experiments on DistilBERT, Mistral-7B, LLaMA-2-7B, and LLaMA-3-8B across eight benchmarks show that Kron-LoRA matches or exceeds LoRA baselines with modest memory savings and only a 5-8% speed overhead. In sequential fine-tuning, it also delivers competitive cross-task transfer despite using only one-quarter of the adapter parameters. Kron-LoRA thus offers a scalable, sustainable solution…
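To make the parameter arithmetic behind the abstract concrete, here is a minimal NumPy sketch. It assumes the weight update has the form ΔW = A ⊗ (B1 B2), that is, a small dense Kronecker factor A combined with a rank-r product B1 B2. This summary does not spell out the exact formulation, so the form, the function names, and the layer sizes are all illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def lora_param_count(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters of a rank-r LoRA update: (d_out x r) @ (r x d_in)."""
    return r * (d_out + d_in)


def kron_lora_param_count(d_out: int, d_in: int,
                          a_rows: int, a_cols: int, r: int) -> int:
    """Trainable parameters of the ASSUMED update A kron (B1 @ B2).

    A  is a_rows x a_cols (dense Kronecker factor),
    B1 is (d_out / a_rows) x r, and B2 is r x (d_in / a_cols).
    """
    if d_out % a_rows or d_in % a_cols:
        raise ValueError("Kronecker factor shape must divide the layer shape")
    b_rows, b_cols = d_out // a_rows, d_in // a_cols
    return a_rows * a_cols + r * (b_rows + b_cols)


def kron_lora_delta(A: np.ndarray, B1: np.ndarray, B2: np.ndarray) -> np.ndarray:
    """Materialise the assumed update as a full matrix (for illustration only)."""
    return np.kron(A, B1 @ B2)


if __name__ == "__main__":
    # Hypothetical 4096 x 4096 projection, rank 8, 4 x 4 Kronecker factor.
    print(lora_param_count(4096, 4096, 8))             # 65536
    print(kron_lora_param_count(4096, 4096, 4, 4, 8))  # 16400

    # Tiny instance to check shapes: a 2 x 2 factor times a rank-1 4 x 4 block.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((2, 2))
    B1 = rng.standard_normal((4, 1))
    B2 = rng.standard_normal((1, 4))
    print(kron_lora_delta(A, B1, B2).shape)            # (8, 8)
```

For these hypothetical sizes the assumed form uses 16,400 parameters against 65,536 for rank-8 LoRA, roughly a 4× reduction, which is in line with the "up to 4×" claim. Because rank(A ⊗ B) = rank(A) · rank(B), the update can still reach rank up to 4 · 8 = 32 in this example, one way to read the "similar expressivity" claim. The factor shapes actually used in the paper are not given in this summary.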
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
The proposed idea is simple. The paper is mostly clearly written and easy to understand. In the experiments on the NLU tasks, the proposed method performs comparably to LoRA while employing close to $4\times$ fewer parameters. The authors show that KronA alone uses significantly fewer parameters but also significantly underperforms LoRA.
1. **Insufficient baselines:** The empirical analysis severely lacks important baselines and comparisons with SOTA PEFT approaches and LoRA variants. The proposed method is a simple combination of KronA and LoRA and thus must include comparisons with these approaches at a similar number of trainable parameters. However, the comparison with KronA is provided on a single backbone model and with a setting that uses just 10% of the parameters (2.3M for KronA vs 21.3M for LoRA). There are no comparisons exce
- Sound methodology
- Very well written paper
- Weak experimental results: more recent datasets, more analysis and ablations.
1. First integration of Kronecker structure and LoRA. The combination of Kronecker factorization with LoRA is original and well-motivated. Prior work explored Kronecker adapters (KronA, AdaKron, MoKA) or LoRA variants separately, but not their hybridization.
2. Strong parameter-efficiency gains. The method achieves up to 4× parameter reduction while maintaining similar or better accuracy compared to LoRA-8 across all backbones (Table 2–3). The analysis clearly shows that Kronecker structure pro
1. Conceptual simplicity vs. depth. While novel, the combination of Kronecker and LoRA is a natural hybrid rather than a deeply theoretical contribution. The paper’s strength lies in practicality and empirical rigor, not fundamental mathematical insight.
2. Limited ablation on Kronecker dimensions. The choice of d seems heuristic. It’s unclear how sensitive results are to this partition, or whether learned Kronecker shapes could further improve results.
3. Minor computational overhead. Kron-Lo
