Kron-LoRA: Hybrid Kronecker-LoRA Adapters for Scalable, Sustainable Fine-tuning

Yixin Shen

arXiv:2508.01961·cs.LG·September 25, 2025

3 Reviews

TL;DR

Kron-LoRA is a hybrid adapter method combining Kronecker and low-rank techniques, enabling scalable, parameter-efficient fine-tuning of large language models with comparable or better performance than existing methods.

Contribution

It introduces Kron-LoRA, a novel hybrid adapter that reduces parameters significantly while maintaining expressivity, advancing scalable fine-tuning of large models.

Findings

01

Achieves up to 4x fewer parameters than standard LoRA.

02

Matches or exceeds LoRA performance across multiple benchmarks.

03

Maintains competitive transferability with only a quarter of the parameters.

Abstract

Fine-tuning massive pre-trained language models across many tasks demands adapters that are both parameter-efficient and expressive. We introduce Kron-LoRA, a hybrid adapter that combines Kronecker-structured factorization with low-rank LoRA compression, an integration that, to our knowledge, has not been explored in parameter-efficient fine-tuning or in matrix approximation literature. Kron-LoRA achieves up to 4× fewer parameters than standard LoRA while retaining similar expressivity. Experiments on DistilBERT, Mistral-7B, LLaMA-2-7B, and LLaMA-3-8B across eight benchmarks show that Kron-LoRA matches or exceeds LoRA baselines with modest memory savings and only a 5-8% speed overhead. In sequential fine-tuning, it also delivers competitive cross-task transfer despite using only one-quarter of the adapter parameters. Kron-LoRA thus offers a scalable, sustainable solution…
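The abstract does not spell out the update rule, so the following is only a minimal sketch of one way a Kronecker factor and a low-rank factor could be combined. It assumes the weight update has the form ΔW = A ⊗ (B1 B2), with a small dense factor A and a rank-r factor B1 B2. The function names, shapes, and ranks are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import numpy as np

def kron_lora_delta(A, B1, B2):
    """Dense update under the assumed form: kron(A, B1 @ B2)."""
    return np.kron(A, B1 @ B2)

def kron_lora_apply(A, B1, B2, x):
    """Apply kron(A, B1 @ B2) to x without building the dense matrix.

    Uses the row-major identity kron(A, B) @ vec(X) = vec(A @ X @ B.T),
    where X has shape (A.shape[1], B.shape[1]).
    """
    a_cols = A.shape[1]
    b_cols = B2.shape[1]
    X = x.reshape(a_cols, b_cols)
    # B.T = B2.T @ B1.T, applied factor by factor to stay low-rank
    Y = A @ ((X @ B2.T) @ B1.T)
    return Y.reshape(-1)

def param_counts(d_out, d_in, a_rows, a_cols, r, lora_rank):
    """Trainable parameters of the assumed Kron-LoRA form vs. plain LoRA."""
    b_rows, b_cols = d_out // a_rows, d_in // a_cols
    kron_lora = a_rows * a_cols + r * (b_rows + b_cols)
    lora = lora_rank * (d_out + d_in)
    return kron_lora, lora

# Small random check that the factored product matches the dense one.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))    # hypothetical Kronecker factor
B1 = rng.standard_normal((4, 2))   # hypothetical low-rank factors, r = 2
B2 = rng.standard_normal((2, 5))
x = rng.standard_normal(3 * 5)
dense_out = kron_lora_delta(A, B1, B2) @ x
fast_out = kron_lora_apply(A, B1, B2, x)

# Hypothetical 4096 x 4096 layer, 4 x 4 Kronecker factor, rank 8 on both sides.
counts = param_counts(4096, 4096, 4, 4, 8, 8)
```

With these illustrative shapes the assumed form uses 16,400 parameters against 65,536 for rank-8 LoRA, roughly a 4× reduction. The actual ratio depends on how the layer dimensions are split between the two Kronecker factors, which is a choice this sketch does not attempt to reproduce.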

Peer Reviews

Decision·ICLR 2026 Conference Withdrawn Submission

Reviewer 01·Rating 2·Confidence 4

Strengths

The proposed idea is simple. The paper is mostly clearly written and easy to understand. In the experiments on the NLU tasks, the proposed method performs comparably to LoRA while employing close to 4× fewer parameters. The authors show that KronA alone uses significantly fewer parameters but also significantly underperforms LoRA.

Weaknesses

1. **Insufficient baselines:** The empirical analysis severely lacks important baselines and comparisons with SOTA PEFT approaches and LoRA variants. The proposed method is a simple combination of KronA and LoRA and thus must include comparisons with these approaches at a similar number of trainable parameters. However, the comparison with KronA is provided on a single backbone model and in a setting that uses just 10% of the parameters (2.3M for KronA vs 21.3M for LoRA). There are no comparisons exce

Reviewer 02·Rating 2·Confidence 5

Strengths

- Sound methodology
- Very well written paper

Weaknesses

- Weak experimental results: the paper needs more recent datasets, more analysis, and more ablations.

Reviewer 03·Rating 2·Confidence 3

Strengths

1. First integration of Kronecker structure and LoRA. The combination of Kronecker factorization with LoRA is original and well-motivated. Prior work explored Kronecker adapters (KronA, AdaKron, MoKA) or LoRA variants separately, but not their hybridization.
2. Strong parameter-efficiency gains. The method achieves up to 4× parameter reduction while maintaining similar or better accuracy compared to LoRA-8 across all backbones (Table 2–3). The analysis clearly shows that Kronecker structure pro

Weaknesses

1. Conceptual simplicity vs. depth. While novel, the combination of Kronecker and LoRA is a natural hybrid rather than a deeply theoretical contribution. The paper’s strength lies in practicality and empirical rigor, not fundamental mathematical insight.
2. Limited ablation on Kronecker dimensions. The choice of d seems heuristic. It’s unclear how sensitive results are to this partition, or whether learned Kronecker shapes could further improve results.
3. Minor computational overhead. Kron-Lo
