FIM-LoRA: Task-Informative Rank Allocation for LoRA via Calibration-Time Gradient-Variance Estimation

Ramakrishnan Sathyavageeswaran

arXiv:2605.16800·cs.LG·May 19, 2026

TL;DR

FIM-LoRA estimates per-layer gradient variance from eight calibration backward passes run before fine-tuning, then redistributes LoRA's rank budget across layers in proportion to it, adding no new parameters and no training overhead.

Contribution

It proposes a lightweight, calibration-based approach to assigning layer-specific ranks in LoRA, yielding an interpretable per-layer rank pattern without modifying the training process.

Findings

01. FIM-LoRA matches standard LoRA performance on GLUE and LLaMA tasks.

02. Restricting the eFIM diagonal estimate to LoRA adapter matrices reduces memory cost by approximately 256x compared to full-model Fisher estimation.

03. Layer importance aligns with established transformer layer roles.

Abstract

Low-rank adaptation (LoRA) assigns a uniform rank to every adapted weight matrix, a practical convenience that ignores a fundamental reality: different layers contribute unequally to task adaptation. We address this with a lightweight engineering solution: before fine-tuning begins, run eight calibration backward passes, compute the gradient variance of each LoRA-B matrix as a proxy for layer informativeness, and redistribute the rank budget proportionally. The resulting adapter is a standard LoRA with a per-layer rank pattern: no new parameters, no training overhead, no changes to serving infrastructure. We implement this via an efficient approximation of the empirical Fisher Information Matrix (eFIM) diagonal, restricted to LoRA adapter matrices only, which reduces memory cost by approximately 256x compared to full-model Fisher estimation. On GLUE with DeBERTa-v3-base, FIM-LoRA…
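
The calibrate-then-redistribute idea described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the paper's implementation: the function names `gradient_variance_scores` and `allocate_ranks`, the choice to average per-entry variance into one score per layer, the `min_rank` floor, and the largest-remainder rounding are all assumptions made for illustration, and the toy input uses two calibration passes instead of eight. Gradients are represented as flat lists of floats rather than framework tensors.

```python
from statistics import pvariance


def gradient_variance_scores(grads_per_layer):
    """Score each layer by the variance of its gradients across calibration passes.

    grads_per_layer maps a (hypothetical) layer name to a list of gradient
    samples, one flat list of floats per calibration backward pass.
    Assumption: the layer score is the mean of the per-entry variances.
    """
    scores = {}
    for name, passes in grads_per_layer.items():
        # zip(*passes) groups the same gradient entry across all passes.
        per_entry_var = [pvariance(entry) for entry in zip(*passes)]
        scores[name] = sum(per_entry_var) / len(per_entry_var)
    return scores


def allocate_ranks(scores, base_rank, min_rank=1):
    """Redistribute a total budget of base_rank * n_layers in proportion to scores.

    Assumptions: every layer keeps at least min_rank, and fractional shares
    are resolved with largest-remainder rounding so the budget is preserved.
    """
    names = list(scores)
    budget = base_rank * len(names)
    spare = budget - min_rank * len(names)
    total = sum(scores.values())
    if total == 0:
        # No signal at all: fall back to a uniform split.
        shares = {n: spare / len(names) for n in names}
    else:
        shares = {n: spare * scores[n] / total for n in names}
    ranks = {n: min_rank + int(shares[n]) for n in names}
    leftover = budget - sum(ranks.values())
    # Give the remaining units to the layers with the largest fractional share.
    by_fraction = sorted(names, key=lambda n: shares[n] - int(shares[n]), reverse=True)
    for n in by_fraction[:leftover]:
        ranks[n] += 1
    return ranks


# Toy calibration data: three layers, two passes, two gradient entries each.
toy_grads = {
    "q": [[1.0, 0.0], [3.0, 0.0]],
    "v": [[0.0, 0.0], [1.0, 1.0]],
    "o": [[2.0, 2.0], [2.0, 2.0]],
}
scores = gradient_variance_scores(toy_grads)
ranks = allocate_ranks(scores, base_rank=4)
print(scores)
print(ranks)
```

In this toy run the total budget stays at 12 (three layers at a uniform rank of 4), but the layer whose gradients vary most receives most of it, while the layer with constant gradients falls to the floor. In a real setup, a mapping like `ranks` would be the per-layer rank pattern passed to whatever LoRA library is in use.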
