FIM-LoRA: Task-Informative Rank Allocation for LoRA via Calibration-Time Gradient-Variance Estimation
Ramakrishnan Sathyavageeswaran

TL;DR
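FIM-LoRA estimates gradient variance in a short calibration phase before fine-tuning (eight backward passes) and uses it to allocate LoRA ranks per layer. The result is a standard LoRA adapter with a per-layer rank pattern: no new parameters, no training overhead, and a rank allocation that can be read as a measure of layer importance.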
Contribution
The paper proposes a lightweight, calibration-based approach to assigning layer-specific ranks in LoRA, aiming to improve performance and interpretability without modifying the training process.
Findings
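FIM-LoRA matches standard LoRA performance on GLUE (with DeBERTa-v3-base) and in the LLaMA experiments.
Restricting the empirical Fisher (eFIM) diagonal estimate to the LoRA adapter matrices reduces memory cost by approximately 256x compared to full-model Fisher estimation.
The estimated layer importance aligns with established transformer layer roles.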
Abstract
Low-rank adaptation (LoRA) assigns a uniform rank to every adapted weight matrix - a practical convenience that ignores a fundamental reality: different layers contribute unequally to task adaptation. We address this with a lightweight engineering solution: before fine-tuning begins, run eight calibration backward passes, compute the gradient variance of each LoRA-B matrix as a proxy for layer informativeness, and redistribute the rank budget proportionally. The resulting adapter is a standard LoRA with a per-layer rank pattern - no new parameters, no training overhead, no changes to serving infrastructure. We implement this via an efficient approximation of the empirical Fisher Information Matrix (eFIM) diagonal, restricted to LoRA adapter matrices only, which reduces memory cost by approximately 256x compared to full-model Fisher estimation. On GLUE with DeBERTa-v3-base, FIM-LoRA…
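The allocation step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names `layer_score` and `allocate_ranks`, the `r_min` floor, and the largest-remainder rounding are all hypothetical choices made here, and the score is assumed to be the per-element variance of the LoRA-B gradient across calibration passes, averaged over elements. The paper's exact estimator and rounding rule may differ.

```python
from statistics import pvariance


def layer_score(grad_samples):
    """Score one LoRA-B matrix from its calibration gradients.

    grad_samples: list of flattened gradient vectors, one per calibration
    backward pass. ASSUMPTION: the score is the per-element variance across
    passes, averaged over all elements.
    """
    n_elems = len(grad_samples[0])
    per_elem = [pvariance([g[i] for g in grad_samples]) for i in range(n_elems)]
    return sum(per_elem) / n_elems


def allocate_ranks(scores, total_rank, r_min=1):
    """Split a fixed rank budget across layers in proportion to their scores.

    ASSUMPTIONS: every layer keeps at least r_min, and fractional shares are
    rounded with the largest-remainder method so the ranks sum to total_rank.
    """
    names = list(scores)
    spare = total_rank - r_min * len(names)
    if spare < 0:
        raise ValueError("total_rank is too small for the requested r_min")
    total = sum(scores.values())
    if total == 0:
        # No signal from calibration: fall back to an even split.
        ideal = {n: spare / len(names) for n in names}
    else:
        ideal = {n: spare * scores[n] / total for n in names}
    ranks = {n: r_min + int(ideal[n]) for n in names}
    # Hand out the ranks lost to flooring, largest fractional part first.
    leftover = max(0, total_rank - sum(ranks.values()))
    by_remainder = sorted(names, key=lambda n: ideal[n] - int(ideal[n]), reverse=True)
    for n in by_remainder[:leftover]:
        ranks[n] += 1
    return ranks


# Toy example: three adapted matrices sharing the budget of a uniform r=8 LoRA.
example_scores = {"q": 3.0, "k": 1.0, "v": 4.0}
example_ranks = allocate_ranks(example_scores, total_rank=24, r_min=2)
print(example_ranks)  # {'q': 9, 'k': 4, 'v': 11}
```

The total budget (24) is the same as a uniform rank of 8 over three matrices, so the parameter count stays comparable to standard LoRA. Only the distribution across layers changes.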
