GRIT -- Geometry-Aware PEFT with K-FAC Preconditioning, Fisher-Guided Reprojection, and Dynamic Rank Adaptation

Pritish Saha; Chandrav Rajbangshi; Rudra Goyal; Mohit Goyal; Anurag Deo; Biswajit Roy; Ningthoujam Dhanachandra Singh; Raxit Goswami; Amitava Das

arXiv:2601.00231 · cs.LG · January 5, 2026

Open Access

TL;DR

GRIT is a geometry-aware, curvature-adaptive PEFT method for fine-tuning large language models. It uses Fisher eigendirections and dynamic rank adaptation to match or surpass LoRA and QLoRA with fewer trainable parameters.

Contribution

Introduces GRIT, a PEFT approach that keeps the LoRA parameterization and combines K-FAC preconditioning, Fisher-guided reprojection, and dynamic rank adaptation.

Findings

01

Reduces trainable parameters by 46% on average.

02

Matches or surpasses LoRA and QLoRA performance.

03

Lower drift and better update-retention trade-offs.

Abstract

Parameter-efficient fine-tuning (PEFT) is the default way to adapt LLMs, but widely used LoRA and QLoRA are largely geometry-agnostic: they optimize in fixed, randomly oriented low-rank subspaces with first-order descent, mostly ignoring local loss curvature. This can inflate the effective update budget and amplify drift along weakly constrained directions. We introduce GRIT, a dynamic, curvature-aware LoRA procedure that preserves the LoRA parameterization but: (1) preconditions gradients in rank space using K-FAC as a natural-gradient proxy; (2) periodically reprojects the low-rank basis onto dominant Fisher eigendirections to suppress drift; and (3) adapts the effective rank from the spectrum so capacity concentrates where signal resides. Across instruction-following, comprehension, and reasoning benchmarks on LLaMA backbones, GRIT matches or surpasses LoRA and QLoRA while reducing…
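The three steps named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`kfac_precondition`, `select_rank`, `reproject`), the damping value, the cumulative-energy threshold, and the choice of a single r x r rank-space covariance are all assumptions made for illustration. The sketch assumes LoRA factors `B` (d_out x r) and `A` (r x d_in) and uses the standard K-FAC approximation, in which the natural gradient of a weight matrix is approximated by G^{-1} (grad) A_cov^{-1}.

```python
import numpy as np

def kfac_precondition(grad, act_cov, grad_cov, damping=1e-3):
    """K-FAC style preconditioning: G^{-1} @ grad @ A_cov^{-1}.

    grad     : (n_out, n_in) gradient of one factor
    act_cov  : (n_in, n_in) covariance of the factor's inputs
    grad_cov : (n_out, n_out) covariance of the factor's output gradients
    damping  : Tikhonov term (hypothetical default) for invertibility
    """
    n_out, n_in = grad.shape
    A_cov = act_cov + damping * np.eye(n_in)
    G_cov = grad_cov + damping * np.eye(n_out)
    left = np.linalg.solve(G_cov, grad)        # G^{-1} @ grad
    return np.linalg.solve(A_cov.T, left.T).T  # (...) @ A_cov^{-1}

def select_rank(eigvals, energy=0.9, min_rank=1):
    """Smallest k whose top-k eigenvalues hold `energy` of the spectrum.

    The cumulative-energy rule is an assumed criterion, not one
    confirmed by the source.
    """
    vals = np.sort(np.clip(eigvals, 0.0, None))[::-1]
    total = vals.sum()
    if total <= 0.0:
        return min_rank
    cum = np.cumsum(vals) / total
    k = int(np.searchsorted(cum, energy)) + 1
    return max(min_rank, min(k, len(vals)))

def reproject(B, A, rank_cov, energy=0.9):
    """Rotate LoRA factors onto the dominant eigendirections of rank_cov.

    rank_cov is an (r, r) symmetric PSD curvature estimate in rank space
    (assumed here to stand in for the Fisher information). Keeping all
    r directions leaves B @ A unchanged; keeping k < r truncates it.
    """
    eigvals, Q = np.linalg.eigh(rank_cov)   # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder to descending
    eigvals, Q = eigvals[order], Q[:, order]
    k = select_rank(eigvals, energy)
    return B @ Q[:, :k], Q[:, :k].T @ A, k
```

In a training loop, one would presumably call `kfac_precondition` on each factor's gradient at every step and call `reproject` only every few hundred steps, since the abstract describes the reprojection as periodic. The rotation `B @ Q`, `Q.T @ A` is used because it preserves the product `B @ A` when no directions are dropped, so any change to the adapted weights comes only from the rank truncation.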

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Tensor decomposition and applications