IGU-LoRA: Adaptive Rank Allocation via Integrated Gradients and Uncertainty-Aware Scoring
Xuan Cui, Huiyue Li, Run Zeng, Yunfei Zhao, Jinrui Qian, Wei Duan, Bo Liu, Zhanpeng Zhou

TL;DR
IGU-LoRA introduces an adaptive rank allocation method for parameter-efficient fine-tuning of large language models, leveraging integrated gradients and uncertainty measures to improve performance and stability across tasks.
Contribution
The paper proposes IGU-LoRA, which computes within-layer sensitivities using Integrated Gradients and applies uncertainty-aware scoring to allocate ranks more effectively in PEFT.
Findings
Outperforms strong PEFT baselines on multiple tasks
Improves downstream accuracy and robustness
Validates importance of pathwise sensitivity and uncertainty in rank selection
Abstract
As large language models (LLMs) scale to billions of parameters, full-parameter fine-tuning becomes compute- and memory-prohibitive. Parameter-efficient fine-tuning (PEFT) mitigates this issue by updating only a small set of task-specific parameters while keeping the base model frozen. Among PEFT approaches, low-rank adaptation (LoRA) is widely adopted; however, it enforces a uniform rank across layers despite substantial variation in layer importance, motivating layerwise rank allocation. Recent adaptive-rank variants (e.g., AdaLoRA) allocate ranks based on importance scores, yet typically rely on instantaneous gradients that capture only local sensitivity, overlooking non-local, pathwise effects within the same layer, which yields unstable and biased scores. To address this limitation, we introduce IGU-LoRA, an adaptive-rank LoRA that (i) computes within-layer Integrated Gradients…
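The contrast the abstract draws between instantaneous gradients and pathwise sensitivity can be illustrated with a toy, single-parameter sketch. Everything here is an assumption made for illustration: the saturating loss `1 - tanh(w)`, the zero baseline, the midpoint Riemann sum, and the function names are not taken from the paper, and this is not the authors' scoring rule.

```python
import math

def loss(w):
    # Hypothetical saturating loss: steep near w = 0, almost flat for large w.
    return 1.0 - math.tanh(w)

def grad(w):
    # Analytic derivative of the toy loss: d/dw [1 - tanh(w)] = -(1 - tanh(w)^2).
    return -(1.0 - math.tanh(w) ** 2)

def integrated_gradient(w, baseline=0.0, steps=64):
    # Midpoint Riemann sum of the gradient along the straight path
    # baseline -> w, scaled by the displacement (w - baseline).
    total = 0.0
    for k in range(steps):
        alpha = (k + 0.5) / steps
        total += grad(baseline + alpha * (w - baseline))
    return (w - baseline) * total / steps

w = 5.0                                   # parameter sitting in the flat region
local_score = abs(w * grad(w))            # first-order, instantaneous sensitivity
path_score = abs(integrated_gradient(w))  # sensitivity accumulated along the path

print(f"instantaneous: {local_score:.6f}")
print(f"integrated:    {path_score:.6f}")
```

In this toy setting the instantaneous score is close to zero because the gradient has saturated at `w = 5`, while the integrated score stays close to the total change in loss between the baseline and `w`. Each score costs `steps` gradient evaluations instead of one, which is the overhead a path integral adds.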
Peer Reviews
Decision: ICLR 2026 Poster
1. The paper introduces a new importance-scoring mechanism that replaces instantaneous gradient magnitudes with parameter-space IG. 2. The experimental evaluation is comprehensive, covering multiple model scales (RoBERTa-large, Qwen-2.5-0.5B, Llama-2/3, DeepSeek) and diverse benchmark types.
1. In the example of Figure 2(b), any two parameter curves with the same integrated area would yield identical importance scores under Eq. (4). Consequently, the IG formulation cannot distinguish between parameters that are important early and those that are important late during training. 2. Missing recent baselines. The paper omits several recent and closely related adaptive-rank methods, such as GoRA [1] (gradient-driven adaptive rank adjustment) and SalientLoRA [2] (saliency-based rank a
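The reviewer's first point can be made concrete with a small sketch. Eq. (4) is not reproduced in this excerpt, so the code assumes only that the score is an integral of a gradient profile over a path variable `alpha` in [0, 1]; the two profiles and all names are hypothetical.

```python
def riemann_ig(grad_along_path, delta=1.0, steps=100):
    # Midpoint Riemann sum of a gradient profile over alpha in [0, 1],
    # scaled by the parameter displacement delta.
    total = sum(grad_along_path((k + 0.5) / steps) for k in range(steps))
    return delta * total / steps

def early_important(alpha):
    # Large gradient at the start of the path, decaying to zero.
    return 2.0 * (1.0 - alpha)

def late_important(alpha):
    # Zero gradient at the start of the path, growing toward the end.
    return 2.0 * alpha

ig_early = riemann_ig(early_important)
ig_late = riemann_ig(late_important)
print(ig_early, ig_late)
```

Both profiles enclose the same area, so an area-only score assigns them the same importance even though one parameter matters at the start of the path and the other at the end.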
1. Good theoretical contribution. The error bound for the stochastic IG approximation and the stability guarantee for the SNR-based score are welcome additions. 2. IGU-LoRA maintains training/inference latency and memory usage comparable to the baselines while delivering better performance. 3. Novelty in the importance score: it addresses critical limitations such as gradient saturation and unstable rank allocation, with clear theoretical justification for the IG approximation error.
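The intuition behind an SNR-based score can be sketched as follows. The paper's exact definition is not included in this excerpt, so the formula below (absolute mean divided by standard deviation over per-minibatch importance estimates) is an assumed, generic signal-to-noise ratio, and the sample values are made up for illustration.

```python
from statistics import fmean, pstdev

def snr_score(samples, eps=1e-8):
    # Generic signal-to-noise ratio: |mean| / (std + eps).
    # High when minibatch estimates agree, low when they fluctuate.
    return abs(fmean(samples)) / (pstdev(samples) + eps)

# Hypothetical per-minibatch importance estimates for two parameter groups
# that share the same mean but differ in variability.
stable = [0.50, 0.52, 0.48, 0.51, 0.49]
noisy = [2.0, -1.0, 1.5, -0.5, 0.5]

print(f"stable group SNR: {snr_score(stable):.2f}")
print(f"noisy group SNR:  {snr_score(noisy):.2f}")
```

A mean-only score would rank the two groups equally; dividing by the spread favors the group whose estimates are consistent across minibatches, which is the kind of stability the review refers to.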
1. While experiments include Llama-3-8B, results for models larger than 10B parameters (e.g., Llama-3-70B) are absent. Given that PEFT is most critical for very large LLMs, this limits confidence in IGU-LoRA’s scalability. 2. The method requires O(N) gradient evaluations per parameter group during training, which is downplayed but still significant, especially for larger models. 3. What worries me the most is that due to the use of gradient accumulation, the training process of the model can be
The use of integrated gradients in the parameter space for rank allocation is a novel and impactful idea, addressing key limitations of gradient-based approaches. The paper provides a solid theoretical foundation for the proposed method, including error bounds for IG approximation and stability guarantees for the uncertainty-aware scoring. The method is evaluated on diverse benchmarks (e.g., GLUE, mathematical reasoning, and common-sense reasoning tasks) with multiple backbone models, demonstr
While IGU-LoRA achieves strong performance, the use of integrated gradients introduces additional computational costs compared to simpler methods like LoRA. This could limit its scalability to extremely large models or real-time applications. While training efficiency is discussed, the impact of IGU-LoRA on inference latency is less emphasized, which could be relevant for deployment scenarios.
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Topic Modeling
