Improving LoRA with Variational Learning
Bai Cong, Nico Daheim, Yuesong Shen, Rio Yokota, Mohammad Emtiyaz Khan, Thomas Möllenhoff

TL;DR
This paper introduces IVON, a variational algorithm that enhances LoRA finetuning for large language models by improving metrics like accuracy and calibration with minimal additional computational cost.
Contribution
The paper demonstrates that IVON is a simple, efficient variational method that significantly improves LoRA finetuning performance on billion-scale LLMs, surpassing existing Bayesian approaches.
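For readers new to the setup, the NumPy sketch below shows a LoRA layer whose low-rank factors are treated as random variables, with class probabilities averaged over posterior samples. It assumes a diagonal (mean-field) Gaussian over the LoRA factors A and B only, with the pretrained weight W0 frozen. The function names, the alpha / r scaling convention, and the softmax averaging are illustrative assumptions, not the paper's code.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def lora_forward(x, W0, A, B, alpha, r):
    """LoRA layer: frozen W0 plus the scaled low-rank update (alpha / r) * B @ A.

    Shapes: x (n, d_in), W0 (d_out, d_in), A (r, d_in), B (d_out, r).
    """
    return x @ (W0 + (alpha / r) * (B @ A)).T

def predictive_probs(x, W0, mA, sA, mB, sB, alpha, r, n_samples, rng):
    """Monte Carlo posterior predictive (hypothetical helper).

    Draws A ~ N(mA, sA^2) and B ~ N(mB, sB^2) elementwise (mean-field
    assumption) and averages the resulting class probabilities.
    """
    probs = np.zeros((x.shape[0], W0.shape[0]))
    for _ in range(n_samples):
        A = mA + sA * rng.standard_normal(mA.shape)
        B = mB + sB * rng.standard_normal(mB.shape)
        probs += softmax(lora_forward(x, W0, A, B, alpha, r))
    return probs / n_samples
```

With sA and sB set to zero this reduces to ordinary LoRA. B is commonly initialized to zero, so the adapter starts as a no-op on top of W0.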
Findings
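IVON improves accuracy over AdamW by 1.3% when finetuning Llama-3.2-3B on a set of commonsense reasoning tasks.
IVON reduces ECE relative to AdamW by 5.4% in the same setting.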
IVON outperforms Laplace-LoRA and BLoB in experiments.
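ECE (expected calibration error), the calibration metric quoted above, compares a model's confidence with its accuracy inside confidence bins: ECE = Σ_b (|B_b| / n) · |acc(B_b) − conf(B_b)|. A minimal sketch, assuming equal-width bins; the bin count of 15 is an assumption, and the paper's evaluation may use a different setting. The value is usually multiplied by 100 when reported as a percentage.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    """Equal-width-bin ECE over the top-class confidence."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    conf = probs.max(axis=1)                      # confidence of the predicted class
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(labels)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            # weight each bin's |accuracy - confidence| gap by its share of samples
            gap = abs(correct[in_bin].mean() - conf[in_bin].mean())
            ece += in_bin.sum() / n * gap
    return ece
```

For example, four predictions all made with confidence 0.8, of which three are correct, give an ECE of about 0.05 (5%).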
Abstract
Bayesian methods have recently been used to improve LoRA finetuning and, although they improve calibration, their effect on other metrics (such as accuracy) is marginal and can sometimes even be detrimental. Moreover, Bayesian methods also increase computational overheads and require additional tricks for them to work well. Here, we fix these issues by using a recently proposed variational algorithm called IVON. We show that IVON is easy to implement and has similar costs to AdamW, and yet it can also drastically improve many metrics by using a simple posterior pruning technique. We present extensive results on billion-scale LLMs (Llama and Qwen series) going way beyond the scale of existing applications of IVON. For example, we finetune a Llama-3.2-3B model on a set of commonsense reasoning tasks and improve accuracy over AdamW by 1.3% and reduce ECE by 5.4%, outperforming AdamW and…
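To give a rough sense of why IVON costs about the same as AdamW, here is a single-sample NumPy sketch of an IVON-style update as we understand it from the original IVON description. It keeps a mean m, a gradient momentum g, and a diagonal Hessian estimate h, and draws one weight sample per step from the Gaussian posterior with standard deviation σ = 1/sqrt(λ(h + δ)). The name ivon_sketch, the parameter ess (standing in for λ, the effective sample size), and all defaults are assumptions. This is not the authors' implementation, and it omits details such as gradient clipping and the posterior pruning step.

```python
import numpy as np

def ivon_sketch(grad_fn, m0, n_steps, lr=0.1, beta1=0.9, beta2=0.99999,
                weight_decay=1e-4, ess=1e4, h0=1.0, rng=None):
    """Hypothetical IVON-style loop on a flat parameter vector.

    grad_fn(theta) returns a (stochastic) gradient of the average loss.
    ess plays the role of lambda and scales the posterior std sigma.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    m = np.array(m0, dtype=float)
    g = np.zeros_like(m)
    h = np.full_like(m, h0)
    for t in range(1, n_steps + 1):
        sigma = 1.0 / np.sqrt(ess * (h + weight_decay))
        theta = m + sigma * rng.standard_normal(m.shape)   # one posterior sample
        g_hat = grad_fn(theta)
        h_hat = g_hat * (theta - m) / sigma**2              # reparameterization Hessian estimate
        g = beta1 * g + (1 - beta1) * g_hat
        h = (beta2 * h + (1 - beta2) * h_hat
             + 0.5 * (1 - beta2) ** 2 * (h - h_hat) ** 2 / (h + weight_decay))
        g_bar = g / (1 - beta1 ** t)                        # bias-corrected momentum
        m = m - lr * (g_bar + weight_decay * m) / (h + weight_decay)
    sigma = 1.0 / np.sqrt(ess * (h + weight_decay))
    return m, sigma
```

Each step needs one gradient at a sampled θ plus a few elementwise operations on m, g and h, so the per-step compute and memory are comparable to AdamW's, which also keeps two state vectors per parameter.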
Peer Reviews
Decision · Submitted to ICLR 2026
+ The paper is clearly written and easy to follow.
+ Related works are discussed, though limited to the literature on LoRA for LLMs.
+ The contributions of the proposed method are clearly presented and discussed.
+ Despite the problematic evaluation metric, the authors have conducted a good number of experiments to demonstrate the performance of their method.
+ The contributions are minor. See Section Summary for a detailed summary of the contributions of this paper.
+ Problematic overall evaluation metric. The authors used the average accuracy/ECE over 6 datasets as an overall evaluation metric for these methods, e.g., Tables 1 and 2. This is a huge mistake and could lead to misleading results and conclusions.
+ Some typos, e.g., Line 158.
- The empirical Bayes formulation for determining the prior is the most novel and promising aspect of the work. This idea has potential applicability beyond the presented context and represents a fundamental improvement over IVON.
- The experimental results provide convincing evidence that the proposed method performs on par with established Bayesian fine-tuning approaches such as BLoB and the Laplace approximation.
- The proposed adaptation preserves the implementation simplicity of IVON.
- The paper does not include a comparison with an IVON baseline, making it difficult to assess the actual contribution of the proposed adaptations relative to the existing IVON method.
- The choice of tempering with $\lambda$ during optimization is neither theoretically motivated nor empirically investigated. The experiments appear to either fix $\lambda = 5k = \gamma$ or disregard the $\gamma \sqrt{N}$ heuristic entirely (e.g., by setting $\lambda = 5 \cdot 10^6$).
- The calibration analysis
- The paper is well-written and the methodological contributions are easy to follow.
- The proposed method, IVON-LoRA, is simple to implement and adds minimal computational overhead. This makes the approach a practical and valuable contribution.
- The empirical evaluation is extensive and robustly demonstrates the benefits of the proposed method through comparisons with:
  - The standard non-Bayesian Adam optimizer, showing that IVON-LoRA improves both accuracy and calibration.
  - Bayesian comp
The motivation for some of the algorithmic design choices could be further substantiated:
- A more in-depth discussion is needed to motivate the adaptation of the prior precision *at each optimization step*. How does this procedure align with the Bayesian framework, where the prior is typically fixed to represent beliefs held before observing the data? Adapting the prior to the current state of the posterior seems to weaken its role as a fixed regularizer, as it no longer serves as a static anc
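The reviews above refer to choosing the prior precision by empirical Bayes. For a mean-field Gaussian posterior N(m, diag(σ²)) and an isotropic zero-mean Gaussian prior N(0, I/δ), the precision that minimizes the KL term of the ELBO has the standard closed form δ* = D / Σ_i (m_i² + σ_i²), where D is the number of parameters. The sketch below computes it. Whether the paper uses exactly this update (or rescales it, for example by the dataset size) is an assumption that cannot be confirmed from this page, and both function names are hypothetical.

```python
import numpy as np

def kl_to_isotropic_prior(m, sigma, delta):
    """KL(N(m, diag(sigma^2)) || N(0, I / delta))."""
    m = np.asarray(m, dtype=float).ravel()
    sigma = np.asarray(sigma, dtype=float).ravel()
    return 0.5 * np.sum(delta * (sigma**2 + m**2) - 1.0 - np.log(delta * sigma**2))

def empirical_bayes_prior_precision(m, sigma):
    """Closed-form delta* = D / sum(m_i^2 + sigma_i^2), the minimizer of the KL above."""
    m = np.asarray(m, dtype=float).ravel()
    sigma = np.asarray(sigma, dtype=float).ravel()
    return m.size / np.sum(m**2 + sigma**2)
```

Setting the derivative of the KL with respect to δ to zero gives Σ_i (m_i² + σ_i²) = D/δ, hence the formula. In an IVON-style loop, δ* would take the place of the weight-decay term, and recomputing it from the current m and σ at every step is the per-step re-estimation questioned in the review above.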
Taxonomy
Topics: Speech Recognition and Synthesis · Domain Adaptation and Few-Shot Learning
Methods: AdamW · Pruning · Sparse Evolutionary Training
