Gaussian Stochastic Weight Averaging for Bayesian Low-Rank Adaptation of Large Language Models
Emre Onal, Klemens Flöge, Emma Caldwell, Arsen Sheverdin, Vincent Fortuin

TL;DR
This paper introduces a simple, efficient method combining LoRA with Gaussian SWAG to improve Bayesian inference, calibration, and robustness of large language models, especially on small datasets and out-of-distribution tasks.
Contribution
It presents a novel combination of Low-Rank Adaptation with Gaussian SWAG for Bayesian inference in LLMs, enhancing calibration and robustness with minimal computational overhead.
Findings
Improves model calibration and generalization on NLP benchmarks.
Enhances robustness against distribution shifts.
Achieves competitive performance with more complex Bayesian methods.
Abstract
Fine-tuned Large Language Models (LLMs) often suffer from overconfidence and poor calibration, particularly when fine-tuned on small datasets. To address these challenges, we propose a simple combination of Low-Rank Adaptation (LoRA) with Gaussian Stochastic Weight Averaging (SWAG), facilitating approximate Bayesian inference in LLMs. Through extensive testing across several Natural Language Processing (NLP) benchmarks, we demonstrate that our straightforward and computationally efficient approach improves model generalization and calibration competitively with comparable, more sophisticated methods for Bayesian inference in LLMs. We further show that our method exhibits greater robustness against distribution shift, as reflected in its improved performance on out-of-distribution tasks.
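To make the idea in the abstract concrete, the following is a minimal sketch of Gaussian stochastic weight averaging applied to a flat parameter vector, which here stands in for the trainable LoRA weights. It is not the authors' implementation. The class name `DiagonalSWAG`, the helper `bayesian_model_average`, and the `scale` parameter are hypothetical names chosen for illustration. The sketch also assumes a diagonal-only covariance, since the abstract does not state which covariance structure the paper uses.

```python
import numpy as np


class DiagonalSWAG:
    """Running first and second moments of a flat parameter vector.

    Assumption: only a diagonal covariance is tracked; the abstract does
    not specify the covariance structure used in the paper.
    """

    def __init__(self, num_params):
        self.mean = np.zeros(num_params)
        self.sq_mean = np.zeros(num_params)
        self.n = 0

    def collect(self, params):
        # Incrementally update the running averages of theta and theta**2.
        params = np.asarray(params, dtype=float)
        self.n += 1
        self.mean += (params - self.mean) / self.n
        self.sq_mean += (params ** 2 - self.sq_mean) / self.n

    def variance(self, eps=1e-12):
        # Diagonal variance estimate, clamped from below at eps.
        return np.maximum(self.sq_mean - self.mean ** 2, eps)

    def sample(self, rng, scale=1.0):
        # Draw theta ~ N(mean, scale * diag(variance)).
        std = np.sqrt(scale * self.variance())
        return self.mean + std * rng.standard_normal(self.mean.shape)


def bayesian_model_average(predict_fn, swag, rng, num_samples=10):
    # Average the predictive probabilities over sampled parameter vectors.
    probs = [predict_fn(swag.sample(rng)) for _ in range(num_samples)]
    return np.mean(probs, axis=0)
```

In use, `collect` would be called on snapshots of the LoRA parameters taken at intervals during fine-tuning, and `predict_fn` would load a sampled vector into the adapter and return class probabilities. Restricting the averaging to the low-rank adapter keeps the stored moments small relative to the full model, which fits the low overhead the summary claims.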
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling
Methods: Stochastic Weight Averaging
