Gaussian Stochastic Weight Averaging for Bayesian Low-Rank Adaptation of Large Language Models

Emre Onal; Klemens Flöge; Emma Caldwell; Arsen Sheverdin; Vincent Fortuin

arXiv:2405.03425 · cs.CL · July 23, 2024

TL;DR

This paper introduces a simple, efficient method that combines LoRA with Gaussian SWAG to enable approximate Bayesian inference in large language models, improving calibration and robustness, especially when fine-tuning on small datasets and on out-of-distribution tasks.

Contribution

It presents a novel combination of Low-Rank Adaptation with Gaussian SWAG for Bayesian inference in LLMs, enhancing calibration and robustness with minimal computational overhead.

Findings

01 Improves model calibration and generalization on NLP benchmarks.

02 Enhances robustness against distribution shift.

03 Performs competitively with more complex Bayesian methods.

Abstract

Fine-tuned Large Language Models (LLMs) often suffer from overconfidence and poor calibration, particularly when fine-tuned on small datasets. To address these challenges, we propose a simple combination of Low-Rank Adaptation (LoRA) with Gaussian Stochastic Weight Averaging (SWAG), facilitating approximate Bayesian inference in LLMs. Through extensive testing across several Natural Language Processing (NLP) benchmarks, we demonstrate that our straightforward and computationally efficient approach improves model generalization and calibration competitively with comparable, more sophisticated methods for Bayesian inference in LLMs. We further show that our method exhibits greater robustness against distribution shift, as reflected in its improved performance on out-of-distribution tasks.
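
As a rough illustration of the idea in the abstract, the sketch below tracks running first and second moments of LoRA parameters along a toy training trajectory, fits a Gaussian to them, and averages predictions over samples from that Gaussian. It is not taken from fortuinlab/swag-lora. The diagonal-only covariance, the single toy linear layer, the random updates standing in for optimizer steps, and the names DiagonalSWAG, lora_logits, and bma_predict are all assumptions made for illustration.

```python
import numpy as np


class DiagonalSWAG:
    """Running moments of a flat parameter vector (diagonal-covariance SWAG)."""

    def __init__(self, num_params):
        self.n = 0
        self.mean = np.zeros(num_params)
        self.sq_mean = np.zeros(num_params)

    def collect(self, params):
        # Incrementally update the running mean and running mean of squares.
        params = np.asarray(params, dtype=float)
        self.n += 1
        self.mean += (params - self.mean) / self.n
        self.sq_mean += (params ** 2 - self.sq_mean) / self.n

    def variance(self, eps=1e-12):
        # Var[theta] = E[theta^2] - E[theta]^2, clipped to stay positive.
        return np.clip(self.sq_mean - self.mean ** 2, eps, None)

    def sample(self, rng, scale=1.0):
        # Draw theta ~ N(mean, scale^2 * diag(variance)).
        std = np.sqrt(self.variance())
        return self.mean + scale * std * rng.standard_normal(self.mean.shape)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def lora_logits(theta, x, w0, rank):
    # Toy LoRA layer: frozen weight w0 plus the low-rank update B @ A,
    # where theta holds the flattened B followed by the flattened A.
    d_out, d_in = w0.shape
    b = theta[: d_out * rank].reshape(d_out, rank)
    a = theta[d_out * rank:].reshape(rank, d_in)
    return x @ (w0 + b @ a).T


def bma_predict(swag, x, w0, rank, num_samples=20, seed=0):
    # Bayesian model average: mean of class probabilities over sampled weights.
    rng = np.random.default_rng(seed)
    probs = [
        softmax(lora_logits(swag.sample(rng), x, w0, rank))
        for _ in range(num_samples)
    ]
    return np.mean(probs, axis=0)


# Toy usage: random perturbations stand in for late-stage optimizer steps.
rng = np.random.default_rng(0)
d_out, d_in, rank = 3, 4, 2
w0 = rng.standard_normal((d_out, d_in))
num_lora_params = d_out * rank + rank * d_in
swag = DiagonalSWAG(num_lora_params)
theta = 0.1 * rng.standard_normal(num_lora_params)
for _ in range(10):
    theta = theta + 0.01 * rng.standard_normal(theta.shape)
    swag.collect(theta)
probs = bma_predict(swag, rng.standard_normal((5, d_in)), w0, rank)
```

Only the LoRA parameters are tracked here, so the stored moments scale with the adapter size rather than with the full model. That is the kind of low overhead the abstract alludes to, though the actual covariance structure and sampling schedule used in the paper may differ from this sketch.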

Code & Models

Repositories

fortuinlab/swag-lora (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling

Methods: Stochastic Weight Averaging