Variational Low-Rank Adaptation Using IVON

Bai Cong; Nico Daheim; Yuesong Shen; Daniel Cremers; Rio Yokota; Mohammad Emtiyaz Khan; Thomas Möllenhoff

arXiv:2411.04421 · cs.LG · November 12, 2024

TL;DR

This paper applies variational learning to Low-Rank Adaptation (LoRA) of large language models by fine-tuning with the IVON optimizer instead of AdamW, improving accuracy and calibration without a substantial increase in cost.

Contribution

The paper shows that replacing AdamW with IVON in LoRA fine-tuning improves both the accuracy and the calibration of large language models, at lower cost and with an easier implementation than other Bayesian alternatives.

Findings

01

IVON improves accuracy by 2.8% over AdamW for Llama-2 7B.

02

IVON improves expected calibration error by 4.6% over AdamW for Llama-2 7B.

03

IVON is more accurate than the other Bayesian alternatives, at lower cost and with an easier implementation.
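
Expected calibration error, the metric cited in the findings, compares how confident a model is with how often it is right. The sketch below shows one common way to compute it: predictions are grouped into equal-width confidence bins, and the gap between accuracy and mean confidence in each bin is averaged with weights proportional to bin size. The function name, the choice of 10 bins, and the half-open binning rule are assumptions for illustration; the paper's exact evaluation protocol may differ.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin-size-weighted mean of |accuracy - mean confidence| over confidence bins.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct:     booleans, True where the chosen class was the right one.
    n_bins:      number of equal-width bins (assumed value, not from the paper).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Bin k covers [k / n_bins, (k + 1) / n_bins); conf == 1.0 goes in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, hit in members if hit) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


# Four predictions made at 90% confidence, three of them correct.
example_ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
```

A lower value means the reported confidences track the observed accuracy more closely, which is the sense in which the findings describe an improvement.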

Abstract

We show that variational learning can significantly improve the accuracy and calibration of Low-Rank Adaptation (LoRA) without a substantial increase in the cost. We replace AdamW by the Improved Variational Online Newton (IVON) algorithm to finetune large language models. For Llama-2 with 7 billion parameters, IVON improves the accuracy over AdamW by 2.8% and expected calibration error by 4.6%. The accuracy is also better than the other Bayesian alternatives, yet the cost is lower and the implementation is easier. Our work provides additional evidence for the effectiveness of IVON for large language models. The code is available at https://github.com/team-approx-bayes/ivon-lora.
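
To give a feel for what a variational optimizer does differently from AdamW, here is a simplified scalar sketch modeled on the published IVON update. Instead of a single weight value it keeps a Gaussian posterior with a mean and a standard deviation, evaluates the gradient at a sample from that posterior, and uses the same sample to estimate curvature. The function name `ivon_style_fit`, the toy quadratic loss, and every hyperparameter value are assumptions for illustration only. This is not the authors' code; the repository linked in the abstract is the reference implementation.

```python
import math
import random


def ivon_style_fit(grad_fn, steps=2000, lr=0.1, ess=100.0, weight_decay=1e-3,
                   beta1=0.9, beta2=0.999, hess_init=1.0, seed=0):
    """Fit one scalar weight with a Gaussian posterior N(mean, sigma^2).

    All default values are illustrative assumptions, not settings from the paper.
    """
    rng = random.Random(seed)
    mean, grad_avg, hess = 0.0, 0.0, hess_init
    for t in range(1, steps + 1):
        # Posterior std follows from the curvature estimate and effective sample size.
        sigma = 1.0 / math.sqrt(ess * (hess + weight_decay))
        noise = rng.gauss(0.0, 1.0)
        theta = mean + sigma * noise      # sample a weight from the posterior
        grad = grad_fn(theta)             # gradient at the sampled weight
        hess_hat = grad * noise / sigma   # equals grad * (theta - mean) / sigma^2
        grad_avg = beta1 * grad_avg + (1.0 - beta1) * grad
        diff = hess - hess_hat
        # The quadratic correction term keeps hess + weight_decay strictly positive.
        hess = (beta2 * hess + (1.0 - beta2) * hess_hat
                + 0.5 * (1.0 - beta2) ** 2 * diff ** 2 / (hess + weight_decay))
        grad_debiased = grad_avg / (1.0 - beta1 ** t)
        # Newton-like step: the gradient is scaled by the curvature estimate.
        mean -= lr * (grad_debiased + weight_decay * mean) / (hess + weight_decay)
    sigma = 1.0 / math.sqrt(ess * (hess + weight_decay))
    return mean, sigma


# Toy loss 0.5 * 2 * (theta - 3)^2, whose gradient is 2 * (theta - 3).
fitted_mean, fitted_sigma = ivon_style_fit(lambda theta: 2.0 * (theta - 3.0))
```

The returned standard deviation is what makes the approach Bayesian: at prediction time one can sample several weights from the posterior and average their outputs, which is one route to better-calibrated confidences.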

Code & Models

Repositories

team-approx-bayes/ivon-lora
PyTorch · Official

Taxonomy

Topics: Optical Coherence Tomography Applications · Neural Networks and Reservoir Computing · Image Processing Techniques and Applications

Methods: AdamW