Bayesian Low-rank Adaptation for Large Language Models

Adam X. Yang; Maxime Robeyns; Xi Wang; Laurence Aitchison

arXiv:2308.13111 · cs.LG · February 7, 2024 · 1 cites

TL;DR

This paper introduces Laplace-LoRA, a Bayesian method that applies a Laplace approximation to LoRA parameters, improving the calibration and uncertainty estimation of fine-tuned large language models.

Contribution

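The paper applies a post-hoc Laplace approximation to the posterior over the LoRA parameters, giving a Bayesian treatment of parameter-efficient fine-tuning that improves calibration and uncertainty estimation for large language models.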

Findings

1. Significantly improves the calibration of fine-tuned LLMs.
2. Reduces the overconfidence that arises when fine-tuning on small datasets.
3. Demonstrates effectiveness empirically on six commonsense reasoning tasks under in- and out-of-distribution settings.
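
Calibration, as referred to in the findings above, is commonly quantified with expected calibration error (ECE), the metric one of the reviews below also mentions. The following is a minimal Python sketch, assuming equal-width confidence bins. The function name, the default bin count, and the binning convention are illustrative assumptions and are not taken from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |accuracy - mean confidence| over equal-width bins.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct:     booleans, True where the chosen class was right.
    n_bins:      number of equal-width bins (an assumed default).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Assign each prediction to a bin [k/n_bins, (k+1)/n_bins); 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        # Each bin contributes in proportion to the share of predictions it holds.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

An overconfident model, one that reports high confidence while being right only part of the time, yields a large ECE; a model whose confidence matches its accuracy in every bin yields zero.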

Abstract

Low-rank adaptation (LoRA) has emerged as a new paradigm for cost-efficient fine-tuning of large language models (LLMs). However, fine-tuned LLMs often become overconfident, especially when fine-tuned on small datasets. Bayesian methods, with their inherent ability to estimate uncertainty, serve as potent tools to mitigate overconfidence and enhance calibration. In this work, we introduce Laplace-LoRA, which applies a Bayesian approach to the LoRA parameters. Specifically, Laplace-LoRA applies a Laplace approximation to the posterior over the LoRA parameters, considerably improving the calibration of fine-tuned LLMs.
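
To make the idea of a Laplace approximation concrete, here is a toy Python sketch on a one-parameter logistic regression. It is not the paper's implementation, which applies the approximation to the LoRA parameters of an LLM. The function names, the `prior_precision` value, and the use of the probit approximation for the predictive are all assumptions made for illustration. The sketch fits a MAP weight, uses the curvature at the mode as the precision of a Gaussian posterior, and shows that averaging over that posterior pulls predictions toward 0.5.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def laplace_fit(xs, ys, prior_precision=1.0, steps=25):
    """Return (w_map, posterior_variance) for p(y=1 | x) = sigmoid(w * x).

    Newton's method minimises the negative log posterior under a
    zero-mean Gaussian prior; the Laplace posterior is N(w_map, 1 / hessian).
    """
    w = 0.0
    for _ in range(steps):
        grad = prior_precision * w
        hess = prior_precision
        for x, y in zip(xs, ys):
            p = sigmoid(w * x)
            grad += (p - y) * x          # gradient of the negative log-likelihood
            hess += p * (1.0 - p) * x * x  # its second derivative
        w -= grad / hess
    # Curvature of the negative log posterior at the mode.
    hess = prior_precision + sum(
        sigmoid(w * x) * (1.0 - sigmoid(w * x)) * x * x for x in xs
    )
    return w, 1.0 / hess


def predict_map(w_map, x):
    """Point-estimate prediction that ignores weight uncertainty."""
    return sigmoid(w_map * x)


def predict_laplace(w_map, variance, x):
    """Probit approximation to E[sigmoid(w * x)] for w ~ N(w_map, variance)."""
    logit_mean = w_map * x
    logit_var = variance * x * x
    return sigmoid(logit_mean / math.sqrt(1.0 + math.pi * logit_var / 8.0))


# Four training points; the query x = 3.0 lies outside the training range.
w_map, variance = laplace_fit([1.0, 2.0, -1.0, -2.0], [1, 1, 0, 0])
print("MAP prediction:    ", predict_map(w_map, 3.0))
print("Laplace prediction:", predict_laplace(w_map, variance, 3.0))
```

In this toy setting the Laplace prediction is always less extreme than the MAP prediction, because the logit is divided by a factor greater than one that grows with the posterior variance. This is the mechanism by which a posterior over the weights can temper overconfidence.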

Peer Reviews

Decision: ICLR 2024 poster

Reviewer 01 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

- **Timely research**: The proposed method focuses on improving the calibration performance when finetuning LLMs on small-scale datasets, which is an important and urgent research problem along with the rapid growth of large models.
- **Clear Bayesian treatment**: The proposed method adopts well-established techniques from prior works of Bayesian neural networks and uncertainty reasoning, and successfully incorporates such a Bayesian treatment into parameter-efficient tuning approaches. The pr

Weaknesses

- **Unclear uncertainty estimation**: While the proposed Laplace-LoRA naturally estimates the weight posterior, it is unclear how to apply the proposed method to compute model uncertainties. Also, it remains unclear if the proposed method can handle the structured uncertainty estimation for next-token predictions (e.g., *Uncertainty estimation in autoregressive structured prediction, ICLR'21*). It would also be interesting to compare the proposed method with semantic uncertainty [Kuhn et al., IC

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

1. The idea of combining Laplace inference with fine-tuning LLMs using LoRA adapters is novel and provides a new way of doing Bayesian fine-tuning on LLMs.
2. They conducted extensive experiments on six commonsense reasoning tasks under in/out-of-distribution settings and provided detailed analysis of the experiment results.
3. The writing is well-structured, clear and easy to understand.

Weaknesses

1. It has some novelty, but not dramatically so, because both the Laplace approximation and the LoRA method are well studied.
2. It is odd that Section 3 (Background) is followed directly by Section 4 (Results), without a Method section in between. The section may need a better name.

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

1. This paper is the first to present comprehensive results on applying the Laplace approximation to LoRA for LLMs.
2. The presentation is clear, with visualizations.
3. Claims are supported by a sufficient number of convincing experimental results; e.g., smaller datasets show a larger difference in ECE than larger datasets.

Weaknesses

1. Limited novelty. The Bayesian method part (as indicated in the paper) is well explored in the literature listed in the software paper Laplace Redux [1]. This paper can be viewed as empirical results from applying [1] to a specific model: LoRA for LLMs.
2. The majority of the benefits of Laplace-LoRA, including "post-hoc" and "scalable", come from the existing method, which limits the contribution of this work. This, together with Weakness #1 above, is the major concern from my point of view.
3

Taxonomy

Topics: Speech Recognition and Synthesis · Tensor decomposition and applications · Speech and Audio Processing