Exploring the impact of low-rank adaptation on the performance, efficiency, and regularization of RLHF

Simeng Sun; Dhawal Gupta; Mohit Iyyer

arXiv:2309.09055·cs.CL·September 19, 2023

TL;DR

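This paper shows that low-rank adaptation (LoRA) makes RLHF training of large language models considerably cheaper: LLaMA 7B is aligned on the Alpaca dataset using two A100 GPUs instead of the eight required for full model fine-tuning. Although only 0.2% of the parameters are tuned, the result outperforms the publicly released AlpacaFarm checkpoint, and the paper also examines how LoRA interacts with KL regularization.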

Contribution

The paper empirically investigates a LoRA-based implementation of the PPO stage of RLHF that reduces resource requirements, and analyzes several training configurations that vary the form of the KL regularization term in the objective.

Findings

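01. Despite tuning only 0.2% of LLaMA 7B's parameters, the LoRA implementation performs better than the publicly released AlpacaFarm checkpoint trained with full model fine-tuning.

02. Removing the KL penalty term does not harm performance on the AlpacaFarm evaluation set under the LoRA setup.

03. LoRA mitigates the negative impact of PPO on factuality.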
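
To make the "small fraction of parameters" idea in the first finding concrete, here is a minimal sketch of how LoRA shrinks the trainable-parameter count for a single weight matrix. The dimensions (4096 x 4096) and rank (8) are hypothetical values chosen for illustration; they are not taken from the paper, and the resulting fraction is for one matrix only, not the paper's 0.2% figure for the whole model.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one weight matrix.

    Full fine-tuning updates every entry of W (d_out x d_in).
    LoRA freezes W and trains two low-rank factors:
    B (d_out x rank) and A (rank x d_in), so the update is B @ A.
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


# Hypothetical layer size and rank, for illustration only.
full, lora = lora_param_counts(d_out=4096, d_in=4096, rank=8)
fraction = lora / full
print(f"{lora} of {full} parameters trainable ({fraction:.2%})")
# prints: 65536 of 16777216 parameters trainable (0.39%)
```

The model-wide fraction depends on which matrices receive adapters and on the chosen rank, so it can be lower than the per-matrix number shown here.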

Abstract

During the last stage of RLHF, a large language model is aligned to human intents via PPO training, a process that generally requires large-scale computational resources. In this technical report, we empirically investigate an efficient implementation of RLHF using low-rank adaptation (LoRA), which allows us to align the LLaMA 7B checkpoint on the Alpaca dataset using only two A100 GPUs instead of the eight required for full model fine-tuning. Despite tuning only 0.2% of LLaMA 7B's parameters, our implementation achieves better performance than the publicly-released AlpacaFarm checkpoint with full model fine-tuning. Next, we analyze several configurations of our LoRA-based PPO implementation, varying the form of the KL regularization term in the training objective. We find that (1) removing this penalty term does not harm performance on the AlpacaFarm evaluation set under our LoRA…
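
The KL regularization term mentioned in the abstract is commonly implemented in PPO-based RLHF as a per-token penalty on the log-probability gap between the policy and a frozen reference model. The sketch below shows that common form only; the function name, the coefficient `beta`, and the toy numbers are assumptions for illustration, and the specific variants studied in the paper are not reproduced here.

```python
def kl_penalized_rewards(score, logp_policy, logp_ref, beta):
    """Per-token rewards for one sampled response (hypothetical helper).

    Each token receives -beta * (log pi(token) - log pi_ref(token)),
    and the reward-model score for the whole response is added to the
    final token. Setting beta = 0 removes the KL penalty entirely.
    """
    if not logp_policy or len(logp_policy) != len(logp_ref):
        raise ValueError("log-prob lists must be non-empty and equal in length")
    rewards = [-beta * (lp - lr) for lp, lr in zip(logp_policy, logp_ref)]
    rewards[-1] += score
    return rewards


# Toy log-probabilities for a three-token response (made-up values).
logp_policy = [-1.0, -0.5, -2.0]
logp_ref = [-1.5, -0.5, -1.0]

with_kl = kl_penalized_rewards(1.0, logp_policy, logp_ref, beta=0.1)
without_kl = kl_penalized_rewards(1.0, logp_policy, logp_ref, beta=0.0)
print(with_kl)
print(without_kl)
```

With `beta=0.0` the only signal left is the reward-model score on the last token, which corresponds to the "penalty removed" configuration in this simplified setting.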

Code & Models

Repositories

simengsun/alpaca_farm_lora
jax · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Domain Adaptation and Few-Shot Learning

Methods: Entropy Regularization · Proximal Policy Optimization · ALIGN