Aligning Language Models with Offline Learning from Human Feedback
Jian Hu, Li Tao, June Yang, Chandler Zhou

TL;DR
This paper introduces an offline framework for aligning language models with human preferences. It avoids the instability and complexity of online learning while achieving comparable results with fewer computational resources.
Contribution
It proposes an offline alignment framework built on three methods, filtering alignment (FA), reward-weighted regression (RWR), and conditional alignment (CA), for stable, resource-efficient language model alignment.
Findings
Conditional alignment outperforms other offline methods.
The proposed methods require around 9% fewer computing resources.
Conditional alignment achieves results comparable to PPO.
Abstract
Learning from human preferences is crucial for language models (LMs) to effectively cater to human needs and societal values. Previous research has made notable progress by leveraging human feedback to follow instructions. However, these approaches rely primarily on online learning techniques like Proximal Policy Optimization (PPO), which have been proven unstable and challenging to tune for language models. Moreover, PPO requires complex distributed system implementation, hindering the efficiency of large-scale distributed training. In this study, we propose an offline learning from human feedback framework to align LMs without interacting with environments. Specifically, we explore filtering alignment (FA), reward-weighted regression (RWR), and conditional alignment (CA) to align language models to human preferences. By employing a loss function similar to supervised fine-tuning, our…
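The three offline methods named in the abstract can be illustrated with a small sketch. The abstract does not give their exact formulations, so everything below is an assumption made for illustration: FA is taken to keep only samples above a reward threshold, RWR to weight each sample's SFT-style loss by an exponentiated reward, and CA to prepend the reward to the prompt as a conditioning tag. The function names, the `beta` temperature, the threshold, and the `<reward: ...>` tag format are all hypothetical and are not taken from the paper.

```python
import math

# Illustrative sketch only: names, threshold, beta, and tag format are assumptions.

def filtering_alignment(samples, threshold):
    """FA (assumed form): keep only samples whose reward is at least `threshold`."""
    return [s for s in samples if s["reward"] >= threshold]


def rwr_weights(rewards, beta=1.0):
    """RWR (assumed form): weights proportional to exp(reward / beta), scaled to mean 1."""
    top = max(rewards)  # subtract the max for numerical stability
    raw = [math.exp((r - top) / beta) for r in rewards]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def weighted_nll(token_logprobs, weight=1.0):
    """SFT-style loss for one sample: weighted mean negative log-likelihood."""
    return -weight * sum(token_logprobs) / len(token_logprobs)


def conditional_prompt(prompt, reward):
    """CA (assumed form): prepend the reward so the model learns reward-conditioned outputs."""
    return f"<reward: {reward:.1f}> {prompt}"


samples = [
    {"prompt": "Explain PPO.", "response": "A", "reward": 0.9},
    {"prompt": "Explain PPO.", "response": "B", "reward": 0.2},
]
kept = filtering_alignment(samples, threshold=0.5)
weights = rwr_weights([s["reward"] for s in samples], beta=0.5)
tagged = conditional_prompt(samples[0]["prompt"], samples[0]["reward"])
```

In all three cases the training objective stays a (possibly weighted) token-level log-likelihood over a fixed dataset, which is what makes the setup offline: no new samples are drawn from the policy during training.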
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Entropy Regularization · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Layer Normalization · Dense Connections · Absolute Position Encodings
