Multi-objective Reinforcement learning from AI Feedback
Marcus Williams

TL;DR
MORLAIF introduces a multi-principle approach to reinforcement learning from AI feedback, decomposing preferences into simpler aspects to improve language model alignment and performance.
Contribution
The paper proposes decomposing preference modeling into multiple principles, each with its own preference model, and combining their scores into a single reward, which improves alignment and outperforms standard RLAIF methods.
Findings
MORLAIF outperforms standard RLAIF baselines.
Decomposition into principles improves alignment.
The choice of scalarization function has minimal impact on the results.
Abstract
This paper presents Multi-Objective Reinforcement Learning from AI Feedback (MORLAIF), a novel approach to improving the alignment and performance of language models trained using reinforcement learning from AI feedback (RLAIF). In contrast to standard approaches that train a single preference model to represent all human preferences, MORLAIF decomposes this task into multiple simpler principles, such as toxicity, factuality, and sycophancy. Separate preference models are trained for each principle using feedback from GPT-3.5-Turbo. These preference model scores are then combined using different scalarization functions to provide a reward signal for Proximal Policy Optimization (PPO) training of the target language model. Our experiments indicate that MORLAIF outperforms the standard RLAIF baselines and that MORLAIF can be used to align larger language models using smaller ones.…
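The scalarization step described in the abstract can be illustrated with a minimal sketch. The principle names (toxicity, factuality, sycophancy) come from the abstract; the two scalarization functions shown here (a normalized weighted sum and a worst-case minimum), the score range, the weights, and all function names are assumptions for illustration, not necessarily the functions evaluated in the paper.

```python
from typing import Callable, Dict

# Per-principle preference-model scores for one response.
# Assumption: higher means "better on that principle", roughly in [0, 1].
Scores = Dict[str, float]


def weighted_sum(scores: Scores, weights: Dict[str, float]) -> float:
    """Linear scalarization: weighted average of the per-principle scores."""
    total_weight = sum(weights[name] for name in scores)
    return sum(weights[name] * value for name, value in scores.items()) / total_weight


def worst_case(scores: Scores) -> float:
    """Max-min style scalarization: the reward is the weakest principle's score."""
    return min(scores.values())


def combined_reward(scores: Scores, scalarize: Callable[[Scores], float]) -> float:
    """Collapse per-principle scores into the single scalar reward passed to PPO."""
    return scalarize(scores)


# Hypothetical scores and weights, made up for this example.
scores = {"toxicity": 0.9, "factuality": 0.6, "sycophancy": 0.3}
weights = {"toxicity": 1.0, "factuality": 1.0, "sycophancy": 2.0}

linear_reward = combined_reward(scores, lambda s: weighted_sum(s, weights))
minimum_reward = combined_reward(scores, worst_case)
print(f"weighted sum: {linear_reward:.3f}, worst case: {minimum_reward:.3f}")
```

In this sketch the weighted sum lets a strong score on one principle offset a weak score on another, while the worst-case variant rewards a response only as much as its weakest principle allows. Either scalar could serve as the per-response reward in a PPO loop.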
Taxonomy
Topics: Reinforcement Learning in Robotics · Digital Transformation in Industry · Advanced Research in Systems and Signal Processing
Methods: Attention Is All You Need · Cosine Annealing · Softmax · ALIGN · Layer Normalization · Weight Decay · Linear Warmup With Cosine Annealing · Linear Layer
