Multi-objective Reinforcement learning from AI Feedback

Marcus Williams

arXiv:2406.07295 · cs.LG · June 13, 2024 · 1 citation

TL;DR

MORLAIF introduces a multi-principle approach to reinforcement learning from AI feedback, decomposing preferences into simpler aspects to improve language model alignment and performance.

Contribution

The paper decomposes preference modeling into multiple principles, each with its own preference model, and combines their scores into a single reward signal. The resulting models outperform standard RLAIF baselines.

Findings

01 MORLAIF outperforms standard RLAIF baselines.
02 Decomposition into principles improves alignment.
03 Scalarization choice has minimal impact.

Abstract

This paper presents Multi-Objective Reinforcement Learning from AI Feedback (MORLAIF), a novel approach to improving the alignment and performance of language models trained using reinforcement learning from AI feedback (RLAIF). In contrast to standard approaches that train a single preference model to represent all human preferences, MORLAIF decomposes this task into multiple simpler principles, such as toxicity, factuality, and sycophancy. Separate preference models are trained for each principle using feedback from GPT-3.5-Turbo. These preference model scores are then combined using different scalarization functions to provide a reward signal for Proximal Policy Optimization (PPO) training of the target language model. Our experiments indicate that MORLAIF outperforms the standard RLAIF baselines and that MORLAIF can be used to align larger language models using smaller ones.…
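
As a rough illustration of the scalarization step described in the abstract, the sketch below combines per-principle preference scores into one scalar reward. The function names, the example scores, and the two scalarization choices shown (a weighted average and a worst-case minimum) are assumptions made for illustration; they are not taken from the paper or its repository.

```python
from typing import Dict

# Hypothetical per-principle scores for a single response, as might be
# produced by separate preference models (one per principle).
Scores = Dict[str, float]


def weighted_linear(scores: Scores, weights: Dict[str, float]) -> float:
    """Weighted average of the per-principle scores."""
    total_weight = sum(weights[p] for p in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[p] * scores[p] for p in scores) / total_weight


def worst_case(scores: Scores) -> float:
    """Reward equals the lowest-scoring principle."""
    return min(scores.values())


# Illustrative values only; real scores would come from trained models.
example_scores = {"toxicity": 0.8, "factuality": 0.6, "sycophancy": 0.4}
equal_weights = {name: 1.0 for name in example_scores}

linear_reward = weighted_linear(example_scores, equal_weights)
minimum_reward = worst_case(example_scores)
```

Either scalar could then be passed to a PPO trainer as the reward for the sampled response. A weighted average lets a strong score on one principle offset a weak score on another, while the worst-case form rewards only improvement on the weakest principle.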

Code & Models

Repositories

carolius/Multi-Objective-Reinforcement-Learning-from-AI-Feedback
PyTorch · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Digital Transformation in Industry · Advanced Research in Systems and Signal Processing

Methods: Attention Is All You Need · Cosine Annealing · Softmax · ALIGN · Layer Normalization · Weight Decay · Linear Warmup With Cosine Annealing · Linear Layer