Inverse-RLignment: Large Language Model Alignment from Demonstrations through Inverse Reinforcement Learning

Hao Sun; Mihaela van der Schaar

arXiv:2405.15624·cs.LG·January 28, 2025

Open Access

TL;DR

This paper introduces Alignment from Demonstrations (AfD), an inverse reinforcement learning approach that aligns large language models from demonstration data instead of preference data, targeting the noisy labels, high annotation costs, and privacy concerns of preference-based methods.

Contribution

The paper formalizes Alignment from Demonstrations (AfD) as a sequential decision-making problem with missing reward signals, proposes divergence minimization objectives for it, and presents a computationally efficient algorithm that extrapolates over a tailored reward model.

Findings

1. Effective alignment on Harmless and Helpful tasks
2. Addresses noisy labels and privacy concerns in LLM alignment
3. Demonstrates strong empirical performance with a simple approach

Abstract

Aligning Large Language Models (LLMs) is crucial for enhancing their safety and utility. However, existing methods, primarily based on preference datasets, face challenges such as noisy labels, high annotation costs, and privacy concerns. In this work, we introduce Alignment from Demonstrations (AfD), a novel approach leveraging high-quality demonstration data to overcome these challenges. We formalize AfD within a sequential decision-making framework, highlighting its unique challenge of missing reward signals. Drawing insights from forward and inverse reinforcement learning, we introduce divergence minimization objectives for AfD. Analytically, we elucidate the mass-covering and mode-seeking behaviors of various approaches, explaining when and why certain methods are superior. Practically, we propose a computationally efficient algorithm that extrapolates over a tailored reward model…
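
The mass-covering versus mode-seeking distinction mentioned in the abstract can be illustrated with a toy example. The sketch below is not the paper's algorithm or its objectives. It assumes the standard textbook association of forward KL, KL(p‖q), with mass-covering fits and reverse KL, KL(q‖p), with mode-seeking fits. The bimodal target, the grid, the candidate Gaussians, and every name in the code (`TARGET`, `best_fit`, and so on) are hypothetical choices made for illustration only.

```python
import math


def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_on_grid(xs, mu, sigma):
    """Discretized Gaussian, normalized over the grid points xs."""
    return normalize([math.exp(-0.5 * ((x - mu) / sigma) ** 2) for x in xs])


def kl(a, b):
    """Discrete KL(a || b) = sum_i a_i * log(a_i / b_i)."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0.0)


# Toy "demonstration" distribution: two equally weighted modes at -2 and +2.
XS = [i / 10 for i in range(-60, 61)]
_left = gaussian_on_grid(XS, -2.0, 0.5)
_right = gaussian_on_grid(XS, 2.0, 0.5)
TARGET = [0.5 * l + 0.5 * r for l, r in zip(_left, _right)]

# Candidate single-Gaussian "policies" searched by brute force.
MUS = [i * 0.25 for i in range(-12, 13)]
SIGMAS = [0.3, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]


def best_fit(divergence):
    """Return the (mu, sigma) pair whose Gaussian minimizes the divergence."""
    candidates = [(mu, sigma) for mu in MUS for sigma in SIGMAS]
    return min(candidates, key=lambda c: divergence(gaussian_on_grid(XS, *c)))


# Forward KL(p || q): q must place mass wherever the target p has mass.
forward_fit = best_fit(lambda q: kl(TARGET, q))
# Reverse KL(q || p): q is penalized for mass where the target p has little.
reverse_fit = best_fit(lambda q: kl(q, TARGET))

print("forward KL fit (mu, sigma):", forward_fit)
print("reverse KL fit (mu, sigma):", reverse_fit)
```

Under these assumptions, the forward-KL fit is a wide Gaussian centered between the two modes, while the reverse-KL fit is a narrow Gaussian sitting on a single mode. Which of the two symmetric modes the reverse fit selects is arbitrary.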

Taxonomy

Topics: Natural Language Processing Techniques · Digital Rights Management and Security