Inverse-RLignment: Large Language Model Alignment from Demonstrations through Inverse Reinforcement Learning
Hao Sun, Mihaela van der Schaar

TL;DR
This paper introduces a demonstration-based alignment method for large language models built on inverse reinforcement learning. It targets the noisy labels and high annotation costs of preference-based alignment and reports strong empirical results.
Contribution
It formalizes Alignment from Demonstrations within a reinforcement learning framework, proposing divergence minimization objectives and an efficient algorithm for improved LLM alignment.
Findings
Effective alignment on Harmless and Helpful tasks
Addresses noisy labels and privacy concerns in LLM alignment
Demonstrates strong empirical performance with a simple approach
Abstract
Aligning Large Language Models (LLMs) is crucial for enhancing their safety and utility. However, existing methods, primarily based on preference datasets, face challenges such as noisy labels, high annotation costs, and privacy concerns. In this work, we introduce Alignment from Demonstrations (AfD), a novel approach leveraging high-quality demonstration data to overcome these challenges. We formalize AfD within a sequential decision-making framework, highlighting its unique challenge of missing reward signals. Drawing insights from forward and inverse reinforcement learning, we introduce divergence minimization objectives for AfD. Analytically, we elucidate the mass-covering and mode-seeking behaviors of various approaches, explaining when and why certain methods are superior. Practically, we propose a computationally efficient algorithm that extrapolates over a tailored reward model…
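The mass-covering versus mode-seeking distinction mentioned in the abstract can be illustrated with a toy example. The sketch below is not the paper's algorithm. It assumes the two behaviors correspond to minimizing forward KL (target to model) and reverse KL (model to target), which is the standard textbook illustration. The bimodal target, the grid, and the candidate Gaussian family are all hypothetical choices made for demonstration.

```python
import math

def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def gaussian_bins(xs, mu, sigma):
    """Discretized Gaussian over the grid xs, normalized to a probability vector."""
    return normalize([math.exp(-0.5 * ((x - mu) / sigma) ** 2) for x in xs])

def kl(a, b):
    """KL(a || b) for two probability vectors with strictly positive entries in b."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

# Grid from -10 to 10 in steps of 0.1 (illustrative choice).
xs = [i * 0.1 for i in range(-100, 101)]

# Hypothetical bimodal "demonstration" distribution:
# a heavier mode at -3 (weight 0.6) and a lighter mode at +3 (weight 0.4).
p = normalize([
    0.6 * math.exp(-0.5 * ((x + 3.0) / 0.5) ** 2)
    + 0.4 * math.exp(-0.5 * ((x - 3.0) / 0.5) ** 2)
    for x in xs
])

# Candidate unimodal models: means in [-4, 4], a few fixed widths.
candidates = [
    (m * 0.5, sigma)
    for m in range(-8, 9)
    for sigma in (0.5, 1.0, 2.0, 3.0)
]

# Forward KL(p || q): penalizes q for missing mass wherever p has it (mass-covering).
forward_fit = min(candidates, key=lambda c: kl(p, gaussian_bins(xs, *c)))

# Reverse KL(q || p): penalizes q for placing mass where p has little (mode-seeking).
reverse_fit = min(candidates, key=lambda c: kl(gaussian_bins(xs, *c), p))

print("forward KL fit (mu, sigma):", forward_fit)
print("reverse KL fit (mu, sigma):", reverse_fit)
```

In this toy setting the forward-KL fit is a wide Gaussian centered between the two modes, while the reverse-KL fit is a narrow Gaussian locked onto the heavier mode at -3. Which behavior is preferable for alignment depends on the data and the objective, which is the question the paper's analysis addresses.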
Taxonomy
Topics: Natural Language Processing Techniques · Digital Rights Management and Security
