Pareto Inverse Reinforcement Learning for Diverse Expert Policy Generation
Woo Kyung Kim, Minjong Yoo, Honguk Woo

TL;DR
This paper introduces ParIRL, a novel inverse reinforcement learning framework that generates a diverse set of Pareto-optimal policies from limited expert datasets with different preferences, and distills them into a preference-conditioned model.
Contribution
The paper proposes a Pareto IRL framework that learns multiple conflicting objectives from limited datasets and distills them into a single, preference-conditioned diffusion model.
Findings
ParIRL outperforms existing IRL methods on multi-objective control tasks.
It effectively approximates the Pareto frontier with limited expert data.
It was also applied successfully to autonomous driving scenarios.
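To make "approximating the Pareto frontier" concrete, here is a minimal sketch of how a set of candidate policies can be filtered down to the non-dominated ones, given each policy's return on every objective. It is not the paper's evaluation code: the function names `dominates` and `pareto_front` and the return values are hypothetical, and higher is assumed to be better on every objective.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is at least as good as `b` on every objective
    and strictly better on at least one (maximization assumed)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Sequence[float]]) -> List[int]:
    """Return the indices of points that no other point dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Made-up returns of five policies on two conflicting objectives.
returns = [(1.0, 5.0), (2.0, 4.0), (3.0, 3.0), (2.0, 2.0), (0.5, 4.5)]
front = pareto_front(returns)
print(front)  # indices of the non-dominated policies
```

The pairwise check is quadratic in the number of policies, which is fine for the small policy sets this kind of comparison usually involves.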
Abstract
Data-driven offline reinforcement learning and imitation learning approaches have been gaining popularity in addressing sequential decision-making problems. Yet, these approaches rarely consider learning Pareto-optimal policies from a limited pool of expert datasets. This becomes particularly marked due to practical limitations in obtaining comprehensive datasets for all preferences, where multiple conflicting objectives exist and each expert might hold a unique optimization preference for these objectives. In this paper, we adapt inverse reinforcement learning (IRL) by using reward distance estimates for regularizing the discriminator. This enables progressive generation of a set of policies that accommodate diverse preferences on the multiple objectives, while using only two distinct datasets, each associated with a different expert preference. In doing so, we present a Pareto IRL…
Taxonomy
Topics: Complex Systems and Decision Making · Human-Automation Interaction and Safety · Cognitive Science and Mapping
Methods: Sparse Evolutionary Training · Entropy Regularization · Proximal Policy Optimization · CARLA: An Open Urban Driving Simulator · Diffusion
