Pareto Inverse Reinforcement Learning for Diverse Expert Policy Generation

Woo Kyung Kim; Minjong Yoo; Honguk Woo

arXiv:2408.12110 · cs.LG · August 23, 2024

Open Access

TL;DR

This paper introduces ParIRL, a novel inverse reinforcement learning framework that generates a diverse set of Pareto-optimal policies from limited expert datasets with different preferences, and distills them into a preference-conditioned model.

Contribution

The paper proposes a Pareto IRL framework (ParIRL) that learns policies for multiple conflicting objectives from limited expert datasets and distills them into a single preference-conditioned diffusion model.

Findings

01. ParIRL outperforms existing IRL methods on multi-objective control tasks.

02. It effectively approximates the Pareto frontier with limited expert data.

03. It is successfully applied to autonomous driving scenarios.
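
To make "approximating the Pareto frontier" concrete, here is a minimal Python sketch of non-dominated filtering over per-policy return vectors. It is not the paper's method. The function name `pareto_front`, the example returns, and the assumption that higher is better on every objective are all illustrative.

```python
from typing import List, Sequence


def pareto_front(returns: Sequence[Sequence[float]]) -> List[int]:
    """Return indices of non-dominated return vectors (higher is better)."""

    def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
        # a dominates b if it is at least as good on every objective
        # and strictly better on at least one.
        return all(x >= y for x, y in zip(a, b)) and any(
            x > y for x, y in zip(a, b)
        )

    return [
        i
        for i, r in enumerate(returns)
        if not any(dominates(o, r) for j, o in enumerate(returns) if j != i)
    ]


# Hypothetical two-objective returns for four candidate policies.
returns = [(1.0, 5.0), (3.0, 3.0), (2.0, 2.0), (5.0, 1.0)]
front = pareto_front(returns)  # the policy with returns (2.0, 2.0) is dominated
```

A set of policies approximates the frontier well when its non-dominated return vectors cover the range of trade-offs between the objectives densely.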

Abstract

Data-driven offline reinforcement learning and imitation learning approaches have been gaining popularity in addressing sequential decision-making problems. Yet, these approaches rarely consider learning Pareto-optimal policies from a limited pool of expert datasets. This becomes particularly marked due to practical limitations in obtaining comprehensive datasets for all preferences, where multiple conflicting objectives exist and each expert might hold a unique optimization preference for these objectives. In this paper, we adapt inverse reinforcement learning (IRL) by using reward distance estimates for regularizing the discriminator. This enables progressive generation of a set of policies that accommodate diverse preferences on the multiple objectives, while using only two distinct datasets, each associated with a different expert preference. In doing so, we present a Pareto IRL…

Taxonomy

Topics: Complex Systems and Decision Making · Human-Automation Interaction and Safety · Cognitive Science and Mapping

Methods: Sparse Evolutionary Training · Entropy Regularization · Proximal Policy Optimization · CARLA: An Open Urban Driving Simulator · Diffusion