Inverse Concave-Utility Reinforcement Learning is Inverse Game Theory
Mustafa Mert Çelikok, Frans A. Oliehoek, Jan-Willem van de Meent

TL;DR
This paper introduces a novel theoretical framework for inverse reinforcement learning with concave utilities, linking it to inverse game theory and mean-field games, addressing gaps in existing IRL methods.
Contribution
It develops a new approach to inverse CURL problems by establishing their equivalence to inverse game theory within mean-field games, which was not previously explored.
Findings
Most standard IRL results do not apply to CURL, because CURL invalidates the classical Bellman equations.
The paper proposes a new definition of feasible rewards for inverse CURL, based on an equivalence with mean-field games.
It also outlines future research directions and applications in human-AI collaboration.
Abstract
We consider inverse reinforcement learning problems with concave utilities. Concave Utility Reinforcement Learning (CURL) is a generalisation of the standard RL objective, which employs a concave function of the state occupancy measure, rather than a linear function. CURL has garnered recent attention for its ability to represent instances of many important applications, including standard RL, imitation learning, pure exploration, constrained MDPs, offline RL, human-regularized RL, and others. Inverse reinforcement learning is a powerful paradigm that focuses on recovering an unknown reward function that can rationalize the observed behaviour of an agent. There have been recent theoretical advances in inverse RL where the problem is formulated as identifying the set of feasible reward functions. However, inverse RL for CURL problems has not been considered previously. In this…
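As a reading aid, the difference between the standard RL objective and the CURL objective described in the abstract can be written out as follows. This is a minimal sketch; the notation ($d^\pi$ for the state occupancy measure of policy $\pi$, $r$ for the reward, $F$ for the utility) is assumed here and is not taken from the paper. The entropy utility in the last line is one commonly used choice for pure exploration, given only as an illustration.

```latex
% Standard RL: the objective is linear in the occupancy measure d^\pi.
\max_{\pi} \; \langle r, d^{\pi} \rangle \;=\; \max_{\pi} \sum_{s} d^{\pi}(s)\, r(s)

% CURL: the objective is a concave function F of the occupancy measure.
\max_{\pi} \; F(d^{\pi}), \qquad F \text{ concave}

% Standard RL is recovered by the linear (hence concave) choice
F(d) = \langle r, d \rangle

% Illustrative non-linear choice (pure exploration): entropy of the occupancy measure
F(d) = -\sum_{s} d(s) \log d(s)
```

When $F$ is not linear, the objective no longer decomposes into a sum of per-state rewards, which is consistent with the summary's point that the classical Bellman equations do not hold for CURL.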
Taxonomy
Topics: Evolutionary Algorithms and Applications
Methods: Sparse Evolutionary Training
