Interactive Inverse Reinforcement Learning of Interaction Scenarios via Bi-level Optimization

Yue Mao; Shicheng Liu; Siyuan Xu; Minghui Zhu

arXiv:2605.08131·cs.LG·May 12, 2026

TL;DR

This paper formulates interactive IRL as a bi-level optimization problem in which a learner actively infers an expert's reward function by interacting with the expert, and proposes an algorithm with convergence guarantees.

Contribution

It formulates interactive IRL as a bi-level optimization problem and develops BISIRL, an algorithm that actively learns reward functions through interaction with experts.

Findings

01. BISIRL effectively learns reward functions in interactive scenarios.

02. The algorithm converges under specified conditions.

03. Experimental results validate the approach's effectiveness.

Abstract

Inverse reinforcement learning (IRL) learns a reward function and a corresponding policy that best fit the demonstration data of an expert. However, in the current IRL setting, the learner is isolated from the expert and can only passively observe the expert's demonstrations. This limits the applicability of IRL in interactive settings, where the learner actively interacts with the expert and needs to infer the expert's reward function from those interactions. To bridge this gap, this paper studies interactive IRL (IIRL), in which a learner, while interacting with an expert, aims to learn both the expert's reward function and a policy for interacting with the expert. We formulate IIRL as a stochastic bi-level optimization problem where the lower level learns a reward function to explain the behaviors of the expert, and the upper level learns a policy to interact with the expert. We develop a…
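For readers unfamiliar with the term, the general shape of a stochastic bi-level problem of the kind the abstract describes can be sketched as follows. This is a generic schematic, not the paper's formulation: the symbols $\pi$ (learner policy), $\theta$ (reward parameters), $F$, $G$, $f$, $g$, and the random samples $\xi$, $\zeta$ are all assumptions introduced here for illustration, and the actual objectives and their signs may differ.

```latex
% Generic stochastic bi-level template (illustrative; all symbols are assumed).
% Upper level: choose the learner's interaction policy \pi.
% Lower level: fit reward parameters \theta to the expert's observed behavior.
\begin{aligned}
\min_{\pi} \quad & F\bigl(\pi, \theta^{*}(\pi)\bigr)
   = \mathbb{E}_{\xi}\bigl[ f\bigl(\pi, \theta^{*}(\pi); \xi\bigr) \bigr] \\
\text{s.t.} \quad & \theta^{*}(\pi) \in \arg\min_{\theta} \; G(\theta, \pi)
   = \mathbb{E}_{\zeta}\bigl[ g(\theta, \pi; \zeta) \bigr]
\end{aligned}
```

The coupling runs in both directions in this template: the lower-level solution $\theta^{*}(\pi)$ depends on the policy because the policy shapes the interactions from which the reward is inferred, and the upper-level objective is evaluated at that solution.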
