Inverse Reinforcement Learning from Non-Stationary Learning Agents

Kavinayan P. Sivakumar; Yi Shen; Zachary Bell; Scott Nivison; Boyuan Chen; Michael M. Zavlanos

arXiv:2410.14135 · cs.LG · October 21, 2024

Open Access

TL;DR

This paper introduces a novel inverse reinforcement learning approach that estimates a learning agent's reward function from trajectories during its learning process, using a new behavior cloning variant and neural network modeling.

Contribution

The paper proposes a bundle behavior cloning method and a neural network-based reward estimation technique for inverse reinforcement learning from non-stationary learning agents.

Findings

01

The theoretical analysis gives a complexity bound for bundle behavior cloning that improves on the bound for standard behavior cloning.

02

Numerical experiments validate the effectiveness of the proposed approach.

03

Theoretical analysis provides bound guarantees for the method.

Abstract

In this paper, we study an inverse reinforcement learning problem that involves learning the reward function of a learning agent using trajectory data collected while this agent is learning its optimal policy. To address this problem, we propose an inverse reinforcement learning method that allows us to estimate the policy parameters of the learning agent which can then be used to estimate its reward function. Our method relies on a new variant of the behavior cloning algorithm, which we call bundle behavior cloning, and uses a small number of trajectories generated by the learning agent's policy at different points in time to learn a set of policies that match the distribution of actions observed in the sampled trajectories. We then use the cloned policies to train a neural network model that estimates the reward function of the learning agent. We provide a theoretical analysis to show…
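The bundling idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a tabular setting with discrete states and actions, contiguous fixed-size bundles, and a count-based maximum-likelihood policy with Laplace smoothing. The names `make_bundles`, `clone_policy`, `bundle_behavior_cloning`, `bundle_size`, and `smoothing` are hypothetical, and the reward-estimation step with a neural network is not covered.

```python
import numpy as np


def make_bundles(trajectories, bundle_size):
    """Group time-ordered trajectories into contiguous bundles (assumed scheme)."""
    return [trajectories[i:i + bundle_size]
            for i in range(0, len(trajectories), bundle_size)]


def clone_policy(bundle, n_states, n_actions, smoothing=1.0):
    """Fit one tabular policy to a bundle by smoothed action frequencies."""
    # Start every (state, action) count at `smoothing` so unseen pairs
    # still receive nonzero probability.
    counts = np.full((n_states, n_actions), smoothing, dtype=float)
    for trajectory in bundle:
        for state, action in trajectory:
            counts[state, action] += 1.0
    # Normalize each row into a distribution over actions.
    return counts / counts.sum(axis=1, keepdims=True)


def bundle_behavior_cloning(trajectories, bundle_size, n_states, n_actions):
    """Return one cloned policy per bundle, in time order."""
    return [clone_policy(bundle, n_states, n_actions)
            for bundle in make_bundles(trajectories, bundle_size)]


# Toy data (made up for illustration): one state, two actions. Each
# trajectory is a list of (state, action) pairs; later trajectories come
# from a later stage of learning and favor action 1.
trajectories = [
    [(0, 0), (0, 0)],
    [(0, 0), (0, 1)],
    [(0, 1), (0, 1)],
    [(0, 1), (0, 1)],
]

policies = bundle_behavior_cloning(trajectories, bundle_size=2,
                                   n_states=1, n_actions=2)
for index, policy in enumerate(policies):
    print(f"bundle {index}: P(action | state 0) = {policy[0]}")
```

In this toy run the first bundle assigns probability 2/3 to action 0 and the second assigns 5/6 to action 1, so the sequence of cloned policies tracks how the agent's behavior shifts while it learns. Pooling several trajectories per bundle trades time resolution for more samples per cloned policy; how the paper sets that trade-off is not stated in this summary.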

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques

Methods: Sparse Evolutionary Training