In-Trajectory Inverse Reinforcement Learning: Learn Incrementally Before An Ongoing Trajectory Terminates
Shicheng Liu, Minghui Zhu

TL;DR
This paper introduces an online IRL method that learns incrementally from an ongoing trajectory, updating the reward function and policy in real time as new state-action pairs are observed, whereas traditional IRL must wait for at least one complete trajectory.
Contribution
The paper formulates online IRL as a bi-level optimization problem and proposes a novel algorithm with proven sub-linear regret guarantees, enabling real-time learning from partial trajectories.
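As a rough illustration of what an online bi-level problem of this kind can look like, the schematic below pairs an upper-level reward update with a lower-level policy problem. Every symbol is an assumption made for illustration and is not taken from the paper: $\theta$ parameterizes the reward $r_\theta$, $\ell_t$ is a loss on the state-action pair observed at step $t$, $\bar{\theta}$ is a reference parameter standing in for the meta-regularization, $\lambda$ is its weight, and $J(\pi; r_\theta)$ is the expected cumulative reward of policy $\pi$ under $r_\theta$. The quadratic form of the regularizer is likewise an assumption.

$$
\theta_{t+1} \in \arg\min_{\theta}\; \ell_t\big(\theta, \pi_{\theta}\big) + \frac{\lambda}{2}\,\lVert \theta - \bar{\theta} \rVert^2
\quad \text{s.t.} \quad
\pi_{\theta} \in \arg\max_{\pi}\; J\big(\pi; r_{\theta}\big)
$$

The upper level adjusts the reward as each new state-action pair arrives, and the lower level returns the policy that corresponds to the current reward.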
Findings
The algorithm achieves sub-linear regret $O(\sqrt{T} + \log T + \sqrt{T}\log T)$ in general.
For linear reward functions, the regret reduces to $O(\log T)$.
Experiments validate the effectiveness of the proposed online IRL approach.
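To see why both regret bounds above count as sub-linear, it can help to check numerically that the bound divided by $T$ shrinks as $T$ grows. The sketch below is only an arithmetic illustration: it sets the constants hidden by the $O(\cdot)$ notation to 1, which is an assumption, and the function names are hypothetical rather than taken from the paper.

```python
import math

def general_bound(T: int) -> float:
    """sqrt(T) + log(T) + sqrt(T) * log(T), with all hidden constants assumed to be 1."""
    return math.sqrt(T) + math.log(T) + math.sqrt(T) * math.log(T)

def linear_reward_bound(T: int) -> float:
    """log(T), the bound stated for linear reward functions (constant assumed to be 1)."""
    return math.log(T)

def average_regret(bound, T: int) -> float:
    """Bound divided by the horizon T; this tends to 0 exactly when the bound is sub-linear."""
    return bound(T) / T

horizons = [10, 100, 1_000, 10_000, 100_000]
general_avg = [average_regret(general_bound, T) for T in horizons]
linear_avg = [average_regret(linear_reward_bound, T) for T in horizons]

for T, g, l in zip(horizons, general_avg, linear_avg):
    print(f"T={T:>7}  general/T={g:.4f}  linear/T={l:.6f}")
```

Both columns decrease toward zero as the horizon grows, and the $O(\log T)$ column does so much faster, which is the sense in which the linear-reward case is the stronger guarantee.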
Abstract
Inverse reinforcement learning (IRL) aims to learn a reward function and a corresponding policy that best fit the demonstrated trajectories of an expert. However, current IRL works cannot learn incrementally from an ongoing trajectory because they have to wait to collect at least one complete trajectory to learn. To bridge the gap, this paper considers the problem of learning a reward function and a corresponding policy while observing the initial state-action pair of an ongoing trajectory and keeping updating the learned reward and policy when new state-action pairs of the ongoing trajectory are observed. We formulate this problem as an online bi-level optimization problem where the upper level dynamically adjusts the learned reward according to the newly observed state-action pairs with the help of a meta-regularization term, and the lower level learns the corresponding policy. We…
Taxonomy
Topics: Traffic control and management · Autonomous Vehicle Technology and Safety
