Inverse Delayed Reinforcement Learning

Simon Sinong Zhan; Qingyuan Wu; Zhian Ruan; Frank Yang; Philip Wang; Yixuan Wang; Ruochen Jiao; Chao Huang; Qi Zhu

arXiv:2412.02931·cs.LG·December 5, 2024

Open Access

TL;DR

This paper presents an inverse reinforcement learning (IRL) framework that extracts reward features from expert trajectories affected by delays and recovers policies from augmented delayed observations using off-policy adversarial training, evaluated in MuJoCo environments.

Contribution

It introduces an IRL method that handles delayed disturbances through off-policy adversarial training, improving policy recovery from delayed observations.

Findings

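01 Effective in MuJoCo environments under diverse delay settings

02 Outperforms methods that learn from direct delayed observations

03 Theoretical analysis shows that policy recovery from augmented delayed observations outperforms recovery from direct delayed observations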

Abstract

Inverse Reinforcement Learning (IRL) has demonstrated effectiveness in a variety of imitation tasks. In this paper, we introduce an IRL framework designed to extract rewarding features from expert trajectories affected by delayed disturbances. Instead of relying on direct observations, our approach employs an efficient off-policy adversarial training framework to derive expert features and recover optimal policies from augmented delayed observations. Empirical evaluations in the MuJoCo environment under diverse delay settings validate the effectiveness of our method. Furthermore, we provide a theoretical analysis showing that recovering expert policies from augmented delayed observations outperforms using direct delayed observations.
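The abstract does not spell out how an augmented delayed observation is built. A minimal sketch follows, assuming the construction common in delayed reinforcement learning: the most recent observation that has actually arrived is paired with the actions taken since it was generated. The class name `DelayedObservationAugmenter`, its parameters, and the fixed, known delay are hypothetical choices for illustration, not the paper's implementation.

```python
from collections import deque
from typing import Deque, Tuple

Observation = Tuple[float, ...]
Action = Tuple[float, ...]


class DelayedObservationAugmenter:
    """Hypothetical helper (not from the paper).

    Assumes a fixed, known delay of `delay` steps and returns the
    augmented observation (s_{t-delay}, (a_{t-delay}, ..., a_{t-1})).
    """

    def __init__(self, delay: int, initial_obs: Observation,
                 default_action: Action) -> None:
        if delay < 0:
            raise ValueError("delay must be non-negative")
        self.delay = delay
        # Observations still "in flight"; the oldest one is what the agent sees.
        self._obs: Deque[Observation] = deque(
            [initial_obs] * (delay + 1), maxlen=delay + 1)
        # Actions taken since the visible observation was generated.
        self._actions: Deque[Action] = deque(
            [default_action] * delay, maxlen=delay)

    def step(self, new_obs: Observation,
             action: Action) -> Tuple[Observation, Tuple[Action, ...]]:
        """Record the newest true observation and the action that led to it."""
        self._obs.append(new_obs)
        if self.delay > 0:
            self._actions.append(action)
        return self.augmented()

    def augmented(self) -> Tuple[Observation, Tuple[Action, ...]]:
        """Return the delayed observation together with the pending actions."""
        return self._obs[0], tuple(self._actions)
```

Under this assumption, a method that uses direct delayed observations would see only the first element of the returned pair, while the augmented variant also sees the pending action sequence.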

Taxonomy

Topics: Mental Health Research Topics · EEG and Brain-Computer Interfaces · Reinforcement Learning in Robotics