Inverse Delayed Reinforcement Learning
Simon Sinong Zhan, Qingyuan Wu, Zhian Ruan, Frank Yang, Philip Wang, Yixuan Wang, Ruochen Jiao, Chao Huang, Qi Zhu

TL;DR
This paper presents an IRL framework that extracts reward features and recovers optimal policies from augmented delayed observations using off-policy adversarial training, validated in MuJoCo environments.
Contribution
It introduces an IRL method that handles delayed disturbances through off-policy adversarial training, improving policy recovery from delayed observations.
Findings
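Effective in MuJoCo environments under diverse delay settings
Outperforms methods based on direct delayed observations
Theoretical analysis shows that recovering expert policies from augmented delayed observations outperforms using direct delayed observations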
Abstract
Inverse Reinforcement Learning (IRL) has demonstrated effectiveness in a variety of imitation tasks. In this paper, we introduce an IRL framework designed to extract rewarding features from expert trajectories affected by delayed disturbances. Instead of relying on direct observations, our approach employs an efficient off-policy adversarial training framework to derive expert features and recover optimal policies from augmented delayed observations. Empirical evaluations in the MuJoCo environment under diverse delay settings validate the effectiveness of our method. Furthermore, we provide a theoretical analysis showing that recovering expert policies from augmented delayed observations outperforms using direct delayed observations.
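To make "augmented delayed observations" concrete, here is a minimal sketch. The abstract does not spell out the construction, so this assumes the augmentation common in the delayed-RL literature: with a constant delay of d steps, the augmented state is the most recent (d-step-old) observation concatenated with the d actions taken since it was emitted. The class name DelayedObservationAugmenter, its methods, and the zero-action padding at episode start are hypothetical choices for illustration, not the authors' implementation.

```python
from collections import deque
from typing import Deque, List, Sequence


class DelayedObservationAugmenter:
    """Illustrative helper (hypothetical, not from the paper).

    Assumes a constant delay of `delay` steps and builds the augmented
    state as: delayed observation followed by the last `delay` actions,
    ordered oldest to newest.
    """

    def __init__(self, delay: int, action_dim: int) -> None:
        if delay < 0:
            raise ValueError("delay must be non-negative")
        self.delay = delay
        self.action_dim = action_dim
        # A bounded deque drops the oldest action automatically.
        self._actions: Deque[List[float]] = deque(maxlen=delay)
        self.reset()

    def reset(self) -> None:
        """Start of an episode: pad the action history with zeros (an assumption)."""
        self._actions.clear()
        for _ in range(self.delay):
            self._actions.append([0.0] * self.action_dim)

    def record_action(self, action: Sequence[float]) -> None:
        """Store the action just executed by the agent."""
        if len(action) != self.action_dim:
            raise ValueError("unexpected action dimension")
        self._actions.append(list(action))

    def augment(self, delayed_obs: Sequence[float]) -> List[float]:
        """Concatenate the delayed observation with the stored action history."""
        augmented = list(delayed_obs)
        for action in self._actions:
            augmented.extend(action)
        return augmented


if __name__ == "__main__":
    augmenter = DelayedObservationAugmenter(delay=2, action_dim=1)
    print(augmenter.augment([0.5, -0.5]))  # zero-padded history at episode start
    augmenter.record_action([1.0])
    print(augmenter.augment([0.5, -0.5]))  # newest action appears last
```

Under this assumption, a discriminator or policy that consumes the augmented vector sees the actions still "in flight", which a model fed only the raw delayed observation does not.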
Taxonomy
Topics: Mental Health Research Topics · EEG and Brain-Computer Interfaces · Reinforcement Learning in Robotics
