Enhancing Inverse Reinforcement Learning through Encoding Dynamic Information in Reward Shaping
Simon Sinong Zhan, Philip Wang, Qingyuan Wu, Yixuan Wang, Ruochen Jiao, Chao Huang, Qi Zhu

TL;DR
This paper introduces a Model-Enhanced AIRL framework that incorporates dynamics information, in the form of an estimated transition model, into reward shaping, improving performance and sample efficiency in stochastic environments.
Contribution
It proposes a novel reward shaping method that integrates transition model estimation into AIRL, with theoretical guarantees and improved empirical results.
Findings
Achieves superior performance in stochastic environments
Improves sample efficiency significantly
Maintains competitive results in deterministic environments
Abstract
In this paper, we aim to tackle the limitation of the Adversarial Inverse Reinforcement Learning (AIRL) method in stochastic environments, where its theoretical results do not hold and its performance degrades. To address this issue, we propose a novel method that infuses dynamics information into reward shaping, with a theoretical guarantee for the induced optimal policy in stochastic environments. Building on these model-enhanced rewards, we present a Model-Enhanced AIRL framework, which integrates transition model estimation directly into reward shaping. Furthermore, we provide a comprehensive theoretical analysis of the reward error bound and performance difference bound for our method. Experimental results on MuJoCo benchmarks show that our method achieves superior performance in stochastic environments and competitive performance in deterministic…
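To illustrate what "integrating transition model estimation into reward shaping" could look like, here is a minimal tabular sketch. It assumes the standard AIRL-style shaped reward g(s, a) + gamma * h(s') - h(s), and it assumes the model-enhanced variant replaces h(s') with its expectation under an estimated transition model. The abstract does not state the formula, so the function names, the tables `g`, `h`, `T_hat`, and the toy numbers are all hypothetical.

```python
# Hypothetical tabular sketch; not the authors' implementation.

def sample_based_shaped_reward(g, h, s, a, s_next, gamma):
    """AIRL-style shaping that uses the single observed next state."""
    return g[s][a] + gamma * h[s_next] - h[s]


def model_based_shaped_reward(g, h, T_hat, s, a, gamma):
    """Shaping that averages h over next states using an estimated
    transition model T_hat[s][a][s_next] (assumed form)."""
    expected_h = sum(p * h[s2] for s2, p in enumerate(T_hat[s][a]))
    return g[s][a] + gamma * expected_h - h[s]


# Toy problem: 2 states, 1 action, stochastic transition out of state 0.
g = [[1.0], [0.0]]                      # reward term g(s, a)
h = [0.0, 2.0]                          # potential term h(s)
T_hat = [[[0.5, 0.5]], [[0.0, 1.0]]]    # estimated P(s_next | s, a)
gamma = 0.9

r_stay = sample_based_shaped_reward(g, h, 0, 0, 0, gamma)
r_move = sample_based_shaped_reward(g, h, 0, 0, 1, gamma)
r_model = model_based_shaped_reward(g, h, T_hat, 0, 0, gamma)
```

In this toy case the sample-based value depends on which next state happened to be drawn, while the model-based value is fixed for a given (s, a) and equals the average of the two sampled values under `T_hat`. That variance reduction is one plausible reading of why such shaping would help in stochastic environments.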
Taxonomy
Topics: Adversarial Robustness in Machine Learning
