Offline Imitation Learning with Model-based Reverse Augmentation
Jie-Jing Shao, Hao-Sen Shi, Lan-Zhe Guo, Yu-Feng Li

TL;DR
This paper introduces SRA (self-paced reverse augmentation), a model-based offline imitation learning framework that uses a reverse dynamic model to generate trajectories, improving exploration of states not observed in the expert data and achieving state-of-the-art results.
Contribution
The paper proposes a self-paced reverse augmentation method that enhances offline imitation learning by exploring unobserved states with a reverse dynamic model.
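As a rough illustration of the reverse-augmentation idea, the sketch below rolls a reverse model backward from expert states, so every generated transition lies on a path that ends in an expert-observed state. It is a toy under stated assumptions and not the authors' SRA implementation: the 1-D dynamics s' = s + a, the names toy_reverse_step and reverse_rollout, and the uniform action sampling are all hypothetical, and the self-paced component is not modeled.

```python
import random


def toy_reverse_step(next_state, rng):
    """Hypothetical stand-in for a learned reverse dynamic model.

    Assumes toy forward dynamics s' = s + a, so a predecessor of
    next_state is s = s' - a for any sampled action a.
    """
    action = rng.uniform(-1.0, 1.0)
    return next_state - action, action


def reverse_rollout(expert_states, reverse_step, horizon, rng):
    """Generate (state, action, next_state) transitions backward in time.

    Each rollout starts at an expert state and steps backward `horizon`
    times, so following the stored actions forward leads to that state.
    """
    transitions = []
    for goal in expert_states:
        next_state = goal
        for _ in range(horizon):
            state, action = reverse_step(next_state, rng)
            transitions.append((state, action, next_state))
            next_state = state  # keep stepping further from the expert state
    return transitions


rng = random.Random(0)
augmented = reverse_rollout([0.0, 1.0, 2.0], toy_reverse_step, horizon=3, rng=rng)
```

In a real setting the reverse step would be a model fitted to offline data rather than a known formula, and the augmented transitions would be added to the data used for policy learning.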
Findings
Effectively mitigates covariate shift in offline IL
Achieves state-of-the-art performance on benchmarks
Enables generalization beyond expert data
Abstract
In offline Imitation Learning (IL), one of the main challenges is the covariate shift between the expert observations and the actual distribution encountered by the agent, because it is difficult to determine what action an agent should take when outside the state distribution of the expert demonstrations. Recent model-free solutions introduce supplementary data and identify latent expert-similar samples to augment the reliable samples during learning. Model-based solutions build forward dynamic models with conservatism quantification and then generate additional trajectories in the neighborhood of expert demonstrations. However, without reward supervision, these methods are often over-conservative in the out-of-expert-support regions, because a preferred action that enables policy optimization exists only in states close to expert-observed states. To encourage…
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
