Backward Imitation and Forward Reinforcement Learning via Bi-directional Model Rollouts
Yuxin Pan, Fangzhen Lin

TL;DR
This paper introduces BIFRL, a novel reinforcement learning framework that combines backward imitation from high-value states with forward reinforcement, improving sample efficiency and performance in model-based RL tasks.
Contribution
The paper proposes BIFRL, which integrates backward imitation learning with forward reinforcement learning, and introduces a value-regularized GAN to improve the sampling of high-value states.
Findings
BIFRL outperforms baseline methods in sample efficiency.
BIFRL achieves competitive asymptotic performance on MuJoCo tasks.
The paper derives theoretical conditions under which BIFRL is superior to traditional methods.
Abstract
Traditional model-based reinforcement learning (RL) methods generate forward rollout traces with a learned dynamics model to reduce interactions with the real environment. A recent model-based RL method additionally learns a backward model, which specifies the conditional probability of the previous state given the previous action and the current state, and uses it to generate backward rollout trajectories. However, in this type of method, the samples from backward rollouts and those from forward rollouts are simply aggregated to optimize the policy with a model-free RL algorithm, which may decrease both the sample efficiency and the convergence rate. This is because such an approach ignores the fact that backward rollout traces are often generated starting from some high-value states and are certainly more instructive for the agent to improve the…
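The backward rollout described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The names `backward_rollout`, `toy_backward_policy`, and `toy_backward_model` are hypothetical, and the toy 1-D dynamics (next state = state + action) are an assumption chosen only so the backward step can be written exactly. The abstract does not say how the previous action is chosen, so the sketch assumes a separate backward policy supplies it.

```python
def backward_rollout(start_state, backward_policy, backward_model, horizon):
    """Build a trajectory that ends at start_state by stepping backwards in time.

    backward_policy(s_next) -> a_prev          (hypothetical: proposes the previous action)
    backward_model(s_next, a_prev) -> s_prev   (hypothetical: predicts the previous state)
    Returns (s_prev, a_prev, s_next) tuples in forward time order.
    """
    transitions = []
    s_next = start_state
    for _ in range(horizon):
        a_prev = backward_policy(s_next)
        s_prev = backward_model(s_next, a_prev)
        transitions.append((s_prev, a_prev, s_next))
        s_next = s_prev  # keep walking further into the past
    transitions.reverse()  # earliest transition first
    return transitions


# Toy assumption: forward dynamics are s' = s + a, so the exact backward step is s = s' - a.
def toy_backward_policy(s_next):
    return 1.0  # assume the agent always took action +1


def toy_backward_model(s_next, a_prev):
    return s_next - a_prev


# Start from a state assumed to be high-value (5.0) and roll back three steps.
trace = backward_rollout(5.0, toy_backward_policy, toy_backward_model, horizon=3)
for s_prev, a_prev, s_next in trace:
    print(f"s={s_prev}, a={a_prev} -> s'={s_next}")
```

Every transition in `trace` lies on a path that reaches the chosen start state. This is why the abstract treats backward traces from high-value states as more instructive than arbitrary forward samples.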
Taxonomy
Topics: Reinforcement Learning in Robotics · Robotic Locomotion and Control · Robot Manipulation and Learning
