Accelerating Model-Based Reinforcement Learning with State-Space World Models
Maria Krinner, Elie Aljalbout, Angel Romero, Davide Scaramuzza

TL;DR
This paper introduces a novel approach to accelerate model-based reinforcement learning by using state-space models, significantly reducing training time while maintaining high performance in complex robotic tasks.
Contribution
The authors propose a parallelized training method for world models using state-space models and incorporate privileged information for better performance in partially observable environments.
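To illustrate why state-space models permit parallelized training over the time dimension, here is a minimal sketch. It assumes a scalar linear recurrence h_t = a_t * h_{t-1} + b_t, which is a simplification chosen for illustration and not the authors' architecture. Each step is an affine map, and composing affine maps is associative, so all hidden states can be computed with a prefix scan in a logarithmic number of rounds instead of a step-by-step loop. The function names (`combine`, `sequential_scan`, `parallel_style_scan`) are hypothetical.

```python
from typing import List, Tuple

# Assumption for illustration: a pair (a, b) represents the affine map h -> a * h + b.
Pair = Tuple[float, float]


def combine(left: Pair, right: Pair) -> Pair:
    """Compose two affine maps: apply `left` first, then `right`."""
    a1, b1 = left
    a2, b2 = right
    return (a2 * a1, a2 * b1 + b2)


def sequential_scan(a: List[float], b: List[float], h0: float = 0.0) -> List[float]:
    """Reference recurrence h_t = a_t * h_{t-1} + b_t, computed one step at a time."""
    h, out = h0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out


def parallel_style_scan(a: List[float], b: List[float], h0: float = 0.0) -> List[float]:
    """Hillis-Steele inclusive scan over affine maps.

    Runs about log2(T) rounds. Within a round, every element update is
    independent of the others, so on parallel hardware a round could run
    concurrently. Here the rounds are plain list comprehensions.
    """
    elems: List[Pair] = list(zip(a, b))
    step = 1
    while step < len(elems):
        elems = [
            combine(elems[i - step], elems[i]) if i >= step else elems[i]
            for i in range(len(elems))
        ]
        step *= 2
    # elems[t] now holds the composition of maps 0..t; apply it to h0.
    return [a_cum * h0 + b_cum for a_cum, b_cum in elems]
```

Both functions return the same hidden states up to floating-point rounding. The scan version does more total arithmetic, but its sequential depth grows logarithmically with sequence length, which is the property that makes it attractive on parallel hardware.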
Findings
World model training time reduced by up to 10 times
Overall MBRL training time reduced by up to 4 times
Achieves similar task rewards and sample efficiency as existing methods
Abstract
Reinforcement learning (RL) is a powerful approach for robot learning. However, model-free RL (MFRL) requires a large number of environment interactions to learn successful control policies. This is due to the noisy RL training updates and the complexity of robotic systems, which typically involve highly non-linear dynamics and noisy sensor signals. In contrast, model-based RL (MBRL) not only trains a policy but simultaneously learns a world model that captures the environment's dynamics and rewards. The world model can either be used for planning, for data collection, or to provide first-order policy gradients for training. Leveraging a world model significantly improves sample efficiency compared to model-free RL. However, training a world model alongside the policy increases the computational complexity, leading to longer training times that are often intractable for complex…
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Adaptive Dynamic Programming Control
