Bypassing the Simulation-to-reality Gap: Online Reinforcement Learning using a Supervisor
Benjamin David Evans, Johannes Betz, Hongrui Zheng, Herman A. Engelbrecht, Rahul Mangharam, and Hendrik W. Jordaan

TL;DR
This paper introduces an online reinforcement learning approach for autonomous vehicle control that uses a safety supervisor to bypass the sim-to-real gap, enabling safe, efficient training directly on physical robots.
Contribution
The paper presents a novel online DRL training method with a safety supervisor that ensures safety and improves learning efficiency without prior simulation training.
Findings
Enhanced sample efficiency in training
Agents never crash during training
Better driving performance compared to simulation-trained agents
Abstract
Deep reinforcement learning (DRL) is a promising method to learn control policies for robots only from demonstration and experience. To cover the whole dynamic behaviour of the robot, DRL training is an active exploration process typically performed in simulation environments. Although this simulation training is cheap and fast, applying DRL algorithms to real-world settings is difficult. If agents are trained until they perform safely in simulation, transferring them to physical systems is difficult due to the sim-to-real gap caused by the difference between the simulation dynamics and the physical robot. In this paper, we present a method of online training a DRL agent to drive autonomously on a physical vehicle by using a model-based safety supervisor. Our solution uses a supervisory system to check if the action selected by the agent is safe or unsafe and ensure that a safe action…
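The supervisory check described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy kinematic model, the track half-width, the rollout horizon, the recovery controller, and every name below (`State`, `step`, `recovery_action`, `is_safe`, `supervise`) are assumptions chosen for illustration. The sketch shows only the general idea of using a model to predict whether the agent's action keeps the vehicle on the track and substituting a safe action when it does not.

```python
import math
from dataclasses import dataclass

# All constants are hypothetical values for a toy straight track.
HALF_WIDTH = 1.0     # track half-width (m)
SPEED = 2.0          # constant forward speed (m/s)
DT = 0.1             # model time step (s)
HORIZON = 10         # number of look-ahead steps used by the check
MAX_YAW_RATE = 2.0   # action limit (rad/s)


@dataclass
class State:
    y: float        # lateral offset from the centre line (m)
    heading: float  # heading relative to the centre line (rad)


def step(state: State, yaw_rate: float) -> State:
    """Toy kinematic model: integrate heading, then lateral offset."""
    heading = state.heading + yaw_rate * DT
    y = state.y + SPEED * math.sin(heading) * DT
    return State(y, heading)


def recovery_action(state: State) -> float:
    """Assumed fallback controller: steer back toward the centre line."""
    raw = -1.0 * state.y - 2.0 * state.heading
    return max(-MAX_YAW_RATE, min(MAX_YAW_RATE, raw))


def is_safe(state: State, action: float) -> bool:
    """Apply the action once, then roll out the recovery controller.

    The action counts as safe if the predicted vehicle stays on the track.
    """
    s = step(state, action)
    for _ in range(HORIZON):
        if abs(s.y) >= HALF_WIDTH:
            return False
        s = step(s, recovery_action(s))
    return abs(s.y) < HALF_WIDTH


def supervise(state: State, agent_action: float) -> tuple[float, bool]:
    """Return (action to execute, whether the supervisor intervened)."""
    if is_safe(state, agent_action):
        return agent_action, False
    return recovery_action(state), True


if __name__ == "__main__":
    # Centred vehicle driving straight: the agent's action passes through.
    print(supervise(State(0.0, 0.0), 0.0))
    # Near the edge and steering outward: the supervisor overrides.
    print(supervise(State(0.95, 0.3), MAX_YAW_RATE))
```

In a training loop built on this sketch, the intervention flag returned by `supervise` could be logged or passed to the learner; how the paper itself uses interventions is not covered by the truncated abstract above.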
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Reinforcement Learning in Robotics
