Accelerated Sim-to-Real Deep Reinforcement Learning: Learning Collision Avoidance from Human Player
Hanlin Niu, Ze Ji, Farshad Arvin, Barry Lennox, Hujun Yin, and Joaquin Carrasco

TL;DR
This paper introduces a sensor-level, mapless collision avoidance algorithm for mobile robots that learns efficiently from human and self-exploratory data, achieving faster training and robust real-world performance without fine-tuning.
Contribution
It proposes a novel training strategy combining human experience and self-exploration with prioritized replay, enabling rapid learning and transfer from simulation to real robots.
Findings
Achieved similar rewards with significantly fewer training steps than standard DDPG.
Reached collision-free navigation after less than 2 to 2.5 hours of training in the simulated environments.
Successfully transferred the trained model to real robots without additional fine-tuning.
Abstract
This paper presents a sensor-level mapless collision avoidance algorithm for mobile robots that maps raw sensor data to linear and angular velocities and navigates an unknown environment without a map. An efficient training strategy is proposed to allow a robot to learn from both human experience data and self-exploratory data. A game-format simulation framework is designed to allow a human player to tele-operate the mobile robot to a goal, and the human's actions are also scored using the reward function. Both human player data and self-playing data are sampled using the prioritized experience replay algorithm. The proposed algorithm and training strategy have been evaluated in two experimental configurations: Environment 1, a simulated cluttered environment, and Environment 2, a simulated corridor environment. It was…
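The sampling step described in the abstract can be illustrated with a minimal sketch of proportional prioritized experience replay over a single buffer that holds both human-teleoperated and self-exploration transitions. This is not the authors' implementation: the class and field names (`Transition`, `PrioritizedReplay`, `source`), the hyperparameters `alpha`, `beta`, and `eps`, and the choice to keep both data sources in one shared buffer are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Transition:
    state: tuple       # raw sensor reading (hypothetical layout)
    action: tuple      # (linear velocity, angular velocity)
    reward: float
    next_state: tuple
    done: bool
    source: str        # "human" or "agent" (assumed tag, kept for bookkeeping)


class PrioritizedReplay:
    """Proportional prioritized replay over one shared buffer (illustrative)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3, seed=None):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities skew sampling
        self.eps = eps              # keeps every priority strictly positive
        self.items = []
        self.priorities = []
        self.pos = 0                # next slot to overwrite once full
        self.rng = random.Random(seed)

    def add(self, transition):
        # New transitions take the current maximum priority,
        # so they are likely to be sampled soon after insertion.
        p = max(self.priorities, default=1.0)
        if len(self.items) < self.capacity:
            self.items.append(transition)
            self.priorities.append(p)
        else:
            self.items[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        # Sampling probability is proportional to priority ** alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        n = len(self.items)
        idx = self.rng.choices(range(n), weights=scaled, k=batch_size)
        # Importance-sampling weights correct the bias of non-uniform sampling;
        # they are normalized by the largest weight in the batch.
        is_w = [(n * scaled[i] / total) ** (-beta) for i in idx]
        max_w = max(is_w)
        is_w = [w / max_w for w in is_w]
        return idx, [self.items[i] for i in idx], is_w

    def update(self, indices, td_errors):
        # After a learning step, priorities follow the absolute TD error.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a training loop under these assumptions, human-teleoperated episodes would be pushed with `add(...)` tagged `source="human"`, the learning agent's own transitions with `source="agent"`, and each critic update would call `sample(...)` followed by `update(...)` with the new TD errors, so that informative transitions from either source are replayed more often.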
Taxonomy
Topics: Reinforcement Learning in Robotics · Robotic Path Planning Algorithms · Human Pose and Action Recognition
Methods: Adam · Convolution · Experience Replay · Weight Decay · Prioritized Experience Replay · Batch Normalization · Dense Connections · Deep Deterministic Policy Gradient
