Deep Surrogate Q-Learning for Autonomous Driving
Maria Kalweit, Gabriel Kalweit, Moritz Werling, Joschka Boedecker

TL;DR
This paper introduces Surrogate Q-learning with a permutation-equivariant neural network architecture and Scene-centric Experience Replay to improve data efficiency and adaptability of reinforcement learning for autonomous driving, especially in dynamic environments.
Contribution
The paper presents a novel surrogate Q-learning approach combined with a permutation-equivariant neural network architecture and a new replay technique, enhancing RL efficiency and real-world applicability in autonomous driving.
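To make "permutation-equivariant" concrete, here is a minimal sketch of one common way to build such a layer, in the style of Deep Sets. It is not taken from the paper: the function name `equivariant_layer`, the scalar weights `w_self` and `w_pool`, and the feature layout are all assumptions for illustration. Each vehicle's output depends on its own features plus a pooled summary of the whole scene, so reordering the vehicles reorders the outputs in the same way, and the layer accepts any number of vehicles.

```python
def equivariant_layer(vehicles, w_self, w_pool):
    """Deep Sets-style layer (illustrative): y_i = w_self * x_i + w_pool * mean(x).

    `vehicles` is a list of equal-length feature lists, one per surrounding
    vehicle. Scalar weights are an assumption to keep the sketch short; a
    real network would use weight matrices and a nonlinearity.
    """
    n = len(vehicles)
    dim = len(vehicles[0])
    # The mean over vehicles does not depend on their order.
    mean = [sum(v[d] for v in vehicles) / n for d in range(dim)]
    return [[w_self * v[d] + w_pool * mean[d] for d in range(dim)]
            for v in vehicles]


# Hypothetical scene with three vehicles, two features each.
scene = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
out = equivariant_layer(scene, w_self=2.0, w_pool=1.0)

# Permuting the input permutes the output identically.
permuted = [scene[2], scene[0], scene[1]]
out_permuted = equivariant_layer(permuted, w_self=2.0, w_pool=1.0)
```

Because the pooled term is the same for every vehicle, the same weights serve scenes with two vehicles or twenty, which is the property needed for a varying number of surrounding vehicles.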
Findings
Reduces required driving time significantly.
Improves learning efficiency in variable traffic scenarios.
Enhances real-world policy transfer using the highD dataset.
Abstract
Challenging problems of deep reinforcement learning systems with regard to the application on real systems are their adaptivity to changing environments and their efficiency w.r.t. computational resources and data. In the application of learning lane-change behavior for autonomous driving, agents have to deal with a varying number of surrounding vehicles. Furthermore, the number of required transitions imposes a bottleneck, since test drivers cannot perform an arbitrary amount of lane changes in the real world. In the off-policy setting, additional information on solving the task can be gained by observing actions from others. While in the classical RL setup this knowledge remains unused, we use other drivers as surrogates to learn the agent's value function more efficiently. We propose Surrogate Q-learning that deals with the aforementioned problems and reduces the required driving…
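The idea of using other drivers as surrogates can be illustrated with a minimal tabular sketch. This is not the paper's deep architecture: the state labels, the reward values, the action set, and the names `q_update` and `surrogate_update` are assumptions made for illustration. Every observed vehicle in a scene contributes one off-policy transition (its state, the action it was seen taking, a reward, its next state) to the same Q-table, so a single recorded scene yields several updates instead of one.

```python
from collections import defaultdict

ACTIONS = ("keep", "left", "right")  # hypothetical lane-change action set


def q_update(q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """One standard Q-learning update on a tabular Q-function."""
    best_next = max(q[(s_next, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])


def surrogate_update(q, scene, alpha=0.5, gamma=0.9):
    """Apply a Q-learning update for every vehicle's transition in a scene.

    `scene` is a list of (state, action, reward, next_state) tuples, one per
    observed vehicle. All vehicles share the same Q-table.
    """
    for s, a, r, s_next in scene:
        q_update(q, s, a, r, s_next, alpha, gamma)


q = defaultdict(float)

# Hypothetical scene: two surrounding drivers observed over one time step.
scene = [
    ("slow_lane", "left", 1.0, "fast_lane"),
    ("fast_lane", "keep", 1.0, "fast_lane"),
]
surrogate_update(q, scene)
```

Q-learning is off-policy, which is what allows transitions generated by other drivers' behavior to update the agent's own value function.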
Taxonomy
Topics: Reinforcement Learning in Robotics · Energy, Environment, and Transportation Policies · Advanced Bandit Algorithms Research
Methods: Experience Replay · Q-Learning
