Deep Surrogate Q-Learning for Autonomous Driving

Maria Kalweit; Gabriel Kalweit; Moritz Werling; Joschka Boedecker

arXiv:2010.11278 · cs.LG · February 18, 2022 · 1 citation

Open Access

TL;DR

This paper introduces Surrogate Q-learning with a permutation-equivariant neural network architecture and Scene-centric Experience Replay to improve data efficiency and adaptability of reinforcement learning for autonomous driving, especially in dynamic environments.

Contribution

The paper presents a novel surrogate Q-learning approach combined with a permutation-equivariant neural network architecture and a new replay technique, enhancing RL efficiency and real-world applicability in autonomous driving.
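Permutation equivariance can be illustrated with a generic set layer. The sketch below is an assumption-laden example, not the paper's architecture: it uses a Deep Sets style layer in which each vehicle's features are transformed by shared weights and combined with a mean-pooled summary of all vehicles. The names `equivariant_layer`, `w_self`, and `w_pool`, as well as the feature sizes, are hypothetical.

```python
import numpy as np

def equivariant_layer(x, w_self, w_pool):
    """Apply a permutation-equivariant linear layer.

    x has shape (n_vehicles, d_in); the output has shape (n_vehicles, d_out).
    Every row is transformed by the same weights, plus a term computed from
    the mean over all rows, so reordering the rows reorders the output rows.
    """
    pooled = x.mean(axis=0, keepdims=True)  # order-independent summary, shape (1, d_in)
    return x @ w_self + pooled @ w_pool     # pooled term broadcasts over all rows

# Hypothetical scene: 4 surrounding vehicles, 3 features each
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
w_self = rng.normal(size=(3, 5))
w_pool = rng.normal(size=(3, 5))

perm = np.array([2, 0, 3, 1])
out = equivariant_layer(x, w_self, w_pool)
out_perm = equivariant_layer(x[perm], w_self, w_pool)

# Permuting the input vehicles permutes the output rows in the same way
print(np.allclose(out[perm], out_perm))
```

Because the weights are shared across rows, the same layer accepts any number of vehicles, which is one way to handle scenes where the vehicle count varies.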

Findings

01. Reduces required driving time significantly.

02. Improves learning efficiency in variable traffic scenarios.

03. Enhances real-world policy transfer using the highD dataset.

Abstract

Challenging problems of deep reinforcement learning systems with regard to the application on real systems are their adaptivity to changing environments and their efficiency w.r.t. computational resources and data. In the application of learning lane-change behavior for autonomous driving, agents have to deal with a varying number of surrounding vehicles. Furthermore, the number of required transitions imposes a bottleneck, since test drivers cannot perform an arbitrary amount of lane changes in the real world. In the off-policy setting, additional information on solving the task can be gained by observing actions from others. While in the classical RL setup this knowledge remains unused, we use other drivers as surrogates to learn the agent's value function more efficiently. We propose Surrogate Q-learning that deals with the aforementioned problems and reduces the required driving…
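The idea of learning from other drivers' observed transitions can be illustrated with a minimal tabular sketch. This is not the paper's deep, set-based method: the state and action indices, the reward values, and the function names `q_update` and `surrogate_q_learning` are all hypothetical, and the sketch assumes that surrogate transitions are expressed in the same state and action space as the agent's own.

```python
import numpy as np

def q_update(q, transition, alpha=0.1, gamma=0.9):
    """One off-policy Q-learning step on a (state, action, reward, next_state) tuple."""
    s, a, r, s_next = transition
    td_target = r + gamma * np.max(q[s_next])  # greedy bootstrap, independent of who acted
    q[s, a] += alpha * (td_target - q[s, a])
    return q

def surrogate_q_learning(q, ego_transitions, surrogate_transitions, alpha=0.1, gamma=0.9):
    """Update one shared Q-table from the agent's own and other drivers' transitions."""
    # Assumption: transitions observed from other drivers are treated like the agent's own
    for t in list(ego_transitions) + list(surrogate_transitions):
        q_update(q, t, alpha=alpha, gamma=gamma)
    return q

# Hypothetical toy problem: 3 states, 2 actions
q = np.zeros((3, 2))
ego = [(0, 1, 0.0, 1)]                          # the agent's single own transition
surrogates = [(1, 0, 1.0, 2), (1, 1, -1.0, 2)]  # transitions observed from other drivers

q = surrogate_q_learning(q, ego, surrogates)
print(q)
```

In this toy run the agent never acts in state 1, yet the Q-values for that state are updated from the surrogates' transitions, which is the data-efficiency argument in miniature.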

Taxonomy

Topics: Reinforcement Learning in Robotics · Energy, Environment, and Transportation Policies · Advanced Bandit Algorithms Research

Methods: Experience Replay · Q-Learning