Reinforcement learning for traffic signal control in hybrid action space
Haoqing Luo, Sheng Jin

TL;DR
This paper introduces TBO, a novel reinforcement learning algorithm for traffic signal control that optimizes both staging and duration simultaneously in a hybrid action space, improving traffic flow efficiency.
Contribution
The paper presents TBO, which the authors describe as, to the best of their knowledge, the first RL-based method to optimize staging and duration synchronously in a hybrid action space for traffic signals.
Findings
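Reduces queue length by 13.78% on average compared to existing baselines
Decreases delay by 14.08% on average compared to existing baselines
Does not harm fairness, as indicated by the Gini coefficients of the right-of-way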
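The fairness finding rests on the Gini coefficient, where 0 means a perfectly equal allocation and values approaching 1 mean that one participant receives almost everything. The sketch below shows the standard computation only. It assumes, for illustration, that right-of-way is measured as green time in seconds per movement; the abstract does not state how the authors measure it, and the function name `gini` and the sample values are hypothetical.

```python
def gini(values):
    """Gini coefficient of a list of non-negative numbers.

    Assumption: each value is the green time (seconds) granted to one
    movement; the paper's exact right-of-way measure is not given here.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # nothing allocated, treat as perfectly equal
    # Standard rank-weighted form: G = 2*sum(i*x_i)/(n*sum(x)) - (n+1)/n,
    # with x sorted ascending and i starting at 1.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical allocations across four movements.
equal_share = gini([30, 30, 30, 30])   # every movement gets the same green
one_takes_all = gini([0, 0, 0, 120])   # a single movement gets all the green
print(equal_share, one_takes_all)
```

A lower value for a controller's allocation than for a baseline's would suggest a more even distribution of right-of-way under this assumed measure.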
Abstract
The prevailing reinforcement-learning-based traffic signal control methods are typically staging-optimizable or duration-optimizable, depending on the action spaces. In this paper, we propose a novel control architecture, TBO, which is based on hybrid proximal policy optimization. To the best of our knowledge, TBO is the first RL-based algorithm to implement synchronous optimization of the staging and duration. Compared to discrete and continuous action spaces, hybrid action space is a merged search space, in which TBO better implements the trade-off between frequent switching and unsaturated release. Experiments are given to demonstrate that TBO reduces the queue length and delay by 13.78% and 14.08% on average, respectively, compared to the existing baselines. Furthermore, we calculate the Gini coefficients of the right-of-way to indicate TBO does not harm fairness while improving…
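To make the idea of a hybrid action space concrete, the sketch below samples one action made of a discrete part (which stage to serve next) and a continuous part (how long to hold it). This is one common way to parameterize a hybrid action and is not taken from the paper: the function names, the per-stage duration means, the Gaussian duration model, and the green-time bounds are all assumptions for illustration, and TBO's actual network architecture and action constraints are not described in this summary.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_hybrid_action(stage_logits, duration_means, duration_std,
                         min_green, max_green, rng):
    """Sample (stage, duration) from a hypothetical hybrid policy.

    stage_logits   : one score per candidate stage (discrete head)
    duration_means : one mean green time per stage (continuous head)
    duration_std   : shared standard deviation for the duration (assumed)
    min_green, max_green : assumed bounds on green time, in seconds
    """
    probs = softmax(stage_logits)
    # Discrete part: pick the stage index according to its probability.
    stage = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # Continuous part: draw a duration for the chosen stage, then clip
    # it to the allowed green-time range.
    raw = rng.gauss(duration_means[stage], duration_std)
    duration = min(max(raw, min_green), max_green)
    return stage, duration


rng = random.Random(0)  # fixed seed so repeated runs give the same draw
stage, duration = sample_hybrid_action(
    stage_logits=[0.2, 1.5, -0.3, 0.0],
    duration_means=[15.0, 25.0, 10.0, 20.0],
    duration_std=3.0,
    min_green=5.0,
    max_green=40.0,
    rng=rng,
)
print(stage, round(duration, 2))
```

A purely discrete controller would fix the duration and choose only `stage`, while a purely continuous one would fix the stage order and choose only `duration`. Sampling both together is what lets a policy trade off frequent switching against holding a green longer than the queue needs.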
Taxonomy
Topics: Traffic control and management · Electrostatic Discharge in Electronics · Traffic Prediction and Management Techniques
