Assessment of Reward Functions for Reinforcement Learning Traffic Signal Control under Real-World Limitations
Alvaro Cabrejas-Egea, Shaun Howell, Maksis Knutins, Colm Connaughton

TL;DR
This study evaluates various reward functions for reinforcement learning-based traffic signal control in a realistic simulation, finding that speed maximization yields the lowest average waiting times across demand levels.
Contribution
It provides a comparative analysis of reward functions under real-world constraints, highlighting the effectiveness of speed-based rewards for traffic signal control.
Findings
The speed-maximization reward leads to the lowest average waiting times across demand levels.
Reward functions significantly influence RL traffic control performance.
Real-world constraints affect the optimal reward function choice.
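To make the contrast between reward choices concrete, here is a minimal sketch of what a speed-based reward and a waiting-time reward could look like. The exact formulas used in the paper are not given in this summary, so both functions, their names (`speed_reward`, `wait_time_reward`), their parameters, and the normalisation by the speed limit are illustrative assumptions rather than the authors' definitions.

```python
from typing import Sequence


def speed_reward(speeds_mps: Sequence[float], speed_limit_mps: float) -> float:
    """Hypothetical speed-based reward: mean vehicle speed as a fraction
    of the speed limit, capped at 1.0. Not the paper's exact formula."""
    if speed_limit_mps <= 0:
        raise ValueError("speed_limit_mps must be positive")
    if not speeds_mps:
        # Assumption: an empty junction is treated as free flow.
        return 1.0
    mean_speed = sum(speeds_mps) / len(speeds_mps)
    return min(mean_speed / speed_limit_mps, 1.0)


def wait_time_reward(wait_times_s: Sequence[float]) -> float:
    """Hypothetical waiting-time reward: the negative of the total time
    vehicles have spent waiting, so less waiting means a higher reward."""
    return -sum(wait_times_s)


# Two vehicles at 5 m/s and 10 m/s under a 10 m/s limit.
print(speed_reward([5.0, 10.0], 10.0))
# Two vehicles that have waited 3 s and 4.5 s.
print(wait_time_reward([3.0, 4.5]))
```

In this sketch both rewards are computed from the same junction observation, so an agent could in principle be trained against either one while everything else in the setup stays fixed.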
Abstract
Adaptive traffic signal control is one key avenue for mitigating the growing consequences of traffic congestion. Incumbent solutions such as SCOOT and SCATS require regular and time-consuming calibration, cannot optimise well for multiple road use modalities, and require the manual curation of many implementation plans. A recent alternative to these approaches is deep reinforcement learning, in which an agent learns how to take the most appropriate action for a given state of the system. This is guided by neural networks approximating a reward function that provides feedback to the agent regarding the performance of the actions taken, making it sensitive to the specific reward function chosen. Several authors have surveyed the reward functions used in the literature, but attributing outcome differences to reward function choice across works is problematic as there are many…
