# Large-Scale Traffic Signal Control Using a Novel Multi-Agent Reinforcement Learning

**Authors:** Xiaoqiang Wang, Liangjun Ke, Zhimin Qiao, and Xinghua Chai

arXiv: 1908.03761 · 2021-09-14

## TL;DR

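This paper introduces Cooperative double Q-learning (Co-DQL), a multi-agent reinforcement learning algorithm for large-scale traffic signal control. It combines independent double Q-learning with a UCB exploration policy and a mean field approximation of agent interactions, and adds a reward allocation mechanism and local state sharing to stabilize learning.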

## Contribution

The paper proposes Co-DQL, a scalable MARL method with enhanced cooperation and stability features, specifically designed for large-scale traffic signal control problems.

## Key findings

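- Co-DQL outperforms several state-of-the-art decentralized MARL algorithms on several traffic scenarios in a multi-traffic signal simulator
- It shortens the average waiting time of vehicles across the whole road system
- It demonstrates a stable and robust learning process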

## Abstract

Finding the optimal signal timing strategy is a difficult task for the problem of large-scale traffic signal control (TSC). Multi-Agent Reinforcement Learning (MARL) is a promising method to solve this problem. However, there is still room for improvement in extending to large-scale problems and modeling the behaviors of other agents for each individual agent. In this paper, a new MARL algorithm, called Cooperative double Q-learning (Co-DQL), is proposed, which has several prominent features. It uses a highly scalable independent double Q-learning method based on double estimators and the UCB policy, which can eliminate the over-estimation problem existing in traditional independent Q-learning while ensuring exploration. It uses mean field approximation to model the interaction among agents, thereby making agents learn a better cooperative strategy. In order to improve the stability and robustness of the learning process, we introduce a new reward allocation mechanism and a local state sharing method. In addition, we analyze the convergence properties of the proposed algorithm. Co-DQL is applied to TSC and tested on a multi-traffic signal simulator. According to the results obtained on several traffic scenarios, Co-DQL outperforms several state-of-the-art decentralized MARL algorithms. It can effectively shorten the average waiting time of the vehicles in the whole road system.
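To make the "double estimators and the UCB policy" idea in the abstract more concrete, here is a minimal sketch of textbook tabular double Q-learning with UCB action selection for a single agent. It is not the paper's implementation: the class name `DoubleQAgent`, its parameters (`alpha`, `gamma`, `c`), and the tabular state representation are assumptions made for illustration, and the mean field approximation, reward allocation mechanism, and local state sharing described in the abstract are deliberately left out.

```python
import math
import random
from collections import defaultdict


class DoubleQAgent:
    """Illustrative tabular double Q-learning agent with UCB exploration.

    Hypothetical helper for this summary; not taken from the Co-DQL paper.
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, c=1.0, seed=0):
        self.n_actions = n_actions
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.c = c              # UCB exploration weight (assumed value)
        self.rng = random.Random(seed)
        # Two independent value tables: the double estimators.
        self.qa = defaultdict(lambda: [0.0] * n_actions)
        self.qb = defaultdict(lambda: [0.0] * n_actions)
        # Per-state action counts used by the UCB bonus.
        self.counts = defaultdict(lambda: [0] * n_actions)

    def act(self, state):
        counts = self.counts[state]
        # Try every action once before applying the UCB formula.
        for a in range(self.n_actions):
            if counts[a] == 0:
                counts[a] += 1
                return a
        total = sum(counts)

        def score(a):
            mean_q = 0.5 * (self.qa[state][a] + self.qb[state][a])
            bonus = self.c * math.sqrt(math.log(total) / counts[a])
            return mean_q + bonus

        a = max(range(self.n_actions), key=score)
        counts[a] += 1
        return a

    def update(self, state, action, reward, next_state, done=False):
        # Pick one table to learn; the other one evaluates the greedy action.
        if self.rng.random() < 0.5:
            learn, other = self.qa, self.qb
        else:
            learn, other = self.qb, self.qa
        if done:
            target = reward
        else:
            # Action chosen by the learning table, valued by the other table.
            best = max(range(self.n_actions), key=lambda a: learn[next_state][a])
            target = reward + self.gamma * other[next_state][best]
        learn[state][action] += self.alpha * (target - learn[state][action])
```

In a traffic setting one might create one such agent per intersection, call `act` on a discretized local observation to choose a signal phase, and call `update` with a reward such as negative waiting time. Decoupling action selection from action evaluation across the two tables is what counters the over-estimation of a single max-based estimator, while the count-based bonus keeps rarely tried actions in play.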

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.03761/full.md

## Figures

25 figures with captions in the complete paper: https://tomesphere.com/paper/1908.03761/full.md

## References

49 references — full list in the complete paper: https://tomesphere.com/paper/1908.03761/full.md

---
Source: https://tomesphere.com/paper/1908.03761