# A Hierarchical Signal Coordination and Control System Using a Hybrid Model-based and Reinforcement Learning Approach

**Authors:** Xianyue Peng, Shenyang Chen, H. Michael Zhang

arXiv: 2508.20102 · 2025-08-29

## TL;DR

This paper introduces a hierarchical traffic signal control system combining model-based strategies and reinforcement learning to optimize urban corridor traffic flow under varying demand conditions.

## Contribution

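The paper presents a hierarchical framework in which a high-level coordinator selects a coordination strategy based on observed and predicted demand, and reinforcement-learning signal agents choose phases under the constraints that strategy imposes. This lets the controller adapt to changing demand instead of committing to a single coordination scheme.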

## Key findings

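- Hybrid Max-Flow Coordination (MFC) maximizes throughput under heavy demand.
- Hybrid Green-Wave Coordination (GWC) consistently minimizes arterial stops and maintains progression across diverse traffic conditions, but can reduce network-wide efficiency.
- Pure agent control (PAC) improves network-wide travel time under moderate demand, but is less effective under heavy demand.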

## Abstract

Signal control in urban corridors faces the dual challenge of maintaining arterial traffic progression while adapting to demand variations at local intersections. We propose a hierarchical traffic signal coordination and control scheme that integrates model-based optimization with reinforcement learning. The system consists of: (i) a High-Level Coordinator (HLC) that selects coordination strategies based on observed and predicted demand; (ii) a Corridor Coordinator that derives phase constraints from the selected strategy, either Max-Flow Coordination (MFC) or Green-Wave Coordination (GWC); and (iii) Hybrid Signal Agents (HSAs) that determine signal phases via reinforcement learning with action masking to enforce feasibility. Hierarchical reinforcement learning with Proximal Policy Optimization (PPO) is used to train HSA and HLC policies. At the lower level, three HSA policies, namely MFC-aware, GWC-aware, and pure agent control (PAC), are trained in conjunction with their respective coordination strategies. At the higher level, the HLC is trained to dynamically switch strategies using a multi-objective reward balancing corridor-level and network-wide performance. The proposed scheme was developed and evaluated on a SUMO-RLlib platform. Case results show that hybrid MFC maximizes throughput under heavy demand; hybrid GWC consistently minimizes arterial stops and maintains progression across diverse traffic conditions but can reduce network-wide efficiency; and PAC improves network-wide travel time in moderate demand but is less effective under heavy demand. The hierarchical design enables adaptive strategy selection, achieving robust performance across all demand levels.
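The abstract says the signal agents use action masking to enforce feasibility, but this summary does not give the masking rule. The sketch below shows one common form of the idea, under stated assumptions: the policy produces one logit per signal phase, the coordinator supplies a boolean feasibility mask, and masked phases receive zero probability. The function names, the four-phase layout, and the example values are hypothetical and are not taken from the paper.

```python
import math


def masked_softmax(logits, mask):
    """Turn phase logits into probabilities, forcing infeasible phases to zero.

    logits: list of floats, one per signal phase (assumed policy output).
    mask:   list of bools, True where the phase is allowed (assumed to come
            from the coordination constraints).
    """
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("at least one phase must be feasible")
    # Subtract the largest feasible logit for numerical stability.
    peak = max(l for l, ok in zip(logits, mask) if ok)
    weights = [math.exp(l - peak) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(weights)
    return [w / total for w in weights]


def greedy_feasible_phase(logits, mask):
    """Pick the most probable phase among the feasible ones."""
    probs = masked_softmax(logits, mask)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical four-phase intersection: phases 1 and 3 are disallowed,
# even though the unmasked policy would prefer phase 1.
logits = [0.2, 2.5, 0.1, 1.9]
mask = [True, False, True, False]
probs = masked_softmax(logits, mask)
chosen = greedy_feasible_phase(logits, mask)
```

In this example the policy's preferred phase is masked out, so the agent falls back to the best remaining feasible phase. During training one would sample from `probs` rather than take the greedy choice; how the paper's agents build the mask from MFC or GWC constraints is described in the full text, not here.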

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20102/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20102/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/2508.20102/full.md

---
Source: https://tomesphere.com/paper/2508.20102