SymLight: Exploring Interpretable and Deployable Symbolic Policies for Traffic Signal Control

Xiao-Cheng Liao; Yi Mei; and Mengjie Zhang

arXiv:2511.05790·cs.LG·November 11, 2025


Open Access 3 Reviews

TL;DR

SymLight is a Monte Carlo Tree Search framework for discovering interpretable symbolic policies for traffic signal control, achieving high performance and deployability on resource-limited devices.

Contribution

The paper presents SymLight, a novel MCTS-based framework for discovering interpretable symbolic policies for traffic signal control, addressing neural policy opacity and deployment challenges.

Findings

1. SymLight outperforms baseline methods on real-world datasets.
2. Produced policies are both interpretable and effective.
3. Demonstrated deployment feasibility on resource-limited devices.

Abstract

Deep Reinforcement Learning has achieved significant success in automatically devising effective traffic signal control (TSC) policies. Neural policies, however, tend to be over-parameterized and non-transparent, hindering their interpretability and deployability on resource-limited edge devices. This work presents SymLight, a priority function search framework based on Monte Carlo Tree Search (MCTS) for discovering inherently interpretable and deployable symbolic priority functions to serve as the TSC policies. The priority function, in particular, accepts traffic features as input and then outputs a priority for each traffic signal phase, which subsequently directs the phase transition. For effective search, we propose a concise yet expressive priority function representation. This helps mitigate the combinatorial explosion of the action space in MCTS. Additionally, a probabilistic…
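To make the priority-function idea concrete, here is a minimal Python sketch of how a symbolic expression over lane features could score each phase and pick the next one. It is an illustration under stated assumptions, not the paper's implementation: the feature names (`queue`, `approaching`, `capacity`), the example expression, the sum over a phase's lanes, and the fallback value of protected division are all hypothetical. Only the use of protected division and min/max operators is mentioned in the reviews.

```python
from typing import Callable, Dict, List

# Hypothetical per-lane features; the paper's actual feature set is not shown here.
LaneFeatures = Dict[str, float]


def protected_div(a: float, b: float, eps: float = 1e-6) -> float:
    """Divide a by b, returning 1.0 when b is near zero.

    The 1.0 fallback is a common genetic-programming convention and an
    assumption here, not a detail confirmed by the source.
    """
    return a / b if abs(b) > eps else 1.0


def example_priority(lane: LaneFeatures) -> float:
    """An illustrative symbolic expression, not a policy reported in the paper."""
    return max(lane["queue"], protected_div(lane["approaching"], lane["capacity"]))


def phase_priority(lanes: List[LaneFeatures],
                   priority_fn: Callable[[LaneFeatures], float]) -> float:
    """Aggregate lane priorities for one phase (summation is an assumption)."""
    return sum(priority_fn(lane) for lane in lanes)


def select_phase(phases: Dict[str, List[LaneFeatures]],
                 priority_fn: Callable[[LaneFeatures], float]) -> str:
    """Return the name of the phase with the highest aggregated priority."""
    return max(phases, key=lambda name: phase_priority(phases[name], priority_fn))


if __name__ == "__main__":
    phases = {
        "NS": [
            {"queue": 3.0, "approaching": 2.0, "capacity": 10.0},
            {"queue": 1.0, "approaching": 4.0, "capacity": 0.0},
        ],
        "EW": [{"queue": 5.0, "approaching": 1.0, "capacity": 10.0}],
    }
    print(select_phase(phases, example_priority))
```

Because the whole policy is one small arithmetic expression, evaluating it needs no neural network inference, which is the kind of property the abstract points to when it speaks of deployability on resource-limited devices.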

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 5

Strengths

1. Interpretable policy class that is easy to understand and built over meaningful lane features; protected division and min/max are practical choices. 2. Empirical improvements on six CityFlow networks with significance testing. 3. Search framework is simple to implement; PSR is a plausible way to avoid uninformative rollouts.

Weaknesses

1. The core claim is that SymLight yields strong, deployable policies; however, the offline search costs (simulation calls, expansions, rollouts, wall-clock per network/intersection) are not quantified. Without this, it’s unclear whether the approach scales to larger grids or frequent retuning. 2. The reward normalizes inverse travel time, but multi-objective considerations (emissions, per-approach fairness, pedestrian delay) are absent, and it is unclear whether the method over-optimizes one m

Reviewer 02 · Rating 4 · Confidence 4

Strengths

1. The paper proposes a symbolic-policy formulation for traffic signal control, representing control logic as explicit priority functions. This approach is promising to bridge the gap between high-performance learning methods and human-interpretable rule-based systems, addressing a long-standing limitation in DRL-based TSC. 2. By integrating MCTS with a concise symbolic representation and the proposed probabilistic structural rollout strategy, SymLight enables efficient exploration of large disc

Weaknesses

1. While the central idea of this work lies in leveraging a priority function search for traffic signal control, the paper provides limited background or theoretical motivation for this concept. As a result, the rationale for adopting priority functions and their novelty relative to existing symbolic or TSC works remains unclear. 2. The Monte Carlo Tree Search has been explored in traffic signal optimization [1]. The authors are expected to clarify the difference and novelty compared to the exis

Reviewer 03 · Rating 8 · Confidence 4

Strengths

- The use of a symbolic priority function is a well-motivated approach for providing interpretability in an application domain where interpretability is a main bottleneck to realizing gains from machine learning. - The paper presents its main ideas clearly. - The empirical analysis supports the central claims of the paper: improved intersection metrics and interpretability of the learned priority function.

Weaknesses

- Clarity and relevance to learning: it took me a couple of read-throughs of the method to understand how the priority function was implemented and what the role of MCTS was. I still give the paper a good rating for clarity because -- once I understood the method -- I think the organization makes sense. So this weakness is more about, "could the paper have been more clear?" My confusion was that MCTS is usually not a learning method itself but is a decision-time search method. So at first I expe

Taxonomy

Topics: Traffic control and management · Traffic Prediction and Management Techniques · Adversarial Robustness in Machine Learning