OracleTSC: Oracle-Informed Reward Hurdle and Uncertainty Regularization for Traffic Signal Control

Darryl Jacob; Xinyu Liu; Muchao Ye; Xiaoyong Yuan; Pan He

arXiv:2605.08516 · cs.AI · May 12, 2026

TL;DR

OracleTSC enhances traffic signal control by stabilizing LLM-based reinforcement learning with reward filtering and uncertainty regularization, leading to significant efficiency improvements and better generalization.

Contribution

The paper introduces OracleTSC, a framework that stabilizes LLM reinforcement finetuning for TSC with a reward hurdle and uncertainty regularization, improving both performance and interpretability.

Findings

01. 75% reduction in travel time compared to the baseline

02. 67% decrease in queue length

03. Effective cross-intersection transfer without additional training

Abstract

Transparent decision-making is essential for traffic signal control (TSC) systems to earn public trust. However, traditional reinforcement learning-based TSC methods function as black boxes with limited interpretability. Although large language models (LLMs) can provide natural language reasoning, reinforcement finetuning for TSC remains unstable because feedback is sparse and delayed, while most actions produce only marginal changes in congestion metrics. We introduce OracleTSC, which stabilizes LLM-based TSC through two mechanisms: (1) a reward hurdle mechanism that filters weak learning signals by subtracting a calibrated threshold from environmental rewards, and (2) uncertainty regularization that maximizes the probability of the selected response to encourage consistent decisions across sampled outputs. Experiments on the LibSignal benchmark show that OracleTSC enables a compact…
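
The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the REINFORCE-style loss form, and the parameters `hurdle` and `reg_weight` are assumptions made for illustration. The abstract states only that a calibrated threshold is subtracted from the environmental reward and that the probability of the selected response is maximized.

```python
import math


def hurdle_reward(env_reward: float, hurdle: float) -> float:
    """Subtract a threshold from the environment reward (hypothetical helper).

    Rewards that do not clear the hurdle become non-positive, so marginal
    improvements no longer reinforce the sampled action.
    """
    return env_reward - hurdle


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sketch_loss(logits, selected, env_reward, hurdle, reg_weight):
    """Assumed loss: policy-gradient term plus a log-probability regularizer.

    logits     -- scores over candidate signal phases (illustrative)
    selected   -- index of the phase the model chose
    reg_weight -- assumed coefficient on the uncertainty regularizer
    """
    log_p = math.log(softmax(logits)[selected])
    shaped = hurdle_reward(env_reward, hurdle)
    policy_term = -shaped * log_p            # REINFORCE-style term (assumption)
    uncertainty_term = -reg_weight * log_p   # pushes p(selected) toward 1
    return policy_term + uncertainty_term


if __name__ == "__main__":
    # Two equally likely phases, a reward that clears the hurdle by 0.5.
    print(sketch_loss([0.0, 0.0], 0, env_reward=1.0, hurdle=0.5, reg_weight=0.5))
```

In this sketch, a reward below the hurdle flips the sign of the policy term, so the update discourages the chosen action instead of weakly reinforcing it, while the regularizer always favors a more confident choice. How the threshold is calibrated is not described in the text available here.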
