OracleTSC: Oracle-Informed Reward Hurdle and Uncertainty Regularization for Traffic Signal Control
Darryl Jacob, Xinyu Liu, Muchao Ye, Xiaoyong Yuan, Pan He

TL;DR
OracleTSC stabilizes LLM-based reinforcement learning for traffic signal control by filtering weak reward signals with a reward hurdle and by applying uncertainty regularization, which improves traffic efficiency and generalization.
Contribution
The paper introduces OracleTSC, a framework that stabilizes LLM reinforcement finetuning for traffic signal control (TSC) with a reward hurdle and uncertainty regularization, improving performance and interpretability.
Findings
75% reduction in travel time compared to baseline
67% decrease in queue length
Effective cross-intersection transfer without additional training
Abstract
Transparent decision-making is essential for traffic signal control (TSC) systems to earn public trust. However, traditional reinforcement learning-based TSC methods function as black boxes with limited interpretability. Although large language models (LLMs) can provide natural language reasoning, reinforcement finetuning for TSC remains unstable because feedback is sparse and delayed, while most actions produce only marginal changes in congestion metrics. We introduce OracleTSC, which stabilizes LLM-based TSC through two mechanisms: (1) a reward hurdle mechanism that filters weak learning signals by subtracting a calibrated threshold from environmental rewards, and (2) uncertainty regularization that maximizes the probability of the selected response to encourage consistent decisions across sampled outputs. Experiments on the LibSignal benchmark show that OracleTSC enables a compact…
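The two mechanisms named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `weight` parameter, and the use of empirical agreement among sampled actions as a stand-in for the model's probability of the selected response are all assumptions made for illustration, and the abstract does not say how the hurdle is calibrated.

```python
import math
from collections import Counter


def hurdled_reward(env_reward: float, hurdle: float) -> float:
    """Subtract a threshold so that marginal improvements below the
    hurdle yield a negative (non-reinforcing) learning signal."""
    return env_reward - hurdle


def selected_response_probability(sampled_actions, selected) -> float:
    """Assumed proxy: the fraction of sampled outputs that agree with
    the selected action (not the model's true token probability)."""
    counts = Counter(sampled_actions)
    return counts[selected] / len(sampled_actions)


def regularized_objective(env_reward, hurdle, sampled_actions, selected,
                          weight=0.1):
    """Hurdled reward plus a log-probability term. The term is 0 when
    all samples agree and negative otherwise, so maximizing the
    objective favors consistent decisions. `weight` is hypothetical."""
    p = selected_response_probability(sampled_actions, selected)
    consistency_term = math.log(max(p, 1e-8))  # clamp avoids log(0)
    return hurdled_reward(env_reward, hurdle) + weight * consistency_term
```

In this sketch, a reward of 0.05 with a hurdle of 0.1 becomes negative, so an action that barely changes congestion is not reinforced, while a split vote among sampled actions lowers the objective relative to a unanimous one.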
