C$^2$T: Captioning-Structure and LLM-Aligned Common-Sense Reward Learning for Traffic--Vehicle Coordination

Yuyang Chen; Kaiyan Zhao; Yiming Wang; Ming Yang; Bin Rao; Zhenning Li

arXiv:2604.13098 · cs.MA · April 16, 2026

TL;DR

This paper introduces C2T, a framework that leverages LLM-derived common-sense knowledge to improve reward functions in multi-agent traffic control systems, enhancing safety, efficiency, and comfort.

Contribution

C2T is the first to distill common-sense traffic knowledge from LLMs into intrinsic rewards for MARL-based traffic control, improving on traditional hand-crafted rewards such as intersection pressure.
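To make "intrinsic reward" concrete, the sketch below shows generic additive reward shaping, in which a learned scoring function is added to the environment's own reward before the policy update. This summary does not say how C2T combines its learned reward with the environment reward. The additive form, the weight `beta`, the two input features, and the `toy_intrinsic` weights are all hypothetical, chosen only for illustration. `toy_intrinsic` is a stand-in for a learned model, not C2T's reward.

```python
from typing import Callable, Sequence

# Hypothetical signature for a learned intrinsic reward: it maps a feature
# vector describing the local traffic state to a scalar "common-sense" score.
IntrinsicReward = Callable[[Sequence[float]], float]


def shaped_reward(extrinsic: float, features: Sequence[float],
                  intrinsic_fn: IntrinsicReward, beta: float = 0.5) -> float:
    """Return r_ext + beta * r_int(features); beta trades off the two signals."""
    return extrinsic + beta * intrinsic_fn(features)


def toy_intrinsic(features: Sequence[float]) -> float:
    # Hypothetical stand-in for a learned model: it penalises hard-braking
    # events and queue spillback with fixed, made-up weights.
    hard_brakes, spillback = features
    return -(1.0 * hard_brakes + 2.0 * spillback)


r = shaped_reward(extrinsic=-10.0, features=[1.0, 0.5],
                  intrinsic_fn=toy_intrinsic, beta=0.5)
print(r)  # -11.0
```

In this toy case the intrinsic term contributes -1.0 on top of the extrinsic -10.0. Raising `beta` makes the policy weigh the common-sense signal more heavily relative to the environment reward.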

Findings

01

C2T significantly improves traffic efficiency, safety, and energy proxies over baselines.

02

The framework can be steered to prioritize different goals by modifying the LLM prompt.

03

C2T outperforms strong MARL baselines on CityFlow benchmarks.

Abstract

State-of-the-art (SOTA) urban traffic control increasingly employs Multi-Agent Reinforcement Learning (MARL) to coordinate Traffic Light Controllers (TLCs) and Connected Autonomous Vehicles (CAVs). However, the performance of these systems is fundamentally capped by their hand-crafted, myopic rewards (e.g., intersection pressure), which fail to capture high-level, human-centric goals like safety, flow stability, and comfort. To overcome this limitation, we introduce C2T, a novel framework that learns a common-sense coordination model from traffic-vehicle dynamics. C2T distills "common-sense" knowledge from a Large Language Model (LLM) into a learned intrinsic reward function. This new reward is then used to guide the coordination policy of a cooperative multi-intersection TLC MARL system on CityFlow-based multi-intersection benchmarks. Our framework significantly outperforms strong MARL…
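For reference, the sketch below shows the kind of hand-crafted "intersection pressure" reward the abstract calls myopic. It uses one common formulation from max-pressure-style signal control. For each movement through the intersection (an incoming lane paired with an outgoing lane), take the vehicle count on the incoming lane minus the count on the outgoing lane. The pressure is the absolute value of their sum, and the reward is its negative. This is a minimal sketch of the baseline-style reward, not C2T's learned reward. The lane ids and counts are made up.

```python
from typing import Dict, List, Tuple

# A movement is an (incoming_lane, outgoing_lane) pair through the intersection.
Movement = Tuple[str, str]


def intersection_pressure(movements: List[Movement],
                          lane_counts: Dict[str, int]) -> int:
    """|sum over movements of (vehicles on incoming lane - vehicles on outgoing lane)|."""
    total = 0
    for in_lane, out_lane in movements:
        total += lane_counts.get(in_lane, 0) - lane_counts.get(out_lane, 0)
    return abs(total)


def pressure_reward(movements: List[Movement],
                    lane_counts: Dict[str, int]) -> float:
    # Hand-crafted, myopic reward: it only sees the current queue imbalance,
    # not safety, flow stability, or comfort.
    return -float(intersection_pressure(movements, lane_counts))


# Hypothetical intersection with two movements and made-up lane ids/counts.
movements = [("north_in_0", "south_out_0"), ("east_in_0", "west_out_0")]
lane_counts = {"north_in_0": 8, "south_out_0": 2, "east_in_0": 5, "west_out_0": 1}
print(pressure_reward(movements, lane_counts))  # -10.0
```

In a CityFlow simulation, per-lane counts like `lane_counts` can be read from the Python `Engine` via `get_lane_vehicle_count()`, which returns a dict keyed by lane id.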
