RIFT: Group-Relative RL Fine-Tuning for Realistic and Controllable Traffic Simulation

Keyu Chen; Wenchao Sun; Hao Cheng; Sifa Zheng

arXiv:2505.03344·cs.RO·September 23, 2025

Open Access 3 Reviews

TL;DR

This paper introduces RIFT, a novel group-relative reinforcement learning fine-tuning method for traffic simulation that improves realism and controllability by combining data-driven imitation learning with physics-based fine-tuning.

Contribution

The paper proposes a dual-stage simulation framework with RIFT, a new RL fine-tuning strategy that enhances style-level controllability and reduces covariate shift in traffic simulation.

Findings

01. RIFT improves realism in traffic simulation.

02. RIFT enhances controllability and reduces covariate shift.

03. The approach exposes limitations of current AV systems in closed-loop scenarios.

Abstract

Achieving both realism and controllability in closed-loop traffic simulation remains a key challenge in autonomous driving. Dataset-based methods reproduce realistic trajectories but suffer from covariate shift in closed-loop deployment, compounded by simplified dynamics models that further reduce reliability. Conversely, physics-based simulation methods enhance reliable and controllable closed-loop interactions but often lack expert demonstrations, compromising realism. To address these challenges, we introduce a dual-stage AV-centric simulation framework that conducts imitation learning pre-training in a data-driven simulator to capture trajectory-level realism and route-level controllability, followed by reinforcement learning fine-tuning in a physics-based simulator to enhance style-level controllability and mitigate covariate shift. In the fine-tuning stage, we propose RIFT, a…
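For orientation, the "group-relative" idea named in the title is commonly realized in GRPO-style methods by scoring each rollout against the other rollouts sampled for the same scenario. The sketch below illustrates only that generic normalization and is not taken from the paper; the function name `group_relative_advantages` and the `eps` stabilizer are assumptions made for illustration, and RIFT's actual objective may differ.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Generic GRPO-style normalization (an assumption, not RIFT's exact form).

    `rewards` holds one scalar reward per rollout, all sampled from the
    same scenario (one "group"). Each rollout is scored relative to the
    group mean, scaled by the group standard deviation.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    mean = rewards.mean()
    std = rewards.std()
    # eps (hypothetical value) avoids division by zero when all rewards match.
    return (rewards - mean) / (std + eps)

# Usage: three rollouts of the same scenario with different rewards.
advantages = group_relative_advantages([1.0, 2.0, 3.0])
```

Because the baseline is the group's own mean, no learned value function is needed: rollouts that beat their group get positive advantages and the rest get negative ones.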

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 5

Strengths

- **Clarity and Presentation.** The paper is clearly written and the core ideas are easy to follow, with well-motivated design choices and consistent presentation.
- **Empirical Rigor.** Experiments are extensive, and the ablations are thoughtfully constructed to isolate the contribution of key components.

Weaknesses

- **Baselines for Controllability.** The paper does not compare against closely related controllable traffic generation methods (e.g., CTG, LCTGen). Given the paper’s emphasis on controllability, these baselines are important to position the contribution.
- **Related Work on Group-Relative RL.** Prior work has already explored group-relative rewards for RL fine-tuning in closed-loop driving (e.g., Gen-Drive [1]). While the present paper targets scenario generation rather than policy learning, ac

Reviewer 02 · Rating 4 · Confidence 4

Strengths

The idea of using group-relative advantages to encourage cooperation is simple but effective. Results show clear improvements in fairness, stability, and overall efficiency. The framework is decentralized and scalable, making it practical for large-scale driving simulation. The experiments are thorough and show emergent, human-like cooperative driving.

Weaknesses

The paper could more clearly analyze how the group-relative term affects individual vs. collective reward trade-offs. The related work section should discuss earlier related papers including:
• A. Kuefler, J. Morton, T. Wheeler, and M. J. Kochenderfer, “Imitating Driver Behavior with Generative Adversarial Networks,” IEEE Intelligent Vehicles Symposium (IV), 2017, pp. 204–211.
• R. P. Bhattacharyya, B. Wulfe, D. J. Phillips, A. Kuefler, J. Morton, R. Senanayake, and M. J. Kochenderfer, “Modelin

Reviewer 03 · Rating 2 · Confidence 5

Strengths

- The qualitative results are comprehensive and helpful for understanding the performance.
- I appreciate that the authors provide several baselines, but key baselines such as CAT-K or entries from the Waymo Sim Agent Challenge would be more insightful.

Weaknesses

- This work proposes different metrics from those of the Waymo Sim Agent Challenge. I suggest the authors either take 1-2 baselines from the Sim Agent Challenge and evaluate them in the paper's setting, or simply adapt RIFT for the Waymo Sim Agent Challenge. Currently it is hard for reviewers to understand the strengths and limitations of RIFT just by looking at table numbers.
- For AV Evaluation, it is hard to draw insights from Table 2 since there are two factors (Sim Agents and different planners). For example

Taxonomy

Topics: Traffic Prediction and Management Techniques · Simulation Techniques and Applications · Traffic control and management