Post-Training and Test-Time Scaling of Generative Agent Behavior Models for Interactive Autonomous Driving

Hyunki Seong; Jeong-Kyun Lee; Heesoo Myeong; Yongho Shin; Hyun-Mook Cho; Duck Hoon Kim; Pranav Desai; Monu Surana

arXiv:2512.13262 · cs.RO · December 16, 2025

TL;DR

This paper introduces post-training and test-time scaling techniques for generative agent behavior models in autonomous driving. The techniques target the safety, consistency, and diversity of agent behaviors without extensive retraining; the reported safety gain exceeds 40% using only 10% of the training dataset.

Contribution

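The paper presents Group Relative Behavior Optimization (GRBO), a reinforcement learning post-training method that fine-tunes pretrained behavior models, and Warm-K, a warm-started Top-K sampling strategy applied at test time. Together they aim to improve the safety and robustness of autonomous driving behavior models.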

Findings

01  GRBO improves safety by over 40% using only 10% of training data.

02  Warm-K enhances behavioral consistency and reactivity at test time.

03  Methods reduce covariate shift and performance gaps without retraining.

Abstract

Learning interactive motion behaviors among multiple agents is a core challenge in autonomous driving. While imitation learning models generate realistic trajectories, they often inherit biases from datasets dominated by safe demonstrations, limiting robustness in safety-critical cases. Moreover, most studies rely on open-loop evaluation, overlooking compounding errors in closed-loop execution. We address these limitations with two complementary strategies. First, we propose Group Relative Behavior Optimization (GRBO), a reinforcement learning post-training method that fine-tunes pretrained behavior models via group relative advantage maximization with human regularization. Using only 10% of the training dataset, GRBO improves safety performance by over 40% while preserving behavioral realism. Second, we introduce Warm-K, a warm-started Top-K sampling strategy that balances consistency…
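
The abstract names "group relative advantage maximization" but this excerpt does not give the GRBO objective. As a rough illustration only, the sketch below assumes the common group-relative convention of standardizing each rollout's reward against the other rollouts sampled for the same scene. The function name, the epsilon value, and the example rewards are hypothetical, and the human regularization term is not modeled.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against its own group (assumed convention).

    rewards: scalar rewards for rollouts sampled from the same scene.
    eps: small constant (hypothetical value) guarding against a zero
         standard deviation when all rollouts score the same.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four rollouts of one scene,
# e.g. higher is safer; these numbers are not from the paper.
rewards = [1.0, 0.0, 0.5, -1.0]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:+.2f}  advantage={a:+.3f}")
```

Under this assumption, rollouts that score above their group's mean receive positive advantages and those below receive negative ones, so no separate value network is needed to form a baseline.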

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Reinforcement Learning in Robotics · Robotic Path Planning Algorithms