AGENTS-LLM: Augmentative GENeration of Challenging Traffic Scenarios with an Agentic LLM Framework

Yu Yao; Salil Bhatnagar; Markus Mazzola; Vasileios Belagiannis; Igor Gilitschenski; Luigi Palmieri; Simon Razniewski; Marcel Hallgarten

arXiv:2507.13729 · cs.RO · July 21, 2025

TL;DR

This paper presents AGENTS-LLM, a novel framework that uses agentic large language models to generate challenging traffic scenarios from natural language descriptions, improving scalability and control in autonomous driving testing.

Contribution

Introduces an agentic LLM-based framework for traffic scenario augmentation that offers fine-grained control and high-quality outputs even with smaller models, addressing the scalability limits of manual augmentation by domain experts.

Findings

01. High-quality scenario generation comparable to manual creation

02. Effective control over scenario details via natural language prompts

03. Maintains performance with smaller, cost-effective LLMs

Abstract

Rare, yet critical, scenarios pose a significant challenge in testing and evaluating autonomous driving planners. Relying solely on real-world driving scenes requires collecting massive datasets to capture these scenarios. While automatic generation of traffic scenarios appears promising, data-driven models require extensive training data and often lack fine-grained control over the output. Moreover, generating novel scenarios from scratch can introduce a distributional shift from the original training scenes, which undermines the validity of evaluations, especially for learning-based planners. To sidestep this, recent work proposes to generate challenging scenarios by augmenting original scenarios from the test set. However, this involves the manual augmentation of scenarios by domain experts, an approach that is unable to meet the demands for scale in the evaluation of self-driving…

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Human-Automation Interaction and Safety · Multimodal Machine Learning Applications