AgentDrive: An Open Benchmark Dataset for Agentic AI Reasoning with LLM-Generated Scenarios in Autonomous Systems

Mohamed Amine Ferrag; Abderrahmane Lakas; Merouane Debbah

arXiv:2601.16964 · cs.AI · January 26, 2026

Open Access

TL;DR

AgentDrive is an open benchmark dataset of 300,000 LLM-generated driving scenarios, paired with an evaluation framework for assessing the reasoning capabilities of autonomous agents. It targets the lack of large-scale, structured, safety-critical benchmarks for agentic AI.

Contribution

The paper presents a large-scale, structured dataset of LLM-generated driving scenarios, together with a reasoning assessment benchmark for agentic AI in autonomous systems.

Findings

01 Proprietary models excel in contextual and policy reasoning.

02 Open models are rapidly improving in physics-grounded reasoning.

03 The dataset enables diverse evaluation of autonomous agent reasoning.

Abstract

The rapid advancement of large language models (LLMs) has sparked growing interest in their integration into autonomous systems for reasoning-driven perception, planning, and decision-making. However, evaluating and training such agentic AI models remains challenging due to the lack of large-scale, structured, and safety-critical benchmarks. This paper introduces AgentDrive, an open benchmark dataset containing 300,000 LLM-generated driving scenarios designed for training, fine-tuning, and evaluating autonomous agents under diverse conditions. AgentDrive formalizes a factorized scenario space across seven orthogonal axes: scenario type, driver behavior, environment, road layout, objective, difficulty, and traffic density. An LLM-driven prompt-to-JSON pipeline generates semantically rich, simulation-ready specifications that are validated against physical and schema constraints. Each…
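The factorized scenario space described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the seven axis names follow the abstract, but every axis value, the function names, and the validation rule are hypothetical placeholders chosen only to show how a specification might be sampled and checked against a simple schema.

```python
import json
import random

# Axis names follow the abstract; all values are hypothetical placeholders,
# not the categories used in the actual AgentDrive dataset.
AXES = {
    "scenario_type": ["lane_change", "intersection"],
    "driver_behavior": ["cautious", "aggressive"],
    "environment": ["clear", "rain"],
    "road_layout": ["highway", "urban_grid"],
    "objective": ["reach_goal", "avoid_collision"],
    "difficulty": ["easy", "hard"],
    "traffic_density": ["low", "high"],
}


def scenario_space_size(axes):
    """Number of distinct combinations when the axes vary independently."""
    size = 1
    for values in axes.values():
        size *= len(values)
    return size


def sample_scenario(axes, rng):
    """Pick one value per axis to form a scenario specification."""
    return {axis: rng.choice(values) for axis, values in axes.items()}


def validate(spec, axes):
    """Return a list of schema errors; an empty list means the spec is valid."""
    errors = []
    for axis, allowed in axes.items():
        if axis not in spec:
            errors.append(f"missing axis: {axis}")
        elif spec[axis] not in allowed:
            errors.append(f"{axis}: {spec[axis]!r} not allowed")
    for key in spec:
        if key not in axes:
            errors.append(f"unexpected key: {key}")
    return errors


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sample is reproducible
    spec = sample_scenario(AXES, rng)
    print(scenario_space_size(AXES))  # 2 values on each of 7 axes -> 128
    print(json.dumps(spec, indent=2))
    print(validate(spec, AXES))
```

Because the axes are treated as independent, the number of combinations is the product of the per-axis counts, which is what lets a small set of categories span a large scenario space. The physical-constraint checks mentioned in the abstract are not modeled here.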

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Multimodal Machine Learning Applications · Human-Automation Interaction and Safety