General-purpose LLMs as Models of Human Driver Behavior: The Case of Simplified Merging
Samir H.A. Mohammad, Wouter Mooi, Arkady Zgonnikov

TL;DR
This study investigates the potential of general-purpose large language models to serve as human driver behavior models in automated vehicle safety assessments, highlighting their capabilities and limitations.
Contribution
The paper shows that general-purpose LLMs can mimic certain aspects of human driving behavior, and it identifies where they currently fall short in capturing responses to dynamic cues.
Findings
Both LLMs (OpenAI o3 and Google Gemini 2.5 Pro) reproduce human-like intermittent operational control and tactical dependencies on spatial cues.
Neither model consistently captures human responses to dynamic velocity cues.
Prompt components act as model-specific biases affecting behavior.
Abstract
Human behavior models are essential as behavior references and for simulating human agents in virtual safety assessment of automated vehicles (AVs), yet current models face a trade-off between interpretability and flexibility. General-purpose large language models (LLMs) offer a promising alternative: a single model potentially deployable without parameter fitting across diverse scenarios. However, what LLMs can and cannot capture about human driving behavior remains poorly understood. We address this gap by embedding two general-purpose LLMs (OpenAI o3 and Google Gemini 2.5 Pro) as standalone, closed-loop driver agents in a simplified one-dimensional merging scenario and comparing their behavior against human data using quantitative and qualitative analyses. Both models reproduce human-like intermittent operational control and tactical dependencies on spatial cues. However, neither…
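To make the phrase "standalone, closed-loop driver agent" concrete, the following is a minimal sketch of such a loop in a one-dimensional merging setting. It is not the authors' implementation. The vehicle dynamics, the observation fields, the merge-point distance, the decision interval, and the `stub_policy` function (a rule-based stand-in for an actual LLM query) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

MERGE_POINT = 50.0  # assumed position of the merge point along each track (m)


@dataclass
class Vehicle:
    position: float  # distance travelled along its own track (m)
    velocity: float  # longitudinal speed (m/s)


def step(vehicle: Vehicle, accel: float, dt: float) -> Vehicle:
    """Advance one vehicle by dt seconds with forward-Euler integration."""
    new_position = vehicle.position + vehicle.velocity * dt
    new_velocity = max(0.0, vehicle.velocity + accel * dt)  # no reversing
    return Vehicle(new_position, new_velocity)


def stub_policy(obs: Dict[str, float]) -> float:
    """Hypothetical stand-in for an LLM call: returns an acceleration (m/s^2).

    A real agent would format `obs` into a prompt and parse the model's reply.
    This rule only uses a spatial cue: whoever is closer to the merge point goes first.
    """
    closer = obs["own_distance_to_merge"] < obs["other_distance_to_merge"]
    return 1.0 if closer else -1.0


def simulate(
    policy: Callable[[Dict[str, float]], float],
    steps: int = 50,
    dt: float = 0.1,
    decision_every: int = 5,
) -> List[Tuple[float, float, float, float]]:
    """Run the closed loop: observe, decide, act, repeat.

    The agent is queried only every `decision_every` steps and its last
    command is held in between, so its actions feed back into later observations.
    """
    ego = Vehicle(position=0.0, velocity=10.0)
    other = Vehicle(position=-5.0, velocity=10.0)  # other driver keeps constant speed
    accel = 0.0
    log = []
    for k in range(steps):
        if k % decision_every == 0:
            obs = {
                "own_distance_to_merge": MERGE_POINT - ego.position,
                "other_distance_to_merge": MERGE_POINT - other.position,
                "own_velocity": ego.velocity,
                "other_velocity": other.velocity,
            }
            accel = policy(obs)
        ego = step(ego, accel, dt)
        other = step(other, 0.0, dt)
        log.append((k * dt, ego.position, ego.velocity, accel))
    return log


if __name__ == "__main__":
    trajectory = simulate(stub_policy)
    print(trajectory[-1])
```

Swapping `stub_policy` for a function that queries a language model leaves the rest of the loop unchanged, which is the sense in which the agent is "standalone" in this sketch. The logged trajectory could then be compared against recorded human data.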
