Can Large Language Models Implement Agent-Based Models? An ODD-based Replication Study
Nuno Fachada, Daniel Fernandes, Carlos M. Fernandes, João P. Matos-Carvalho

TL;DR
This study evaluates whether large language models can reliably generate executable agent-based models from specifications, assessing their accuracy, efficiency, and scientific validity.
Contribution
It provides a systematic evaluation of 17 LLMs on a standardized agent-based modeling task, highlighting current capabilities and limitations.
Findings
GPT-4.1 consistently produces statistically valid and efficient implementations.
Behaviorally faithful models are achievable but not guaranteed.
Executability alone is insufficient for scientific use.
Abstract
Large language models (LLMs) can now synthesize non-trivial executable code from textual descriptions, raising an important question: can LLMs reliably implement agent-based models from standardized specifications in a way that supports replication, verification, and validation? We address this question by evaluating 17 contemporary LLMs on a controlled ODD-to-code translation task, using the PPHPC predator-prey model as a fully specified reference. Generated Python implementations are assessed through staged executability checks, model-independent statistical comparison against a validated NetLogo baseline, and quantitative measures of runtime efficiency and maintainability. Results show that behaviorally faithful implementations are achievable but not guaranteed, and that executability alone is insufficient for scientific use. GPT-4.1 consistently produces statistically valid and…
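To make the idea of statistically comparing a generated implementation against a baseline concrete, here is a minimal Python sketch. It is not the procedure used in the study, which this excerpt does not detail. It assumes each implementation is run several times and reduced to one summary value per run, then applies a generic two-sided permutation test on the difference of means. The function name `permutation_test` and the sample values are hypothetical.

```python
import random
from statistics import mean


def permutation_test(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of sample means.

    `a` and `b` are sequences of per-run summary values (for example,
    one output statistic per replicate) from two implementations.
    Returns an estimated p-value; small values suggest the two
    implementations do not produce the same output distribution.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign run labels and recompute the statistic.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Placeholder numbers for illustration only, not results from the paper.
baseline_runs = [410.0, 398.5, 405.2, 401.7, 399.9]
generated_runs = [402.3, 407.8, 400.1, 396.4, 404.0]
p_value = permutation_test(baseline_runs, generated_runs)
```

A permutation test is used here only because it needs nothing beyond the standard library and makes no distributional assumptions. A comparison like this addresses behavioral agreement only, so code that merely runs without errors would still need to pass it.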
