AI for NONMEM Coding in Pharmacometrics Research and Education: Shortcut or Pitfall?
Wenhao Zheng, Wanbing Wang, Carl M.J. Kirkpatrick, Cornelia B. Landersdorfer, Huaxiu Yao, Jiawei Zhou

TL;DR
This study evaluates how accurately seven AI agents generate NONMEM code across 13 pharmacometric modeling tasks, identifies their strengths and limitations, and provides a benchmark and practical prompts for research and education.
Contribution
The paper introduces a standardized scoring rubric for code accuracy, compares the performance of AI agents, and offers an optimized prompt to improve NONMEM coding accuracy in pharmacometrics.
Findings
OpenAI o1 and GPT-4.1 achieved the highest accuracy across all tasks when using the optimized prompt
AI performs well on basic model structures but needs review for complex models
The study provides a benchmark and practical prompts for AI-assisted pharmacometrics coding
Abstract
Artificial intelligence (AI) is increasingly being explored as a tool to support pharmacometric modeling, particularly in addressing the coding challenges associated with NONMEM. In this study, we evaluated the ability of seven AI agents to generate NONMEM codes across 13 pharmacometrics tasks, including a range of population pharmacokinetic (PK) and pharmacodynamic (PD) models. We further developed a standardized scoring rubric to assess code accuracy and created an optimized prompt to improve AI agent performance. Our results showed that the OpenAI o1 and gpt-4.1 models achieved the best performance, both generating codes with great accuracy for all tasks when using our optimized prompt. Overall, AI agents performed well in writing basic NONMEM model structures, providing a useful foundation for pharmacometrics model coding. However, user review and refinement remain essential,…
