MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics
Zhuofan Shi, Hubao A, Yufei Shao, Dongliang Huang, Hongxu An, Chunxiao Xin, Haiyang Shen, Zhenyu Wang, Yunshan Na, Gang Huang, and Xiang Jing

TL;DR
MDAgent2 is an end-to-end framework utilizing domain-specific datasets and reinforcement learning to enhance large language models for code generation and knowledge Q&A in molecular dynamics, surpassing previous baselines.
Contribution
This work introduces MDAgent2, a novel system combining datasets, multi-stage training, and reinforcement learning for improved MD code generation and Q&A capabilities.
Findings
Models outperform strong baselines in MD code generation.
The system effectively integrates code execution and self-correction.
Proposes the first benchmark for LAMMPS code generation and Q&A.
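The execute-and-self-correct idea named in the findings can be pictured with a minimal Python sketch. This is not the paper's implementation: the function `generate_with_self_correction`, the callables `generate` and `execute`, and the `max_attempts` parameter are hypothetical names introduced here for illustration, and the toy generator and executor stand in for an LLM and a real LAMMPS run (the error message is made up by the toy, not a LAMMPS log).

```python
from typing import Callable, Optional, Tuple


def generate_with_self_correction(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    execute: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,
) -> Tuple[str, bool, int]:
    """Generate a script, run it, and retry with the error log as feedback.

    Returns (last_script, succeeded, attempts_used).
    All names here are illustrative assumptions, not the paper's API.
    """
    feedback: Optional[str] = None
    script = ""
    for attempt in range(1, max_attempts + 1):
        script = generate(task, feedback)   # feedback is None on the first try
        ok, log = execute(script)           # run the candidate script
        if ok:
            return script, True, attempt
        feedback = log                      # pass the failure log to the next try
    return script, False, max_attempts


# Toy stand-ins so the sketch runs without an LLM or a LAMMPS install.
def toy_generate(task: str, feedback: Optional[str]) -> str:
    lines = ["units metal", "atom_style atomic"]
    if feedback is not None:
        # Pretend the model "fixes" the script after seeing an error.
        lines.append("run 100")
    return "\n".join(lines)


def toy_execute(script: str) -> Tuple[bool, str]:
    # Pretend executor: succeeds only if the script contains a run command.
    if any(line.split()[:1] == ["run"] for line in script.splitlines()):
        return True, "ok"
    return False, "toy error: script contains no run command"


if __name__ == "__main__":
    script, ok, attempts = generate_with_self_correction(
        "simulate a small metal system", toy_generate, toy_execute
    )
    print(ok, attempts)
    print(script)
```

In a real setting, `execute` would wrap an actual simulator invocation and `generate` would call the model with the task and the previous error log; both are left abstract here because the summary does not describe them.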
Abstract
Molecular dynamics (MD) simulations are essential for understanding atomic-scale behaviors in materials science, yet writing LAMMPS scripts remains a highly specialized and time-consuming task. Although LLMs show promise in code generation and domain-specific question answering, their performance in MD scenarios is limited by scarce domain data, the high deployment cost of state-of-the-art LLMs, and low code executability. Building upon our prior MDAgent, we present MDAgent2, the first end-to-end framework capable of performing both knowledge Q&A and code generation within the MD domain. We construct a domain-specific data-construction pipeline that yields three high-quality datasets spanning MD knowledge, question answering, and code generation. Based on these datasets, we adopt a three-stage post-training strategy--continued pre-training (CPT), supervised fine-tuning (SFT), and…
Taxonomy
Topics: Machine Learning in Materials Science · Topic Modeling · Domain Adaptation and Few-Shot Learning
