MolViBench: Evaluating LLMs on Molecular Vibe Coding
Jiatong Li, Yuxuan Ren, Weida Wang, Changmeng Zheng, Xiao-yong Wei, Qing Li, Yatao Bian

TL;DR
MolViBench is a new benchmark designed to evaluate large language models' ability to generate executable code for complex molecular discovery tasks, bridging the gap between chemistry knowledge and coding skills.
Contribution
We introduce MolViBench, the first benchmark tailored for Molecular Vibe Coding, with a comprehensive evaluation framework and diverse real-world drug discovery tasks.
Findings
We evaluate nine frontier coding LLMs on MolViBench.
We propose a multi-layered assessment that combines type-aware and AST-based analysis.
We compare three real-world Molecular Vibe Coding paradigms.
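The summary names type-aware and AST-based analysis but does not describe how either is implemented. The sketch below is only an illustration of what such checks could look like for generated Python code, not the benchmark's actual evaluator. The function names `called_names` and `type_aware_equal`, the tolerance values, and the comparison rules are all assumptions made for this example.

```python
import ast
import math


def _dotted(node: ast.AST) -> str:
    """Return a dotted name such as 'Chem.MolFromSmiles' for a call target."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
    return ".".join(reversed(parts))


def called_names(source: str) -> set:
    """AST-based check (hypothetical): collect the names of all calls in the code."""
    tree = ast.parse(source)  # parses only; the generated code is never executed
    return {_dotted(n.func) for n in ast.walk(tree) if isinstance(n, ast.Call)}


def type_aware_equal(expected, actual, rel_tol: float = 1e-6) -> bool:
    """Type-aware check (hypothetical): compare outputs according to their type."""
    # bool is a subclass of int, so handle it before the numeric branch.
    if isinstance(expected, bool) or isinstance(actual, bool):
        return (isinstance(expected, bool) and isinstance(actual, bool)
                and expected == actual)
    # Numbers are compared with a tolerance rather than exact equality.
    if isinstance(expected, (int, float)) and isinstance(actual, (int, float)):
        return math.isclose(expected, actual, rel_tol=rel_tol, abs_tol=1e-9)
    # Lists and tuples are compared element by element.
    if isinstance(expected, (list, tuple)) and isinstance(actual, (list, tuple)):
        return len(expected) == len(actual) and all(
            type_aware_equal(e, a, rel_tol) for e, a in zip(expected, actual))
    # Dictionaries must share keys and have matching values.
    if isinstance(expected, dict) and isinstance(actual, dict):
        return expected.keys() == actual.keys() and all(
            type_aware_equal(expected[k], actual[k], rel_tol) for k in expected)
    # Everything else (strings, sets, None) falls back to plain equality.
    return expected == actual
```

In this sketch, the AST check inspects which library functions a generated solution calls without running it, while the type-aware check compares an executed result against a reference value, so that a molecular weight returned as `46.0690001` still matches a reference of `46.069`.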
Abstract
Molecular Vibe Coding, a paradigm in which chemists interact with LLMs to generate executable programs for molecular tasks, has emerged as a flexible alternative to chemical agents with predefined tools, enabling chemists to express arbitrarily complex, customized workflows. Unlike general coding tasks, molecular coding poses a distinctive challenge: LLMs must jointly possess programming, molecular understanding, and domain-specific reasoning capabilities. However, existing benchmarks cover these capabilities only in isolation. General code generation benchmarks such as HumanEval and SWE-bench require no chemistry knowledge, while chemistry-focused benchmarks such as S^2-Bench and ChemCoTBench evaluate knowledge recall or property prediction rather than executable code generation. To bridge this gap, we introduce MolViBench, the first benchmark tailored for Molecular Vibe Coding. MolViBench comprises 358…
