Towards Verifiable and Self-Correcting AI Physicists for Quantum Many-Body Simulations

Ken Deng; Xiangfei Wang; Guijing Duan; Chen Mo; Junkun Huang; Runqing Zhang; Ling Qian; Zhiguo Huang; Jize Han; and Di Luo

arXiv:2604.00149 · physics.comp-ph · May 12, 2026

TL;DR

This paper introduces QMP-Bench, a challenging quantum many-body simulation benchmark, and PhysVEC, a multi-agent framework that ensures verifiable, error-corrected AI research, advancing trustworthy automated scientific discovery.

Contribution

It presents a new benchmark for quantum physics tasks and a multi-agent system that enforces self-verification and error correction in AI research workflows.

Findings

01. PhysVEC outperforms existing LLMs on QMP-Bench tasks.

02. PhysVEC achieves reliable physical reproductions from AI generations.

03. The framework scales favorably at inference time.

Abstract

While large language models (LLMs) promise to revolutionize automated scientific discovery, their application in rigorous real-world physical research is stalled by two critical barriers: a lack of realistic evaluation benchmarks and systemic LLM hallucinations. Here, we address both problems. We introduce QMP-Bench, a pioneering end-to-end research-level benchmark in quantum many-body simulation consisting of $100$ tasks extracted from $21$ high-impact prestigious journals, presenting a challenge even for current frontier LLMs. To establish a paradigm for reliable and transparent AI physicists, we present PhysVEC, a multi-agent framework that enforces self-verification and error correction in AI research. PhysVEC seamlessly integrates programming and scientific verifiers to guarantee coding correctness and principle-based physical validity, yielding interpretable evidence and error…
