Towards Verifiable and Self-Correcting AI Physicists for Quantum Many-Body Simulations
Ken Deng, Xiangfei Wang, Guijing Duan, Chen Mo, Junkun Huang, Runqing Zhang, Ling Qian, Zhiguo Huang, Jize Han, and Di Luo

TL;DR
This paper introduces QMP-Bench, a challenging quantum many-body simulation benchmark, and PhysVEC, a multi-agent framework that ensures verifiable, error-corrected AI research, advancing trustworthy automated scientific discovery.
Contribution
It presents a new benchmark for quantum physics tasks and a multi-agent system that enforces self-verification and error correction in AI research workflows.
Findings
PhysVEC outperforms existing LLMs on QMP-Bench tasks.
PhysVEC reliably reproduces physical results from AI-generated research.
The framework scales favorably at inference time.
Abstract
While large language models (LLMs) promise to revolutionize automated scientific discovery, their application to rigorous, real-world physical research is stalled by two critical barriers: a lack of realistic evaluation benchmarks and systemic LLM hallucinations. Here, we address both problems. We introduce QMP-Bench, a pioneering end-to-end, research-level benchmark in quantum many-body simulation consisting of tasks extracted from high-impact journals, which challenges even current frontier LLMs. To establish a paradigm for reliable and transparent AI physicists, we present PhysVEC, a multi-agent framework that enforces self-verification and error correction in AI research. PhysVEC seamlessly integrates programming and scientific verifiers to guarantee code correctness and principle-based physical validity, yielding interpretable evidence and error…
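The abstract describes PhysVEC as pairing a programming verifier (does the code run?) with a scientific verifier (is the result physically valid?), producing interpretable evidence along the way. The following Python sketch is a toy illustration of that idea under our own assumptions, not the authors' implementation. The candidate solvers stand in for successive LLM-generated attempts, and every name (`tfim`, `verify_and_correct`, the `attempt_*` functions) and both physical checks (the exact h = 0 classical limit and the variational bound from the all-up product state of a small transverse-field Ising chain) are hypothetical choices made for illustration.

```python
import numpy as np

# Pauli matrices and identity for building a small spin-chain Hamiltonian.
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
I2 = np.eye(2)


def site_op(op, i, n):
    """Embed single-site operator `op` at site i of an n-site chain."""
    out = np.array([[1.0]])
    for k in range(n):
        out = np.kron(out, op if k == i else I2)
    return out


def tfim(n, J, h):
    """Open-chain transverse-field Ising model: H = -J sum Z_i Z_{i+1} - h sum X_i."""
    H = np.zeros((2**n, 2**n))
    for i in range(n - 1):
        H -= J * site_op(Z, i, n) @ site_op(Z, i + 1, n)
    for i in range(n):
        H -= h * site_op(X, i, n)
    return H


# Hypothetical "AI-generated" ground-state solvers, standing in for successive revisions.
def attempt_crash(n, J, h):
    return np.linalg.eigvalsh(tfim(n, J))[0]  # bug: forgets h, raises TypeError


def attempt_wrong_physics(n, J, h):
    return np.linalg.eigvalsh(tfim(n, J, h))[-1]  # bug: largest eigenvalue, not ground state


def attempt_corrected(n, J, h):
    return np.linalg.eigvalsh(tfim(n, J, h))[0]  # eigvalsh sorts ascending: ground energy


def verify_and_correct(candidates, n=4, J=1.0, h=0.5):
    """Return the first candidate passing both verifiers, with an evidence log."""
    log = []
    # Energy of the all-up product state; also the exact ground energy at h = 0 for J > 0.
    bound = -J * (n - 1)
    for name, solver in candidates:
        # Programming verifier: the code must run and return finite numbers.
        try:
            e, e_classical = float(solver(n, J, h)), float(solver(n, J, 0.0))
        except Exception as exc:
            log.append((name, f"code error: {exc!r}"))
            continue
        if not (np.isfinite(e) and np.isfinite(e_classical)):
            log.append((name, "non-finite output"))
            continue
        # Scientific verifier: principle-based checks on the physics.
        if not np.isclose(e_classical, bound):
            log.append((name, "fails classical limit h=0"))
            continue
        if e > bound + 1e-9:
            log.append((name, "violates variational bound"))
            continue
        log.append((name, "passed"))
        return name, e, log
    return None, None, log


candidates = [
    ("attempt_crash", attempt_crash),
    ("attempt_wrong_physics", attempt_wrong_physics),
    ("attempt_corrected", attempt_corrected),
]
name, energy, log = verify_and_correct(candidates)
```

In this toy run the first candidate fails the programming check, the second runs but fails the classical-limit check, and the third passes; the returned `log` plays the role of an interpretable evidence trail.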
