Grading the Unspoken: Evaluating Tacit Reasoning in Quantum Field Theory and String Theory with LLMs

Xingyang Yu, Yinghuan Zhang, Yufei Zhang, and Zijun Cui

arXiv:2604.14188 · physics.comp-ph · April 17, 2026

TL;DR

This paper assesses large language models' ability to handle complex, tacit reasoning in quantum field theory and string theory, revealing strengths in explicit derivations but limitations in reconstructing implicit reasoning.

Contribution

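It introduces a compact, expert-curated dataset of twelve questions spanning core areas of quantum field theory and string theory, together with a five-level grading rubric for evaluating LLMs' understanding of highly abstract theoretical physics.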

Findings

01. LLMs perform well on explicit derivations within stable conceptual frames.

02. Performance drops when models must reconstruct omitted reasoning steps.

03. Models often fail to identify the correct conceptual framing for implicit tensions.

Abstract

Large language models have demonstrated impressive performance across many domains of mathematics and physics. One natural question is whether such models can support research in highly abstract theoretical fields such as quantum field theory and string theory. Evaluating this possibility faces an immediate challenge: correctness in these domains is layered, tacit, and fundamentally non-binary. Standard answer-matching metrics fail to capture whether intermediate conceptual steps are properly reconstructed or whether implicit structural constraints are respected. We construct a compact expert-curated dataset of twelve questions spanning core areas of quantum field theory and string theory, and introduce a five-level grading rubric separating statement correctness, key concept awareness, reasoning chain presence, tacit step reconstruction, and enrichment. Evaluating multiple contemporary…
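The abstract contrasts binary answer matching with a five-level rubric. The Python sketch below is a hypothetical illustration, not the authors' grading code: it records pass/fail on each rubric level for a single model answer and shows that a final-answer check cannot distinguish two answers that the rubric separates. The names `RubricLevel`, `RubricGrade`, and `binary_answer_match`, the per-level boolean scoring, and the question id are all assumptions; the abstract does not say how levels are scored or aggregated.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class RubricLevel(IntEnum):
    """The five rubric dimensions, in the order the abstract lists them.

    The integer values are illustrative labels only (an assumption).
    """
    STATEMENT_CORRECTNESS = 1
    KEY_CONCEPT_AWARENESS = 2
    REASONING_CHAIN_PRESENCE = 3
    TACIT_STEP_RECONSTRUCTION = 4
    ENRICHMENT = 5


@dataclass
class RubricGrade:
    """Hypothetical per-answer record: pass/fail on each rubric level."""
    question_id: str
    satisfied: dict[RubricLevel, bool] = field(default_factory=dict)

    def profile(self) -> list[bool]:
        # Levels that were not explicitly graded count as not satisfied.
        return [self.satisfied.get(level, False) for level in RubricLevel]


def binary_answer_match(grade: RubricGrade) -> bool:
    # A standard answer-matching metric only sees whether the final
    # statement is correct; the other four levels are invisible to it.
    return grade.satisfied.get(RubricLevel.STATEMENT_CORRECTNESS, False)


# Two hypothetical answers to the same question: both state the correct
# result, but only the second reconstructs the tacit steps.
answer_a = RubricGrade("q1", {
    RubricLevel.STATEMENT_CORRECTNESS: True,
    RubricLevel.KEY_CONCEPT_AWARENESS: True,
    RubricLevel.REASONING_CHAIN_PRESENCE: True,
})
answer_b = RubricGrade("q1", {
    level: level != RubricLevel.ENRICHMENT for level in RubricLevel
})

print(binary_answer_match(answer_a), binary_answer_match(answer_b))  # True True
print(answer_a.profile())  # [True, True, True, False, False]
print(answer_b.profile())  # [True, True, True, True, False]
```

Under this sketch both answers pass a final-answer check, while the per-level profile records that only the second reconstructs the omitted steps, which is the kind of distinction the rubric is meant to surface.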
