Grading the Unspoken: Evaluating Tacit Reasoning in Quantum Field Theory and String Theory with LLMs
Xingyang Yu, Yinghuan Zhang, Yufei Zhang, and Zijun Cui

TL;DR
This paper assesses large language models' ability to handle complex, tacit reasoning in quantum field theory and string theory, revealing strengths in explicit derivations but limitations in reconstructing implicit reasoning.
Contribution
It introduces a compact, expert-curated dataset of twelve questions and a five-level grading rubric for evaluating LLMs' understanding of highly abstract theoretical physics concepts.
Findings
LLMs perform well on explicit derivations within stable conceptual frames.
Performance drops when reconstructing omitted reasoning steps.
Models often fail to identify the correct conceptual framing for implicit tensions.
Abstract
Large language models have demonstrated impressive performance across many domains of mathematics and physics. One natural question is whether such models can support research in highly abstract theoretical fields such as quantum field theory and string theory. Evaluating this possibility faces an immediate challenge: correctness in these domains is layered, tacit, and fundamentally non-binary. Standard answer-matching metrics fail to capture whether intermediate conceptual steps are properly reconstructed or whether implicit structural constraints are respected. We construct a compact expert-curated dataset of twelve questions spanning core areas of quantum field theory and string theory, and introduce a five-level grading rubric separating statement correctness, key concept awareness, reasoning chain presence, tacit step reconstruction, and enrichment. Evaluating multiple contemporary…
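As a rough illustration of why a layered rubric captures more than answer matching, the five levels named in the abstract can be sketched as a small data structure. This is only a sketch under stated assumptions: the level names come from the abstract, but the binary per-level scoring, the class and function names (`RubricLevel`, `GradedAnswer`, `per_level_pass_rate`), and the sample grades are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from enum import Enum


class RubricLevel(Enum):
    # Level names follow the abstract; the numeric order and the
    # pass/fail treatment of each level are assumptions for illustration.
    STATEMENT_CORRECTNESS = 1
    KEY_CONCEPT_AWARENESS = 2
    REASONING_CHAIN_PRESENCE = 3
    TACIT_STEP_RECONSTRUCTION = 4
    ENRICHMENT = 5


@dataclass
class GradedAnswer:
    # Hypothetical record: which rubric levels a grader judged as satisfied.
    question_id: str
    satisfied: set = field(default_factory=set)


def per_level_pass_rate(answers):
    """Return the fraction of answers satisfying each rubric level."""
    if not answers:
        return {level: 0.0 for level in RubricLevel}
    return {
        level: sum(level in a.satisfied for a in answers) / len(answers)
        for level in RubricLevel
    }


# Made-up grades for two questions, purely to exercise the function.
example_answers = [
    GradedAnswer(
        "q1",
        {
            RubricLevel.STATEMENT_CORRECTNESS,
            RubricLevel.KEY_CONCEPT_AWARENESS,
            RubricLevel.REASONING_CHAIN_PRESENCE,
        },
    ),
    GradedAnswer("q2", {RubricLevel.STATEMENT_CORRECTNESS}),
]
rates = per_level_pass_rate(example_answers)
```

Reporting a separate rate per level, rather than one accuracy number, is what lets a profile such as "correct statements but missing tacit steps" show up at all; a single answer-matching score would collapse both example answers into the same result.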
