Exploring the Limits of Fine-grained LLM-based Physics Inference via Premise Removal Interventions
Jordan Meadows, Tamsin James, Andre Freitas

TL;DR
This paper evaluates language models' ability to perform physics reasoning by removing crucial context from prompts, revealing their lack of physics-informed inference and the impact of prompt perturbations on reasoning quality.
Contribution
It introduces a systematic method to test LMs' physics reasoning by premise removal and demonstrates the models' failure to incorporate physical context effectively.
Findings
Derivation quality degrades non-linearly as more supporting premises are omitted from the prompt.
Zero-shot scores improve with synthetic in-context examples.
Models largely ignore physical context, relying on reverse-engineering solutions.
Abstract
Language models (LMs) can hallucinate when performing complex mathematical reasoning. Physics provides a rich domain for assessing their mathematical capabilities, where physical context requires that any symbolic manipulation satisfies complex semantics (e.g., units, tensorial order). In this work, we systematically remove crucial context from prompts to force instances where model inference may be algebraically coherent, yet unphysical. We assess LM capabilities in this domain using a curated dataset encompassing multiple notations and Physics subdomains. Further, we improve zero-shot scores using synthetic in-context examples, and demonstrate non-linear degradation of derivation quality with perturbation strength via the progressive omission of supporting premises. We find that the models' mathematical reasoning is not physics-informed in this setting, where physical context…
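The premise-removal idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`build_prompt`, `premise_removal_prompts`), the prompt layout, and the toy premises are all assumptions made for illustration, and "perturbation strength" is taken here to mean the number of premises omitted.

```python
from itertools import combinations


def build_prompt(premises, goal):
    """Join the remaining premises and the target equation into a plain-text prompt."""
    lines = [f"Premise {i + 1}: {p}" for i, p in enumerate(premises)]
    lines.append(f"Derive: {goal}")
    return "\n".join(lines)


def premise_removal_prompts(premises, goal, strength):
    """Return (removed_indices, prompt) pairs, one per way of omitting `strength` premises."""
    if not 0 <= strength <= len(premises):
        raise ValueError("strength must be between 0 and len(premises)")
    prompts = []
    for removed in combinations(range(len(premises)), strength):
        kept = [p for i, p in enumerate(premises) if i not in removed]
        prompts.append((removed, build_prompt(kept, goal)))
    return prompts


# Hypothetical toy derivation, not taken from the paper's dataset.
premises = ["F = m a", "a = dv/dt", "m is constant"]
goal = "F = d(m v)/dt"

# Sweep the perturbation strength from 0 (full context) to all premises removed.
for strength in range(len(premises) + 1):
    variants = premise_removal_prompts(premises, goal, strength)
    print(f"strength={strength}: {len(variants)} prompt variant(s)")
```

Each variant would then be sent to a model and the resulting derivation scored, so that quality can be compared across strengths; the scoring step is left out here because the summary does not specify it.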
Taxonomy
Topics: Non-Destructive Testing Techniques · Neural Networks and Applications · Advanced Neural Network Applications
