PhySense: Principle-Based Physics Reasoning Benchmarking for Large Language Models
Yinggan Xu, Yue Liu, Zhiqiang Gao, Changnan Peng, Di Luo

TL;DR
PhySense is a new benchmark designed to evaluate large language models' ability to perform principle-based physics reasoning, revealing their struggles to emulate expert-like, concise, and interpretable problem-solving approaches.
Contribution
The paper introduces PhySense, a physics reasoning benchmark that highlights the gap between LLMs and human experts in applying core principles for scientific reasoning.
Findings
LLMs often fail to follow principle-based reasoning paths.
Current LLMs generate lengthy, opaque solutions instead of concise, principle-driven ones.
PhySense exposes these limitations consistently across multiple state-of-the-art LLMs and prompt types.
Abstract
Large language models (LLMs) have rapidly advanced and are increasingly capable of tackling complex scientific problems, including those in physics. Despite this progress, current LLMs often fail to emulate the concise, principle-based reasoning characteristic of human experts, instead generating lengthy and opaque solutions. This discrepancy highlights a crucial gap in their ability to apply core physical principles for efficient and interpretable problem solving. To systematically investigate this limitation, we introduce PhySense, a novel principle-based physics reasoning benchmark designed to be easily solvable by experts using guiding principles, yet deceptively difficult for LLMs without principle-first reasoning. Our evaluation across multiple state-of-the-art LLMs and prompt types reveals a consistent failure to align with expert-like reasoning paths, providing insights for…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: ALIGN
