X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes
Gao Tianxi, Cai Yufan, Yuan Yusi, Dong Jin Song

TL;DR
X-RAY is a formalized, explainable system that analyzes LLM reasoning capabilities using calibrated probes, revealing structural strengths and weaknesses beyond standard benchmarks.
Contribution
The paper presents a novel framework with calibrated formal probes to systematically evaluate and interpret LLM reasoning abilities across multiple scientific domains.
Findings
Models are robust to constraint refinement but sensitive to solution-space restructuring.
Calibrated probes differentiate models indistinguishable on standard benchmarks.
The framework reveals interpretable failure modes in LLM reasoning.
Abstract
Large language models (LLMs) achieve promising performance, yet their ability to reason remains poorly understood. Existing evaluations largely emphasize task-level accuracy, often conflating pattern matching with reasoning capability. We present X-RAY, an explainable reasoning analysis system that maps LLM reasoning capability using calibrated, formally verified probes. We model reasoning capability as a function of extractable structure, operationalized through formal properties such as constraint interaction, reasoning depth, and solution-space geometry. X-RAY generates probes via formal tools with controlled structural variations, enabling precise isolation of incremental structural information through formal calibration and verification. We evaluate state-of-the-art LLMs on problems ranging from junior-level to advanced in mathematics, physics, and chemistry. Our…
Taxonomy
Topics: Machine Learning in Materials Science · Topic Modeling · Explainable Artificial Intelligence (XAI)
