X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes

Gao Tianxi; Cai Yufan; Yuan Yusi; Dong Jin Song

arXiv:2603.05290 · cs.AI · March 6, 2026



TL;DR

X-RAY introduces a formalized, explainable system for analyzing LLM reasoning capability with calibrated probes, revealing structural strengths and weaknesses that task-level accuracy on standard benchmarks does not expose.

Contribution

The paper presents a framework of calibrated, formally verified probes for systematically evaluating and interpreting LLM reasoning across mathematics, physics, and chemistry.

Findings

1. Models are robust to constraint refinement but sensitive to solution-space restructuring.
2. Calibrated probes differentiate models that are indistinguishable on standard benchmarks.
3. The framework reveals interpretable failure modes in LLM reasoning.

Abstract

Large language models (LLMs) achieve promising performance, yet their ability to reason remains poorly understood. Existing evaluations largely emphasize task-level accuracy, often conflating pattern matching with reasoning capability. We present X-RAY, an explainable reasoning analysis system that maps LLM reasoning capability using calibrated, formally verified probes. We model reasoning capability as a function of extractable structure, operationalized through formal properties such as constraint interaction, reasoning depth, and solution-space geometry. X-RAY generates probes via formal tools with controlled structural variations, enabling precise isolation of incremental structural information through formal calibration and verification. We evaluate state-of-the-art LLMs on problems ranging from junior-level to advanced in mathematics, physics, and chemistry. Our…
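The abstract does not say which formal tools X-RAY uses or how its variants are checked. As a minimal sketch, assuming a toy integer constraint problem and exhaustive enumeration in place of a real solver, the snippet below shows one way to verify that a generated variant differs from its base probe in a controlled manner. Adding a constraint that keeps a nonempty proper subset of the solutions is labeled constraint refinement; changing the constraints so the solution set is no longer contained in the original is labeled solution-space restructuring. The names `solutions` and `classify_variant` and this classification rule are hypothetical illustrations, not the paper's method.

```python
from itertools import product
from typing import Callable, List, Set, Tuple

# Hypothetical toy setting: a probe is a list of constraints over (x, y).
Constraint = Callable[[int, int], bool]
DOMAIN = range(6)  # x, y in {0, ..., 5}


def solutions(constraints: List[Constraint]) -> Set[Tuple[int, int]]:
    """Exhaustively enumerate the solution set (a stand-in for a formal verifier)."""
    return {
        (x, y)
        for x, y in product(DOMAIN, repeat=2)
        if all(c(x, y) for c in constraints)
    }


def classify_variant(base: List[Constraint], variant: List[Constraint]) -> str:
    """Label how a variant probe relates structurally to its base probe (hypothetical rule)."""
    base_sols, var_sols = solutions(base), solutions(variant)
    if not var_sols:
        return "unsatisfiable"  # no valid answer: such a probe would be rejected
    if var_sols == base_sols:
        return "equivalent"  # no structural change
    if var_sols < base_sols:
        return "refinement"  # extra constraint narrows the existing solution set
    return "restructuring"  # solution set is no longer contained in the base one


base = [lambda x, y: x + y == 6]
refined = base + [lambda x, y: x > y]
restructured = [lambda x, y: x - y == 2]

for name, probe in [("refined", refined), ("restructured", restructured)]:
    print(name, classify_variant(base, probe), sorted(solutions(probe)))
```

Running it prints `refined refinement [(4, 2), (5, 1)]` and `restructured restructuring [(2, 0), (3, 1), (4, 2), (5, 3)]`. Brute-force enumeration only scales to tiny domains, so a practical probe generator would more plausibly delegate this check to an SMT solver or computer algebra system.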


Taxonomy

Topics: Machine Learning in Materials Science · Topic Modeling · Explainable Artificial Intelligence (XAI)