Knowledge or Reasoning? A Close Look at How LLMs Think Across Domains

Juncheng Wu; Sheng Liu; Haoqin Tu; Hang Yu; Xiaoke Huang; James Zou; Cihang Xie; Yuyin Zhou

arXiv:2506.02126·cs.CL·June 4, 2025

TL;DR

This paper examines how large language models reason in the medical and mathematical domains by analyzing their step-by-step thinking. It scores knowledge correctness and reasoning quality separately, and reveals domain-specific strengths and limitations of fine-tuning methods.

Contribution

It introduces a fine-grained evaluation framework for reasoning processes and shows how supervised fine-tuning and reinforcement learning affect reasoning and knowledge use in the medical and mathematical domains.

Findings

01

The reasoning of R1-distilled models does not transfer well to the medical domain.

02

Supervised fine-tuning improves accuracy but reduces reasoning quality.

03

Reinforcement learning enhances medical reasoning by refining knowledge use.

Abstract

Recent advances in reasoning-enhanced Large Language Models such as OpenAI-o1/3 and DeepSeek-R1 have significantly improved performance on complex tasks. However, the quality and transparency of their internal reasoning processes remain underexplored. This work moves beyond the final-answer accuracy and investigates step-by-step reasoning in the medical and mathematical domains by explicitly decomposing the thinking trajectories into two parts: knowledge and reasoning. Specifically, we introduce a fine-grained evaluation framework that judges: (1) the correctness of knowledge used (measured by Knowledge Index (KI)) and (2) the quality of reasoning (measured by Information Gain (InfoGain)). Using this framework, we study R1-distilled and base Qwen models trained with supervised fine-tuning (SFT) and/or reinforcement learning (RL) in the medical and math domains. Three intriguing findings…
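
The abstract names two metrics, KI and InfoGain, without defining them here. As a rough illustration only, the sketch below assumes that each reasoning step carries a knowledge-correctness label and an uncertainty value for the final answer (both fields are hypothetical), and treats KI as the fraction of steps with correct knowledge and InfoGain as the average per-step drop in uncertainty. These are illustrative assumptions, not the paper's definitions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One reasoning step. Both fields are hypothetical stand-ins."""
    knowledge_correct: bool  # assumed: did a judge mark the step's knowledge as correct?
    answer_nll: float        # assumed: uncertainty about the final answer after this step


def knowledge_index(steps: List[Step]) -> float:
    """Assumed KI: fraction of steps whose knowledge was judged correct."""
    if not steps:
        return 0.0
    return sum(s.knowledge_correct for s in steps) / len(steps)


def info_gain(steps: List[Step], initial_nll: float) -> float:
    """Assumed InfoGain: mean per-step reduction in answer uncertainty."""
    if not steps:
        return 0.0
    gains = []
    previous = initial_nll
    for s in steps:
        gains.append(previous - s.answer_nll)  # positive when the step helps
        previous = s.answer_nll
    return sum(gains) / len(gains)


if __name__ == "__main__":
    trajectory = [Step(True, 2.0), Step(False, 1.5), Step(True, 0.5)]
    print("KI:", knowledge_index(trajectory))
    print("InfoGain:", info_gain(trajectory, initial_nll=3.0))
```

Keeping the two scores separate mirrors the decomposition the abstract describes: a trajectory can use correct knowledge while making little progress toward the answer, or the reverse.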

Taxonomy

Topics: ERP Systems Implementation and Impact · Private Equity and Venture Capital · Big Data and Business Intelligence