Reasoning-Trace Collapse: Evaluating the Loss of Explicit Reasoning During Fine-Tuning

Lukas Twist; Helen Yannakoudakis; Jie M. Zhang

arXiv:2605.21127 · cs.LG · May 21, 2026

TL;DR

This paper investigates how fine-tuning on non-reasoning data causes models to lose explicit reasoning traces, proposing a structural evaluation framework and mitigation strategies.

Contribution

It introduces a framework to evaluate reasoning trace validity separately from answer correctness and demonstrates simple methods to mitigate reasoning-trace collapse.

Findings

01 Fine-tuning can rapidly suppress valid reasoning traces.

02 Answer-only metrics can obscure reasoning failures.

03 Loss-masking strategies can mitigate reasoning-trace collapse.
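To make the idea of loss masking concrete, here is a minimal sketch of how per-token training labels can be built so that selected positions contribute nothing to the loss. The summary above does not say which tokens the authors mask, so the choice shown here (masking the prompt plus a hypothetical span of empty reasoning-template tokens) is an assumption for illustration only. The function name build_labels and the masked_spans parameter are likewise hypothetical. The value -100 is the default ignore_index of PyTorch's CrossEntropyLoss, which is why it is commonly used as the "skip this position" label.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(input_ids, prompt_len, masked_spans=()):
    """Return labels for next-token training with some positions masked.

    input_ids:    token ids for prompt + response (list of int)
    prompt_len:   number of leading prompt tokens, always masked
    masked_spans: extra half-open (start, end) index ranges to mask.
                  ASSUMPTION: e.g. the positions of an empty reasoning
                  block inserted by a chat template; the paper's actual
                  masking rule is not given in this summary.
    """
    labels = list(input_ids)
    # Mask the prompt so only response tokens are supervised.
    for i in range(min(prompt_len, len(labels))):
        labels[i] = IGNORE_INDEX
    # Mask any additional spans, clipped to the sequence bounds.
    for start, end in masked_spans:
        for i in range(max(start, 0), min(end, len(labels))):
            labels[i] = IGNORE_INDEX
    return labels


# Toy ids: 3 prompt tokens, 2 hypothetical template tokens, 2 answer tokens.
example_ids = [11, 12, 13, 90, 91, 21, 22]
example_labels = build_labels(example_ids, prompt_len=3, masked_spans=[(3, 5)])
print(example_labels)
```

With labels built this way, a standard cross-entropy loss only supervises the answer tokens, so the fine-tuning signal does not directly reward emitting the masked span.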

Abstract

Explicit reasoning models are trained to produce intermediate reasoning traces before final answers, but downstream fine-tuning is often performed on ordinary instruction-response data that contains no such traces. We show that this mismatch can induce reasoning-trace collapse: a fine-tuned model continues to produce plausible final answers while losing the structurally valid explicit reasoning traces that made it a reasoning model in the first place. We introduce a structural evaluation framework that separates answer correctness from reasoning-trace validity, measuring valid, empty, missing, and truncated reasoning alongside reasoning-conditioned task performance. Using this framework, we study four open-weight reasoning models and find that standard supervised fine-tuning can rapidly suppress valid reasoning traces, and that answer-only metrics can substantially obscure this failure:…
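As a rough illustration of what a structural evaluation of this kind could look like, the sketch below sorts model outputs into the four trace categories named in the abstract and reports answer accuracy next to a trace-aware score. Several details are assumptions, not taken from the paper: the <think> and </think> delimiters, the exact rule for each category, and the definition of the trace-aware score as "correct answer and valid trace". The function names classify_trace and summarize are hypothetical.

```python
from collections import Counter

# ASSUMPTION: reasoning is delimited by these tags; models differ.
OPEN_TAG = "<think>"
CLOSE_TAG = "</think>"
CATEGORIES = ("valid", "empty", "missing", "truncated")


def classify_trace(output: str) -> str:
    """Assign one structural category to a model output (assumed rules)."""
    start = output.find(OPEN_TAG)
    if start == -1:
        return "missing"      # no reasoning block was opened
    body_start = start + len(OPEN_TAG)
    end = output.find(CLOSE_TAG, body_start)
    if end == -1:
        return "truncated"    # block opened but never closed
    body = output[body_start:end]
    # A block containing only whitespace counts as empty.
    return "valid" if body.strip() else "empty"


def summarize(outputs, correct):
    """Report category rates, answer accuracy, and a trace-aware score.

    correct: list of bools, one per output, from any answer checker.
    """
    labels = [classify_trace(o) for o in outputs]
    counts = Counter(labels)
    n = len(outputs)
    report = {name: counts.get(name, 0) / n for name in CATEGORIES}
    report["answer_accuracy"] = sum(correct) / n
    # ASSUMPTION: count an example only if it is correct AND has a valid trace.
    report["valid_and_correct"] = sum(
        1 for ok, label in zip(correct, labels) if ok and label == "valid"
    ) / n
    return report


outputs = [
    "<think>2 + 2 = 4</think>4",  # valid
    "<think>\n</think>4",         # empty
    "4",                          # missing
    "<think>2 + 2",               # truncated
]
report = summarize(outputs, correct=[True, True, True, False])
print(report)
```

In this toy run three of four answers are correct, yet only one output has a valid trace, which is the kind of gap an answer-only metric would not show.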
