Reasoning-Trace Collapse: Evaluating the Loss of Explicit Reasoning During Fine-Tuning
Lukas Twist, Helen Yannakoudakis, Jie M. Zhang

TL;DR
This paper investigates how fine-tuning on non-reasoning data causes models to lose explicit reasoning traces, proposing a structural evaluation framework and mitigation strategies.
Contribution
It introduces a framework that evaluates reasoning-trace validity separately from answer correctness, and demonstrates simple methods to mitigate reasoning-trace collapse.
Findings
Fine-tuning can rapidly suppress valid reasoning traces.
Answer-only metrics can obscure reasoning failures.
Loss-masking strategies can mitigate reasoning-trace collapse.
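The summary does not say which tokens the authors mask. As a generic sketch of what loss masking means in supervised fine-tuning, the code below excludes chosen token positions from the training loss. The segment roles, the idea of masking an injected empty reasoning block, and the helper names `build_loss_mask` and `masked_nll` are all hypothetical illustrations, not the paper's method.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def build_loss_mask(segments: Iterable[Tuple[str, int]], masked_roles: Set[str]) -> List[int]:
    """Expand (role, n_tokens) segments into a per-token 0/1 loss mask.

    Tokens whose role is in masked_roles get 0 (excluded from the loss);
    all other tokens get 1. Role names are illustrative assumptions.
    """
    mask: List[int] = []
    for role, n_tokens in segments:
        mask.extend([0 if role in masked_roles else 1] * n_tokens)
    return mask


def masked_nll(token_logprobs: Sequence[float], loss_mask: Sequence[int]) -> float:
    """Mean negative log-likelihood over unmasked positions only."""
    if len(token_logprobs) != len(loss_mask):
        raise ValueError("token_logprobs and loss_mask must have the same length")
    kept = [-lp for lp, m in zip(token_logprobs, loss_mask) if m]
    return sum(kept) / len(kept) if kept else 0.0


# Hypothetical example: prompt tokens, an injected empty reasoning block, and answer tokens.
segments = [("prompt", 3), ("empty_reasoning", 2), ("answer", 2)]
mask = build_loss_mask(segments, masked_roles={"prompt", "empty_reasoning"})
logprobs = [-0.1, -0.2, -0.3, -0.05, -0.05, -1.0, -0.5]  # made-up per-token log-probs
loss = masked_nll(logprobs, mask)
```

Here only the two answer tokens contribute to the loss, so the model receives no direct gradient toward emitting the empty reasoning block. Whether this matches the masking the authors found effective is an assumption.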
Abstract
Explicit reasoning models are trained to produce intermediate reasoning traces before final answers, but downstream fine-tuning is often performed on ordinary instruction-response data that contains no such traces. We show that this mismatch can induce reasoning-trace collapse: a fine-tuned model continues to produce plausible final answers while losing the structurally valid explicit reasoning traces that made it a reasoning model in the first place. We introduce a structural evaluation framework that separates answer correctness from reasoning-trace validity, measuring valid, empty, missing, and truncated reasoning alongside reasoning-conditioned task performance. Using this framework, we study four open-weight reasoning models and find that standard supervised fine-tuning can rapidly suppress valid reasoning traces, and that answer-only metrics can substantially obscure this failure:…
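The four trace categories and the gap between answer-only and reasoning-conditioned scores can be illustrated with a small classifier. This is a minimal sketch under stated assumptions. It assumes reasoning is delimited by `<think>`/`</think>` tags, as in several open-weight reasoning models, though the delimiters vary by model. It also assumes "reasoning-conditioned" accuracy counts an answer only when its trace is valid. The function names and exact category rules are hypothetical, not the paper's definitions.

```python
from collections import Counter
from typing import Iterable, Tuple

OPEN_TAG = "<think>"    # assumed reasoning delimiter; differs across models
CLOSE_TAG = "</think>"


def classify_trace(output: str) -> str:
    """Label a raw model output as 'valid', 'empty', 'missing', or 'truncated'."""
    start = output.find(OPEN_TAG)
    if start == -1:
        return "missing"      # no reasoning block was opened
    end = output.find(CLOSE_TAG, start + len(OPEN_TAG))
    if end == -1:
        return "truncated"    # block opened but never closed
    body = output[start + len(OPEN_TAG):end]
    return "valid" if body.strip() else "empty"  # whitespace-only counts as empty


def summarize(results: Iterable[Tuple[str, bool]]) -> dict:
    """results: pairs of (raw model output, whether the final answer is correct)."""
    counts: Counter = Counter()
    n = correct = correct_with_valid = 0
    for output, is_correct in results:
        label = classify_trace(output)
        counts[label] += 1
        n += 1
        correct += int(is_correct)
        if is_correct and label == "valid":
            correct_with_valid += 1
    if n == 0:
        raise ValueError("no results to summarize")
    return {
        "rates": {k: counts[k] / n for k in ("valid", "empty", "missing", "truncated")},
        "answer_only_acc": correct / n,
        "reasoning_conditioned_acc": correct_with_valid / n,
    }


# Hypothetical outputs: three correct answers, but only one has a valid trace.
report = summarize([
    ("<think>2 + 2 = 4</think>4", True),
    ("<think></think>4", True),
    ("4", True),
    ("<think>Let me think", False),
])
```

In this toy example answer-only accuracy is 0.75 while reasoning-conditioned accuracy is 0.25, which shows how an answer-only metric can hide the loss of valid traces.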
