Why Do Reasoning Models Lose Coverage? The Role of Data and Forks in the Road
Ngoc-Hieu Nguyen, Parshin Shojaee, Phuc Minh Nguyen, Nan Zhang, Chandan K Reddy, Khoa D Doan, Rui Zhang

TL;DR
This paper investigates why reasoning models trained with supervised fine-tuning experience coverage shrinkage, linking it to decision-point scenarios in training data, and proposes data synthesis and diversity strategies to mitigate this issue.
Contribution
It identifies properties of the fine-tuning data, especially decision points, as key factors behind coverage shrinkage in reasoning models, and introduces targeted data synthesis and decoding methods to reduce this effect.
Findings
Shrinkage correlates with decision-point scenarios in training data.
Targeted data synthesis can partially mitigate reasoning shrinkage.
Diversity-encouraging decoding improves reasoning coverage.
Abstract
Recent progress in large language models has led to the emergence of reasoning models, which have shown strong performance on complex tasks through specialized fine-tuning procedures. While these methods reliably improve pass@1 accuracy, prior work has observed that they exhibit coverage shrinkage, where pass@k degrades relative to the base model. In this paper, we investigate why reasoning shrinkage arises under SFT-based post-training. We hypothesize that this behavior is driven by properties of the fine-tuning data, specifically decision points or "forks in the road" scenarios, where the model faces indecipherable patterns with multiple valid reasoning paths. To test this hypothesis, we design controlled case studies that simulate such decision-point settings, spanning indecipherable nodes in graph branching and reasoning modes. By tracking post-training dynamics in…
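
To make the pass@1 versus pass@k distinction concrete, here is a minimal sketch of how pass@k is commonly estimated from n sampled completions per problem, of which c are correct. It uses the standard unbiased estimator 1 - C(n-c, k) / C(n, k); the abstract does not say which estimator the authors use, so this choice is an assumption. The function names and the per-problem counts below are hypothetical and chosen only for illustration; they are not results from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c correct."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k incorrect samples: every size-k subset contains a correct one.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no correct sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average pass@k over problems, given the correct-sample count per problem."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical counts for two problems, 10 samples each (illustration only).
base_counts = [2, 2]    # occasionally solves both problems
tuned_counts = [10, 0]  # always solves one, never solves the other

for k in (1, 5):
    base = mean_pass_at_k(base_counts, n=10, k=k)
    tuned = mean_pass_at_k(tuned_counts, n=10, k=k)
    print(f"pass@{k}: base={base:.3f} tuned={tuned:.3f}")
```

With these made-up counts, the tuned model has the higher pass@1 but the lower pass@5, which is the pattern the abstract calls coverage shrinkage: accuracy on a single attempt improves while the set of problems solvable within k attempts narrows.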
