Shape of Thought: When Distribution Matters More than Correctness in Reasoning Tasks
Abhranil Chandra, Ayush Agrawal, Arian Hosseini, Sebastian Fischmeister, Rishabh Agarwal, Navin Goyal, Aaron Courville

TL;DR
Fine-tuning a language model on synthetic chain-of-thought traces from a more capable model, even traces that reach an incorrect final answer, can improve reasoning performance more than training on human-annotated data. The authors attribute this to the synthetic data lying closer to the model's own distribution.
Contribution
The paper shows that synthetic reasoning traces close to the model's own distribution can improve reasoning more than human-annotated data, which points to data distribution as a key factor in training.
Findings
Synthetic data closer to the model's own distribution improves reasoning.
Partially flawed reasoning traces still provide valuable learning signals.
Distribution alignment improves performance across multiple reasoning tasks.
Abstract
We present the surprising finding that a language model's reasoning capabilities can be improved by training on synthetic datasets of chain-of-thought (CoT) traces from more capable models, even when all of those traces lead to an incorrect final answer. Our experiments show this approach can yield better performance on reasoning tasks than training on human-annotated datasets. We hypothesize that two key factors explain this phenomenon. First, the distribution of synthetic data is inherently closer to the language model's own distribution, making it more amenable to learning. Second, these 'incorrect' traces are often only partially flawed and contain valid reasoning steps from which the model can learn. To further test the first hypothesis, we use a language model to paraphrase human-annotated traces, shifting their distribution closer to the model's own distribution, and show…
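The data setup the abstract describes, keeping only traces whose final answer is wrong and using them as fine-tuning targets, can be pictured with a small sketch. This is not the authors' pipeline: the `Trace` class, the `normalize` rule, and the prompt/completion layout are all hypothetical choices made for illustration, and a real setup would need task-specific answer extraction and matching.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    """Hypothetical container for one sampled chain-of-thought trace."""
    question: str
    reasoning: str      # chain-of-thought text sampled from a stronger model
    final_answer: str   # answer extracted from the end of the trace


def normalize(answer: str) -> str:
    """Crude normalization (an assumption) so '42', ' 42 ' and '42.' compare equal."""
    return answer.strip().rstrip(".").lower()


def select_incorrect(traces, gold_answers):
    """Keep only traces whose final answer disagrees with the reference answer."""
    kept = []
    for t in traces:
        gold = gold_answers.get(t.question)
        if gold is None:
            continue  # no reference answer, so correctness cannot be judged
        if normalize(t.final_answer) != normalize(gold):
            kept.append(t)
    return kept


def to_sft_example(trace):
    """Format one trace as a prompt/completion pair (layout is an assumption)."""
    return {
        "prompt": trace.question,
        "completion": f"{trace.reasoning}\nFinal answer: {trace.final_answer}",
    }
```

Given a list of sampled traces and a dictionary of reference answers keyed by question, `select_incorrect` returns the all-incorrect subset, and mapping `to_sft_example` over it yields records that a supervised fine-tuning loop could consume.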
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
