CoTEvol: Self-Evolving Chain-of-Thoughts for Data Synthesis in Mathematical Reasoning
Zhuo Wang, Zhuo Zhang, Yafu Li, Yu Cheng, Lizhen Qu, Zenglin Xu

TL;DR
CoTEvol introduces a genetic evolutionary framework for generating diverse and accurate Chain-of-Thought reasoning trajectories, significantly improving mathematical reasoning performance in large language models.
Contribution
It presents a novel evolutionary approach to synthesize Chain-of-Thought data, outperforming existing methods in efficiency and reasoning accuracy.
Findings
Improves correct-CoT synthesis success by over 30%.
Enhances structural diversity of reasoning trajectories.
LLMs trained on CoTEvol data gain 6.6% on average across benchmarks.
Abstract
Large Language Models (LLMs) exhibit strong mathematical reasoning when trained on high-quality Chain-of-Thought (CoT) that articulates intermediate steps, yet costly CoT curation hinders further progress. While existing remedies such as distillation from stronger LLMs and self-synthesis based on test-time search alleviate this issue, they often suffer from diminishing returns or high computing overhead. In this work, we propose CoTEvol, a genetic evolutionary framework that casts CoT generation as a population-based search over reasoning trajectories. Candidate trajectories are iteratively evolved through reflective global crossover at the trajectory level and local mutation guided by uncertainty at the step level, enabling holistic recombination and fine-grained refinement. Lightweight, task-aware fitness functions are designed to guide the evolutionary process toward accurate and…
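The population-based search described in the abstract can be illustrated with a generic genetic loop. This is a minimal sketch under stated assumptions, not the CoTEvol implementation: the toy task, TARGET, VOCAB, toy_fitness, toy_crossover, toy_mutate, and evolve are all hypothetical names introduced here. In particular, single-point crossover and uniform random mutation stand in for the paper's reflective global crossover and uncertainty-guided step mutation, and a position-match count stands in for its task-aware fitness functions.

```python
import random
from typing import List

Trajectory = List[str]  # one candidate: an ordered list of reasoning steps

# Hypothetical toy task: a reference trajectory and the pool of possible steps.
TARGET: Trajectory = ["parse", "expand", "simplify", "solve"]
VOCAB: List[str] = ["parse", "expand", "simplify", "solve", "guess", "skip"]


def toy_fitness(t: Trajectory) -> int:
    """Count steps matching the reference position by position (stand-in score)."""
    return sum(a == b for a, b in zip(t, TARGET))


def toy_crossover(a: Trajectory, b: Trajectory, rng: random.Random) -> Trajectory:
    """Trajectory-level recombination: prefix of one parent, suffix of the other."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]


def toy_mutate(t: Trajectory, rng: random.Random) -> Trajectory:
    """Step-level refinement: rewrite one randomly chosen step (assumption:
    uniform choice here, not uncertainty-guided as in the abstract)."""
    child = list(t)
    child[rng.randrange(len(child))] = rng.choice(VOCAB)
    return child


def evolve(population: List[Trajectory], fitness, generations: int = 10,
           keep: int = 2, seed: int = 0) -> Trajectory:
    """Keep the top `keep` candidates each generation and refill with offspring."""
    rng = random.Random(seed)
    pop = [list(p) for p in population]
    size = len(pop)
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:keep]  # elitism: the best candidates always survive
        children = []
        while len(parents) + len(children) < size:
            a, b = rng.sample(parents, 2)
            children.append(toy_mutate(toy_crossover(a, b, rng), rng))
        pop = parents + children
    return max(pop, key=fitness)


# Random initial population of six candidate trajectories.
rng0 = random.Random(1)
initial = [[rng0.choice(VOCAB) for _ in TARGET] for _ in range(6)]
best = evolve(initial, toy_fitness)
print(best, toy_fitness(best))
```

Because the top candidates are carried over unchanged each generation, the best fitness in this sketch never decreases. A real pipeline would replace the toy operators with LLM calls and the toy score with an answer-checking fitness function.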
