Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs

Haritz Puerto; Tilek Chubakov; Xiaodan Zhu; Harish Tayyar Madabushi; Iryna Gurevych

arXiv:2407.03181 · cs.CL · May 28, 2025

Open Access · 1 Repo · 5 Models

TL;DR

This paper introduces a novel fine-tuning method for large language models that enables them to generate and refine multiple diverse reasoning chains within a single inference, leading to improved reasoning performance.

Contribution

The work presents Diverse Chains of Thought (DCoT), an approach that lets LLMs refine their reasoning chains within a single inference, unlike prior methods that generate independent chains in parallel and combine them post hoc.

Findings

1. Fine-tuning on DCoT improves performance across various tasks and model sizes.
2. Models can generate refined reasoning chains within a single inference step.
3. Significant gains are observed in tasks with large result state spaces.

Abstract

Requiring a large language model (LLM) to generate intermediary reasoning steps, known as Chain of Thought (CoT), has been shown to be an effective way of boosting performance. Previous approaches have focused on generating multiple independent CoTs and combining them through ensembling or other post-hoc strategies to enhance reasoning. In this work, we introduce a novel approach in which LLMs are fine-tuned to generate a sequence of Diverse Chains of Thought (DCoT) within a single inference step. This is fundamentally different from prior work, which primarily operates on parallel CoT generations. DCoT gives LLMs the ability to perform within-inference refinement of reasoning chains without requiring external feedback. Through a rigorous set of experiments spanning a wide range of tasks that require various reasoning types, we show that fine-tuning on DCoT improves performance over…
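To make the contrast in the abstract concrete, the sketch below sets post-hoc ensembling of independent CoTs (here a simple majority vote) next to a DCoT-style training target, in which several chains are serialized into one output sequence. The bracketed tags, the function names, and the toy question are assumptions made for illustration; they are not the template or code used by the authors.

```python
from collections import Counter


def majority_vote(answers):
    """Post-hoc ensembling: return the most frequent answer among
    independently sampled CoTs (ties resolve to the first one seen)."""
    return Counter(answers).most_common(1)[0][0]


def build_dcot_example(question, chains, final_answer):
    """Serialize several reasoning chains into ONE training target.

    The [Question]/[Chain i]/[Final answer] tags are hypothetical
    placeholders, not the template from the paper.
    """
    prompt = f"[Question] {question}\n[Number of chains] {len(chains)}\n"
    parts = [f"[Chain {i}] {chain}" for i, chain in enumerate(chains, start=1)]
    parts.append(f"[Final answer] {final_answer}")
    return {"prompt": prompt, "target": "\n".join(parts)}


# Prior approach: k separate generations, combined after the fact.
independent_answers = ["12", "15", "12"]
voted = majority_vote(independent_answers)

# DCoT-style data: all chains live in a single sequence, so later
# chains can condition on (and potentially refine) earlier ones.
example = build_dcot_example(
    question="A shelf holds 3 rows of 4 books. How many books are there?",
    chains=[
        "3 rows times 4 books per row is 12.",
        "Adding row by row: 4 + 4 + 4 = 12.",
    ],
    final_answer="12",
)

print(voted)
print(example["prompt"] + example["target"])
```

The point of the second function is only the data layout: because every chain sits in the same sequence, a model fine-tuned on such targets sees its earlier chains while producing later ones, which is the property the abstract calls within-inference refinement.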

Code & Models

Repositories

ukplab/arxiv2024-divergent-cot (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Sparse Evolutionary Training