Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples

Fangxu Yu; Lai Jiang; Haoqiang Kang; Shibo Hao; Lianhui Qin

arXiv:2406.05673 · cs.AI · May 28, 2025

TL;DR

Flow of Reasoning (FoR) is a novel LLM finetuning approach that promotes diverse, high-quality reasoning solutions with minimal data by modeling reasoning as a Markovian flow on a DAG-structured graph.

Contribution

FoR introduces a principled GFlowNet-based method for training LLMs to generate diverse reasoning paths, significantly improving solution diversity and quality with limited examples.
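To make the GFlowNet framing concrete, here is a minimal sketch of the trajectory balance objective, one standard GFlowNet training loss. This page does not state which objective FoR uses, so treat the choice as an assumption for illustration. The toy rewards, the one-step trajectories, and every name in the code (`trajectory_balance_loss`, `solution_a`, `solution_b`) are hypothetical.

```python
import math


def trajectory_balance_loss(log_z, log_pf_steps, log_pb_steps, log_reward):
    """Squared trajectory-balance residual for one complete trajectory.

    log_z        -- log of the (learned) total flow Z
    log_pf_steps -- log forward-policy probability of each step taken
    log_pb_steps -- log backward-policy probability of each step
    log_reward   -- log reward of the terminal state
    """
    residual = log_z + sum(log_pf_steps) - log_reward - sum(log_pb_steps)
    return residual ** 2


# Hypothetical toy problem: two valid solutions, each reached from the
# root in a single step, with rewards 3 and 1.
rewards = {"solution_a": 3.0, "solution_b": 1.0}
total_flow = sum(rewards.values())

# A policy that samples each solution in proportion to its reward.
proportional_policy = {name: r / total_flow for name, r in rewards.items()}

# A near-greedy policy that almost always picks the top-reward solution.
greedy_policy = {"solution_a": 0.99, "solution_b": 0.01}


def total_loss(policy):
    # Each state has a single parent here, so the backward log-probability is 0.
    return sum(
        trajectory_balance_loss(
            math.log(total_flow),
            [math.log(policy[name])],
            [0.0],
            math.log(reward),
        )
        for name, reward in rewards.items()
    )


proportional_loss = total_loss(proportional_policy)
greedy_loss = total_loss(greedy_policy)
```

In this toy setting the loss vanishes (up to floating-point error) only when solutions are sampled in proportion to their reward. That is the sense in which such an objective favors diverse, high-reward solutions over a single top-reward one.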

Findings

01. FoR outperforms existing methods on six reasoning tasks.

02. With only 15 training examples, FoR discovers diverse, high-quality solutions.

03. FoR demonstrates strong generalization across various reasoning domains.

Abstract

The ability to generate diverse solutions to a given problem is a hallmark of human creativity. This divergent reasoning is also crucial for machines, enhancing their robustness and enabling them to assist humans in many applications such as scientific discovery. However, existing approaches to multi-step reasoning with large language models (LLMs) have mostly focused on reasoning accuracy alone, without seeking additional diverse, valid solutions. For example, supervised fine-tuning improves reasoning quality but requires vast labeled data, while reward-maximizing reinforcement learning finds top-reward solutions but neglects solution diversity. To fill this gap, we propose Flow of Reasoning (FoR), an efficient diversity-seeking LLM finetuning method aimed at improving reasoning quality and diversity with minimal data. FoR formulates multi-step LLM reasoning as a Markovian…

Code & Models

Repositories

yu-fangxu/for (PyTorch, official)

Videos

Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples · SlidesLive

Taxonomy

Topics: Artificial Intelligence in Law

Methods: Entropy Regularization · Proximal Policy Optimization