Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples
Fangxu Yu, Lai Jiang, Haoqiang Kang, Shibo Hao, Lianhui Qin

TL;DR
Flow of Reasoning (FoR) is a novel LLM finetuning approach that promotes diverse, high-quality reasoning solutions with minimal data by modeling reasoning as a Markovian flow on a DAG-structured graph.
Contribution
FoR introduces a principled GFlowNet-based method for training LLMs to generate diverse reasoning paths, significantly improving solution diversity and quality with limited examples.
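To illustrate what "GFlowNet-based" means here, the following is a minimal sketch of the trajectory balance objective, a standard GFlowNet training loss. It is an assumption for illustration only: this summary does not state which objective FoR uses, and the function name, arguments, and toy numbers below are hypothetical. The loss is zero when a policy samples each complete reasoning path with probability proportional to its reward, which is what encourages diverse solutions rather than a single top-reward one.

```python
import math

def trajectory_balance_loss(log_z, log_pf_steps, log_pb_steps, reward):
    """Squared trajectory-balance residual for one sampled trajectory.

    log_z        : log of the learned total flow Z (hypothetical scalar parameter)
    log_pf_steps : log forward-policy probabilities, one per reasoning step
    log_pb_steps : log backward-policy probabilities, one per reasoning step
    reward       : positive reward of the terminal state
    """
    if reward <= 0:
        raise ValueError("reward must be positive")
    forward = log_z + sum(log_pf_steps)
    backward = math.log(reward) + sum(log_pb_steps)
    return (forward - backward) ** 2

# Toy setting (assumed): two one-step solutions with rewards 1 and 3, so Z = 4.
# Each terminal state has a single parent, so the backward log-probability is 0.
log_z = math.log(4.0)

# A policy that samples in proportion to reward (0.25 and 0.75) has ~zero loss.
proportional = [
    trajectory_balance_loss(log_z, [math.log(0.25)], [0.0], 1.0),
    trajectory_balance_loss(log_z, [math.log(0.75)], [0.0], 3.0),
]

# A near-greedy policy that almost never samples the lower-reward solution
# is penalized on that trajectory.
greedy = trajectory_balance_loss(log_z, [math.log(0.01)], [0.0], 1.0)
```

In this toy case the near-greedy policy incurs a positive loss on the lower-reward solution, whereas a purely reward-maximizing objective would have no reason to keep sampling it.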
Findings
FoR outperforms existing methods on six reasoning tasks.
With only 15 training examples, FoR discovers diverse, high-quality solutions.
FoR demonstrates strong generalization across various reasoning domains.
Abstract
The ability to generate diverse solutions to a given problem is a hallmark of human creativity. This divergent reasoning is also crucial for machines, enhancing their robustness and enabling them to assist humans in many applications such as scientific discovery. However, existing approaches to multi-step reasoning with large language models (LLMs) have mostly focused on reasoning accuracy, without discovering more diverse valid solutions. For example, supervised fine-tuning improves reasoning quality but requires vast amounts of labeled data, whereas reward-maximizing reinforcement learning finds top-reward solutions but neglects solution diversity. To fill this gap, we propose Flow of Reasoning (FoR), an efficient diversity-seeking LLM finetuning method aimed at improving reasoning quality and diversity with minimal data. FoR formulates multi-step LLM reasoning as a Markovian…
Taxonomy
Topics: Artificial Intelligence in Law
Methods: Entropy Regularization · Proximal Policy Optimization
