GFlowNet Fine-tuning for Diverse Correct Solutions in Mathematical Reasoning Tasks

Ryoichi Takase; Masaya Tsunokake; Yuta Tsuchiya; Shota Inuzuka

arXiv:2410.20147 · cs.LG · October 29, 2024

TL;DR

This paper uses GFlowNet fine-tuning to improve the ability of large language models (LLMs) to generate diverse correct solutions in mathematical reasoning tasks, where a single problem often admits several valid derivations.

Contribution

It introduces GFlowNet fine-tuning of LLMs for producing diverse solutions, in contrast to traditional reward-maximizing reinforcement learning (RL) methods.

Findings

01 GFlowNet fine-tuning improves solution diversity.

02 GFlowNet fine-tuning achieves accuracy comparable to reward-maximizing RL.

03 GFlowNet fine-tuning increases the diversity of intermediate reasoning steps.

Abstract

Mathematical reasoning problems are among the most challenging, as they typically require an understanding of fundamental laws to solve. The laws are universal, but the derivation of the final answer changes depending on how a problem is approached. When training large language models (LLMs), learning to generate such multiple solutions is essential to accelerate their use in mathematical education. To this end, we train LLMs using a generative flow network (GFlowNet). Unlike reward-maximizing reinforcement learning (RL), GFlowNet fine-tuning seeks diverse solutions by training the LLM so that its output distribution is proportional to a reward function. In numerical experiments, we evaluate GFlowNet fine-tuning and reward-maximizing RL in terms of accuracy and diversity. The results show that GFlowNet fine-tuning derives correct final answers from diverse intermediate…
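The contrast the abstract draws between reward-maximizing RL and reward-proportional sampling can be illustrated with a toy sketch. The candidate solution names and reward values below are hypothetical and are not taken from the paper. The "reward-maximizing" policy is an idealized greedy limit, not the RL algorithm the authors evaluate. No LLM or GFlowNet training is performed here; the code only compares the two target distributions.

```python
import math

def reward_proportional(rewards):
    """Target distribution p(x) = R(x) / Z, where Z is the sum of all rewards."""
    z = sum(rewards.values())
    return {x: r / z for x, r in rewards.items()}

def reward_maximizing(rewards):
    """Idealized greedy limit: all probability mass on one highest-reward candidate."""
    best = max(rewards, key=rewards.get)
    return {x: 1.0 if x == best else 0.0 for x in rewards}

def entropy(dist):
    """Shannon entropy in nats, used here as a simple diversity measure."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

# Hypothetical rewards for four solution paths to a single problem (assumed values).
rewards = {"algebraic": 1.0, "geometric": 1.0, "guess_and_check": 0.9, "wrong": 0.1}

p_prop = reward_proportional(rewards)
p_max = reward_maximizing(rewards)

for name in rewards:
    print(f"{name:16s} proportional={p_prop[name]:.3f}  maximizing={p_max[name]:.3f}")
print(f"entropy: proportional={entropy(p_prop):.3f}  maximizing={entropy(p_max):.3f}")
```

In this toy setting the reward-proportional target spreads probability over every high-reward path, while the greedy limit keeps only one of them, even though both put most of their mass on correct solutions.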

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Neural Networks and Applications