Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths

Yew Ken Chia; Guizhen Chen; Weiwen Xu; Luu Anh Tuan; Soujanya Poria; Lidong Bing

arXiv:2410.10858·cs.CL·October 16, 2024

Open Access

TL;DR

This paper introduces Reasoning Paths Optimization (RPO), a training framework that improves large language models' multi-step reasoning by learning from diverse paths, leading to better problem-solving without relying on large annotated datasets.

Contribution

The paper presents RPO, a scalable, data-efficient training method that enhances reasoning by encouraging exploration of diverse reasoning paths in large language models.

Findings

01. Up to 3.1% improvement on GSM8K

02. Up to 4.3% improvement on MMLU STEM

03. Effective without large-scale human annotations

Abstract

Advanced models such as OpenAI o1 exhibit impressive problem-solving capabilities through step-by-step reasoning. However, they may still falter on more complex problems, making errors that disrupt their reasoning paths. We attribute this to the expansive solution space, where each step has the risk of diverging into mistakes. To enhance language model reasoning, we introduce a specialized training framework called Reasoning Paths Optimization (RPO), which enables learning to reason and explore from diverse paths. Our approach encourages favorable branches at each reasoning step while penalizing unfavorable ones, enhancing the model's overall problem-solving performance. Reasoning Paths Optimization does not rely on large-scale human-annotated rationales or outputs from closed-source models, making it scalable and data-efficient. We focus on multi-step reasoning tasks, such as math word…
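The abstract's idea of encouraging favorable branches while penalizing unfavorable ones at each reasoning step can be illustrated with a generic pairwise preference loss. This is only a sketch under stated assumptions, not the objective defined in the paper: the function names, the `beta` scale, and the logistic form are hypothetical choices made for illustration, and the inputs are assumed to be log-probabilities that a model assigns to two competing branches from the same step.

```python
import math


def branch_preference_loss(logp_favorable, logp_unfavorable, beta=1.0):
    """Pairwise logistic loss for one branching step (illustrative only).

    The loss is small when the favorable branch has the higher
    log-probability and large when the unfavorable branch does.
    """
    margin = beta * (logp_favorable - logp_unfavorable)
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def path_loss(step_pairs, beta=1.0):
    """Average the branch loss over all steps of one reasoning path.

    step_pairs is a list of (logp_favorable, logp_unfavorable) tuples,
    one tuple per reasoning step (a hypothetical data layout).
    """
    losses = [branch_preference_loss(f, u, beta) for f, u in step_pairs]
    return sum(losses) / len(losses)
```

Minimizing such a loss raises the likelihood of the favorable branch relative to the unfavorable one at every step, which is the qualitative behavior the abstract describes.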

Taxonomy

Topics: Complex Systems and Decision Making

Methods: Focus