DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models

Chengcheng Han; Xiaowei Du; Che Zhang; Yixin Lian; Xiang Li; Ming Gao; Baoyuan Wang

arXiv:2310.05074 · cs.CL · October 24, 2023 · 1 citation

TL;DR

This paper introduces DialCoT (Dialogue-guided Chain-of-Thought), a dialogue-based reasoning method combined with Proximal Policy Optimization (PPO), to improve the reasoning abilities of Smaller Language Models (SLMs) on complex arithmetic reasoning tasks, outperforming previous approaches on four arithmetic reasoning datasets.

Contribution

We propose DialCoT, which combines dialogue-guided reasoning with PPO-based optimization of reasoning path selection, enabling smaller models to perform complex reasoning tasks more effectively.

Findings

01

Significant performance improvements on four arithmetic reasoning datasets.

02

DialCoT reduces task difficulty by breaking down questions into sub-questions.

03

PPO optimization enhances reasoning path selection and accuracy.
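The findings credit PPO with better reasoning path selection, but this page does not define the reward, state, or action space. As a rough illustration only, the sketch below computes the standard PPO clipped surrogate objective over sampled actions, which one could read as individual reasoning steps. The function name `ppo_clip_objective`, the per-step framing, and the clip range `clip_eps = 0.2` are assumptions, not details taken from the paper.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    new_logprobs: Sequence[float],
    old_logprobs: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # assumed clip range, a common default
) -> float:
    """Mean PPO clipped surrogate objective (to be maximized).

    Each index is one sampled action, e.g. one generated reasoning step
    (hypothetical framing); its advantage says how much better that step
    turned out than expected.
    """
    if not (len(new_logprobs) == len(old_logprobs) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms,
        # so a single update cannot profit from moving the ratio far from 1.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


# A good step whose probability rose 1.5x is credited only up to 1.2x;
# a bad step whose probability halved is still penalized as if at 0.8x.
obj = ppo_clip_objective(
    new_logprobs=[math.log(1.5), math.log(0.5)],
    old_logprobs=[0.0, 0.0],
    advantages=[1.0, -1.0],
)
print(round(obj, 6))  # 0.2
```

Clipping makes the update conservative: once the new policy has moved a step's probability outside the `[1 - clip_eps, 1 + clip_eps]` band, further movement in the favorable direction earns no extra objective.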

Abstract

Chain-of-Thought (CoT) prompting has proven to be effective in enhancing the reasoning capabilities of Large Language Models (LLMs) with at least 100 billion parameters. However, it is ineffective or even detrimental when applied to reasoning tasks in Smaller Language Models (SLMs) with fewer than 10 billion parameters. To address this limitation, we introduce Dialogue-guided Chain-of-Thought (DialCoT), which employs a dialogue format to generate intermediate reasoning steps, guiding the model toward the final answer. Additionally, we optimize the model's reasoning path selection using the Proximal Policy Optimization (PPO) algorithm, further enhancing its reasoning capabilities. Our method offers several advantages compared to previous approaches. Firstly, we transform the process of solving complex reasoning questions by breaking them down into a series of simpler sub-questions,…
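The abstract describes solving a complex question through a dialogue of simpler sub-questions. A minimal sketch of such a loop follows, assuming one side of the dialogue proposes the next sub-question and the other answers it with the full history in view. The role names `Decomposer` and `Solver`, the helper `dialogue_reason`, the stop signal (the decomposer returning `None`), and the `scripted` stand-ins used in place of a real language model are all hypothetical; the paper's actual prompt format is not shown on this page.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional, Tuple

Turn = Tuple[str, str]  # (role, utterance)


@dataclass
class Dialogue:
    question: str
    turns: List[Turn] = field(default_factory=list)

    def render(self) -> str:
        """Flatten the dialogue into a single prompt string."""
        lines = [f"Question: {self.question}"]
        lines += [f"{role}: {text}" for role, text in self.turns]
        return "\n".join(lines)


def dialogue_reason(
    question: str,
    decomposer: Callable[[str], Optional[str]],
    solver: Callable[[str], str],
    max_steps: int = 8,
) -> Tuple[str, Dialogue]:
    """Alternate sub-question generation and answering until the decomposer stops."""
    dialogue = Dialogue(question)
    for _ in range(max_steps):
        sub_question = decomposer(dialogue.render())
        if sub_question is None:  # hypothetical stop signal: no further sub-questions
            break
        dialogue.turns.append(("Decomposer", sub_question))
        # The solver sees the full history, so later answers can reuse earlier ones.
        dialogue.turns.append(("Solver", solver(dialogue.render())))
    # Assumption: the last Solver turn holds the final answer.
    final_answer = dialogue.turns[-1][1] if dialogue.turns else ""
    return final_answer, dialogue


def scripted(replies: Iterable[str], default: Optional[str] = None) -> Callable[[str], Optional[str]]:
    """Stand-in for a language model: ignores the prompt and replays fixed replies."""
    it = iter(replies)
    return lambda _prompt: next(it, default)


# Toy usage with scripted stand-ins instead of a real SLM.
final, dlg = dialogue_reason(
    "Tom has 3 boxes of 4 apples and eats 2. How many apples are left?",
    decomposer=scripted([
        "How many apples are in the boxes?",
        "How many apples are left after eating 2?",
    ]),
    solver=scripted(["3 * 4 = 12", "12 - 2 = 10"], default=""),
)
print(dlg.render())
print("Final answer:", final)
```

With a real model in place of the scripted stubs, passing the whole rendered history into each call is what would let a later answer (`12 - 2`) build on an earlier one (`3 * 4 = 12`).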

Code & Models

Repositories

hccngu/dialcot (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Graph Neural Networks

Methods: Entropy Regularization · Proximal Policy Optimization
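The taxonomy pairs PPO with entropy regularization, though this page does not say how DialCoT's training uses an entropy term. A common form, sketched below under that assumption, adds a bonus proportional to the policy's entropy so the model keeps exploring alternative reasoning paths instead of collapsing onto one. The candidate-step logits, the helper names, and the coefficient `beta = 0.01` are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits over candidate actions into probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; terms with p == 0 contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_regularized_objective(
    surrogate: float, logits: Sequence[float], beta: float = 0.01  # beta is an assumed coefficient
) -> float:
    """Policy objective plus an entropy bonus (to be maximized).

    A larger beta rewards spreading probability over more candidate
    reasoning steps, which discourages premature collapse onto one path.
    """
    return surrogate + beta * entropy(softmax(logits))


uniform = softmax([0.0, 0.0, 0.0, 0.0])
peaked = softmax([5.0, 0.0, 0.0, 0.0])
# The uniform distribution has the maximum entropy, log(4) ≈ 1.386.
print(entropy(uniform), entropy(peaked))
```

Here `surrogate` would be a value such as the clipped PPO objective; the entropy bonus only shifts which of two otherwise equal policies is preferred, favoring the less certain one.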