Learning to Reason via Self-Iterative Process Feedback for Small Language Models

Kaiyuan Chen; Jin Wang; Xuejie Zhang

arXiv:2412.08393·cs.CL·December 12, 2024

TL;DR

This paper introduces a self-iterative feedback method that lets small language models improve their reasoning from self-generated positive and negative signals rather than costly external supervision, yielding gains on GSM8K and MBPP and better out-of-domain generalization.

Contribution

It presents a novel self-feedback training approach combining ORPO and process supervision, enhancing reasoning abilities of small language models without costly external signals.
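
To make the ORPO part of this contribution concrete, here is a minimal sketch of the ORPO objective as it is generally defined: a negative log-likelihood term on the preferred response plus a weighted odds-ratio penalty against the rejected one. The function names, the `lam` weight, and the use of length-averaged token log-probabilities as the sequence likelihood are assumptions for illustration, not details taken from this paper.

```python
import math

def avg_logprob(token_logprobs):
    """Length-normalized log-likelihood of one response (assumed input format)."""
    return sum(token_logprobs) / len(token_logprobs)

def log_odds(avg_lp):
    """log(p / (1 - p)) with p = exp(avg_lp); requires avg_lp < 0 so that p < 1."""
    return avg_lp - math.log1p(-math.exp(avg_lp))

def orpo_loss(chosen_logprobs, rejected_logprobs, lam=0.1):
    """SFT loss on the chosen response plus lam times the odds-ratio penalty."""
    lp_chosen = avg_logprob(chosen_logprobs)
    lp_rejected = avg_logprob(rejected_logprobs)
    nll = -lp_chosen  # supervised fine-tuning term
    log_odds_ratio = log_odds(lp_chosen) - log_odds(lp_rejected)
    # -log(sigmoid(x)) written as log(1 + exp(-x))
    or_penalty = math.log1p(math.exp(-log_odds_ratio))
    return nll + lam * or_penalty
```

In this sketch the loss shrinks as the model assigns higher odds to the chosen response than to the rejected one, which is why a single objective can both fine-tune and align without a separate reference model.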

Findings

1. Improves Gemma-2B accuracy on GSM8K by 12.43 over supervised fine-tuning (SFT)

2. Enhances Pass@1 by 3.95 on MBPP

3. Shows better out-of-domain generalization on MMLU_Math and HumanEval

Abstract

Small language models (SLMs) are more efficient, cost-effective, and customizable than large language models (LLMs), though they often underperform in specific areas like reasoning. Past methods for enhancing SLMs' reasoning, such as supervised fine-tuning and distillation, often depend on costly external signals, resulting in SLMs being overly confident with limited supervision signals, thus limiting their abilities. Therefore, this study enables SLMs to learn to reason from self-iterative feedback. By combining odds ratio preference optimization (ORPO), we fine-tune and align SLMs using positive and negative signals generated by themselves. Additionally, we introduce process supervision for rewards in preference alignment by sampling-based inference simulation and process reward models. Compared to Supervised Fine-Tuning (SFT), our method improves the performance of Gemma-2B by 12.43…
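
One common way to realize process rewards through sampling-based simulation is a Monte Carlo rollout: complete a partial reasoning chain several times and score the current step by the fraction of completions that reach a correct final answer. The sketch below illustrates that general idea only; `rollout_fn`, `is_correct`, and `num_rollouts` are hypothetical names, and the abstract does not confirm that the paper's procedure matches this one.

```python
import random

def estimate_step_reward(prefix_steps, rollout_fn, is_correct,
                         num_rollouts=8, seed=0):
    """Score a partial reasoning chain by how often sampled completions succeed.

    prefix_steps: list of reasoning steps generated so far.
    rollout_fn:   hypothetical callable (prefix_steps, rng) -> final answer,
                  standing in for sampling a completion from the model.
    is_correct:   hypothetical callable (final answer) -> bool.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_rollouts):
        final_answer = rollout_fn(prefix_steps, rng)
        if is_correct(final_answer):
            hits += 1
    return hits / num_rollouts

def pick_preference_pair(candidates, rewards):
    """Return (chosen, rejected): the highest- and lowest-reward candidates."""
    ranked = sorted(zip(rewards, candidates), key=lambda pair: pair[0])
    return ranked[-1][1], ranked[0][1]
```

Under these assumptions, the estimated rewards can rank self-generated candidates, and the best and worst ones can serve as the positive and negative signals for preference alignment.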

Taxonomy

Topics: Semantic Web and Ontologies · Topic Modeling

Methods: ALIGN