Optimizing Language Model's Reasoning Abilities with Weak Supervision

Yongqi Tong; Sizhe Wang; Dawei Li; Yifan Wang; Simeng Han; Zi Lin; Chengsong Huang; Jiaxin Huang; Jingbo Shang

arXiv:2405.04086 · cs.CL · May 8, 2024 · 1 citation

Open Access

TL;DR

This paper introduces a weak supervision approach called self-reinforcement to enhance large language models' reasoning abilities with minimal human annotations, supported by a new benchmark dataset, PuzzleBen.

Contribution

It proposes a novel self-reinforcement method for improving LLM reasoning with limited supervision and presents PuzzleBen, a large weakly supervised benchmark for complex reasoning tasks.

Findings

1. Self-reinforcement improves LLM reasoning performance.
2. PuzzleBen enables training with fewer annotated questions.
3. The approach reduces reliance on extensive human-annotated explanations.

Abstract

While Large Language Models (LLMs) have demonstrated proficiency in handling complex queries, much of the past work has depended on datasets extensively annotated by human experts. This reliance on fully supervised annotations poses scalability challenges, particularly as models and data requirements grow. To mitigate this, we explore the potential of enhancing LLMs' reasoning abilities with minimal human supervision. In this work, we introduce self-reinforcement, which begins with Supervised Fine-Tuning (SFT) of the model on a small collection of annotated questions. It then iteratively improves the LLM by learning from the differences between the responses of the SFT and unfinetuned models on unlabeled questions. Our approach is efficient and does not rely heavily on extensive human-annotated explanations. However, current reasoning benchmarks typically only include…
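The abstract's core step, learning from the gap between the SFT model and the unfinetuned model on unlabeled questions, can be illustrated with a minimal sketch. It assumes one possible reading: each unlabeled question yields a preference pair in which the SFT model's response is treated as the better one. The names `Model`, `build_preference_pairs`, and the `prompt`/`chosen`/`rejected` keys are hypothetical and are not taken from the paper; the abstract does not specify how the pairs are consumed, so the training update itself is left out.

```python
from typing import Callable, Dict, List

# Hypothetical interface: a model is any callable mapping a question to a response.
Model = Callable[[str], str]


def build_preference_pairs(
    questions: List[str],
    sft_model: Model,
    base_model: Model,
) -> List[Dict[str, str]]:
    """Pair SFT and unfinetuned responses on unlabeled questions.

    Assumption (not stated in the abstract): the SFT response is treated
    as "chosen" and the unfinetuned response as "rejected".
    """
    pairs: List[Dict[str, str]] = []
    for question in questions:
        sft_answer = sft_model(question)
        base_answer = base_model(question)
        # Identical responses carry no difference to learn from, so skip them.
        if sft_answer == base_answer:
            continue
        pairs.append(
            {"prompt": question, "chosen": sft_answer, "rejected": base_answer}
        )
    return pairs


if __name__ == "__main__":
    # Toy stand-ins for real models, for illustration only.
    toy_sft: Model = lambda q: f"Reasoning about '{q}' step by step."
    toy_base: Model = lambda q: f"Answer to '{q}'."
    for pair in build_preference_pairs(["Q1", "Q2"], toy_sft, toy_base):
        print(pair)
```

No human labels are needed for the unlabeled questions in this sketch: the supervision signal comes only from which model produced each response, which is what makes the setup weakly supervised.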

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Shrink and Fine-Tune