Optimizing Language Model's Reasoning Abilities with Weak Supervision
Yongqi Tong, Sizhe Wang, Dawei Li, Yifan Wang, Simeng Han, Zi Lin, Chengsong Huang, Jiaxin Huang, Jingbo Shang

TL;DR
This paper introduces a weak supervision approach called self-reinforcement to enhance large language models' reasoning abilities with minimal human annotations, supported by a new benchmark dataset, PuzzleBen.
Contribution
It proposes a novel self-reinforcement method for improving LLM reasoning with limited supervision and presents PuzzleBen, a large weakly supervised benchmark for complex reasoning tasks.
Findings
Self-reinforcement improves LLM reasoning performance.
PuzzleBen enables training with fewer annotated questions.
The approach reduces reliance on extensive human-annotated explanations.
Abstract
While Large Language Models (LLMs) have demonstrated proficiency in handling complex queries, much of the past work has depended on datasets extensively annotated by human experts. This reliance on fully supervised annotations poses scalability challenges, particularly as models and data requirements grow. To mitigate this, we explore the potential of enhancing LLMs' reasoning abilities with minimal human supervision. In this work, we introduce self-reinforcement, which begins with Supervised Fine-Tuning (SFT) of the model using a small collection of annotated questions. It then iteratively improves the LLM by learning from the differences between the responses of the SFT and unfinetuned models on unlabeled questions. Our approach is efficient and does not rely heavily on extensive human-annotated explanations. However, current reasoning benchmarks typically only include…
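The data-collection step described in the abstract (comparing SFT and unfinetuned responses on unlabeled questions) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Model` type, the function `build_preference_pairs`, the toy models, and the choice to treat the SFT response as the preferred one are all hypothetical and not specified in the abstract.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in: a "model" is any function from question to response.
Model = Callable[[str], str]


def build_preference_pairs(
    sft_model: Model,
    base_model: Model,
    unlabeled_questions: List[str],
) -> List[Dict[str, str]]:
    """Collect (prompt, chosen, rejected) records where the two models differ.

    Assumption: the SFT response is treated as preferred and the
    unfinetuned response as dispreferred. The abstract only says the
    method learns from the differences between the two.
    """
    pairs = []
    for question in unlabeled_questions:
        sft_response = sft_model(question)
        base_response = base_model(question)
        # Identical responses carry no contrastive signal, so skip them.
        if sft_response != base_response:
            pairs.append(
                {
                    "prompt": question,
                    "chosen": sft_response,
                    "rejected": base_response,
                }
            )
    return pairs


# Toy models for illustration only; real models would generate text.
def toy_base(question: str) -> str:
    return "unsure"


def toy_sft(question: str) -> str:
    return "unsure" if "hard" in question else f"answer to {question}"


pairs = build_preference_pairs(toy_sft, toy_base, ["q1", "hard q2", "q3"])
```

In an iterative setup, the collected pairs would feed a further training round, after which the updated model could take the place of `sft_model`; how that update is performed is left open here because the abstract does not describe it.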
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Shrink and Fine-Tune
