Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

Avi Singh; John D. Co-Reyes; Rishabh Agarwal; Ankesh Anand; Piyush Patil; Xavier Garcia; Peter J. Liu; James Harrison; Jaehoon Lee; Kelvin Xu; Aaron Parisi; Abhishek Kumar; Alex Alemi; Alex Rizkowsky; Azade Nova; Ben Adlam; Bernd Bohnet; Gamaleldin Elsayed; Hanie Sedghi; Igor Mordatch; Isabelle Simpson; Izzeddin Gur; Jasper Snoek; Jeffrey Pennington; Jiri Hron; Kathleen Kenealy; Kevin Swersky; Kshiteej Mahajan; Laura Culp; Lechao Xiao; Maxwell L. Bileschi; Noah Constant; Roman Novak; Rosanne Liu; Tris Warkentin; Yundi Qian; Yamini Bansal; Ethan Dyer; Behnam Neyshabur; Jascha Sohl-Dickstein; Noah Fiedel

arXiv:2312.06585·cs.LG·April 19, 2024·6 cites

Open Access

TL;DR

This paper introduces ReST$^{EM}$, a self-training method that uses scalar feedback to improve language models on math and coding tasks. It outperforms fine-tuning on human data alone, and it scales favorably with model size.

Contribution

The paper presents a novel self-training approach using expectation-maximization that reduces reliance on human data and enhances model performance on reasoning and coding benchmarks.

Findings

1. ReST$^{EM}$ improves performance over pure fine-tuning on human data.
2. Scaling ReST$^{EM}$ with model size yields significant gains.
3. ReST$^{EM}$ reduces dependence on human-generated data.

Abstract

Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go beyond human data on tasks where we have access to scalar feedback, for example, on math problems where one can verify correctness. To do so, we investigate a simple self-training method based on expectation-maximization, which we call ReST$^{EM}$, where we (1) generate samples from the model and filter them using binary feedback, (2) fine-tune the model on these samples, and (3) repeat this process a few times. Testing on advanced MATH reasoning and APPS coding benchmarks using PaLM-2 models, we find that ReST$^{EM}$ scales favorably with model size and significantly surpasses fine-tuning only on human data. Overall, our findings suggest…
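The three-step loop described in the abstract (sample, filter with binary feedback, fine-tune, repeat) can be illustrated with a toy sketch. This is not the paper's implementation: the "model" here is a categorical distribution over four candidate answers to a single problem, and "fine-tuning" is replaced by a smoothed maximum-likelihood refit on the kept samples. The names `sample_answers`, `binary_reward`, `refit`, and `self_train`, along with all parameter values, are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def sample_answers(policy, n, rng):
    """Step 1a: draw n candidate answers from a categorical policy {answer: prob}."""
    answers = list(policy)
    weights = [policy[a] for a in answers]
    return rng.choices(answers, weights=weights, k=n)


def binary_reward(answer, target):
    """Binary feedback: 1 if the answer is verified correct, else 0."""
    return 1 if answer == target else 0


def refit(policy, kept, smoothing=1.0):
    """Step 2 (toy stand-in for fine-tuning, an assumption of this sketch):
    smoothed maximum-likelihood fit of the policy to the kept samples."""
    counts = Counter(kept)
    total = len(kept) + smoothing * len(policy)
    return {a: (counts[a] + smoothing) / total for a in policy}


def self_train(policy, target, iterations=3, n_samples=64, seed=0):
    """Step 3: repeat the generate / filter / refit cycle a few times."""
    rng = random.Random(seed)
    history = [policy[target]]  # probability of the correct answer per iteration
    for _ in range(iterations):
        samples = sample_answers(policy, n_samples, rng)
        # Step 1b: keep only the samples that pass the binary check.
        kept = [s for s in samples if binary_reward(s, target) == 1]
        if kept:  # skip the update if nothing passed the filter
            policy = refit(policy, kept)
        history.append(policy[target])
    return policy, history


# Hypothetical problem: four candidate answers, "4" is the verified-correct one.
initial = {"4": 0.25, "5": 0.25, "6": 0.25, "7": 0.25}
final, history = self_train(initial, target="4")
print(history)
```

In this toy setting the probability of the correct answer rises from its initial value because each refit sees only verified-correct samples. A real language model would replace `refit` with gradient-based fine-tuning over many problems.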

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification