Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
Avi Singh, John D. Co-Reyes, Rishabh Agarwal, Ankesh Anand, Piyush Patil, Xavier Garcia, Peter J. Liu, James Harrison, Jaehoon Lee, Kelvin Xu, Aaron Parisi, Abhishek Kumar, Alex Alemi, Alex Rizkowsky, Azade Nova, Ben Adlam, Bernd Bohnet, Gamaleldin Elsayed, Hanie Sedghi

TL;DR
This paper introduces ReST$^{EM}$, a self-training method that uses scalar feedback to improve language models on math and coding tasks, outperforming fine-tuning on human data alone, especially at larger model scales.
Contribution
The paper presents a novel self-training approach using expectation-maximization that reduces reliance on human data and enhances model performance on reasoning and coding benchmarks.
Findings
ReST$^{EM}$ improves performance over pure fine-tuning on human data.
Scaling ReST$^{EM}$ with model size yields significant gains.
ReST$^{EM}$ reduces dependence on human-generated data.
Abstract
Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go beyond human data on tasks where we have access to scalar feedback, for example, on math problems where one can verify correctness. To do so, we investigate a simple self-training method based on expectation-maximization, which we call ReST$^{EM}$, where we (1) generate samples from the model and filter them using binary feedback, (2) fine-tune the model on these samples, and (3) repeat this process a few times. Testing on advanced MATH reasoning and APPS coding benchmarks using PaLM-2 models, we find that ReST$^{EM}$ scales favorably with model size and significantly surpasses fine-tuning only on human data. Overall, our findings suggest…
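The three-step loop described in the abstract (generate and filter, fine-tune, repeat) can be illustrated with a small toy sketch. This is not the authors' implementation: the "model" here is a hypothetical table of answer probabilities per problem, "fine-tuning" is a smoothed refit of that table to the samples that passed the filter, and every name (`generate`, `binary_reward`, `fine_tune`, `self_train`, the `smoothing` parameter, the two example problems) is an assumption made for illustration only.

```python
import random
from collections import Counter


def generate(policy, problem, n, rng):
    """Sample n candidate answers for `problem` from the current toy policy."""
    answers, weights = zip(*policy[problem].items())
    return rng.choices(answers, weights=weights, k=n)


def binary_reward(problem, answer, solutions):
    """Binary feedback: 1 if the answer matches the known solution, else 0."""
    return 1 if solutions[problem] == answer else 0


def fine_tune(policy, kept, smoothing=0.1):
    """Toy stand-in for fine-tuning: refit each answer distribution to kept samples."""
    new_policy = {}
    for problem, dist in policy.items():
        counts = Counter(kept.get(problem, []))
        if not counts:
            # No sample passed the filter, so leave this problem's distribution as-is.
            new_policy[problem] = dict(dist)
            continue
        total = sum(counts.values()) + smoothing * len(dist)
        new_policy[problem] = {a: (counts[a] + smoothing) / total for a in dist}
    return new_policy


def self_train(policy, solutions, iterations=3, samples_per_problem=32, seed=0):
    """Repeat: (1) sample and filter by reward, (2) refit on the kept samples."""
    rng = random.Random(seed)
    for _ in range(iterations):
        kept = {}
        for problem in policy:
            samples = generate(policy, problem, samples_per_problem, rng)
            kept[problem] = [a for a in samples if binary_reward(problem, a, solutions)]
        policy = fine_tune(policy, kept)
    return policy


# Hypothetical toy data: two problems with verifiable answers.
solutions = {"2+2": "4", "3*3": "9"}
policy = {
    "2+2": {"3": 0.4, "4": 0.2, "5": 0.4},
    "3*3": {"6": 0.5, "9": 0.25, "12": 0.25},
}
trained = self_train(policy, solutions)
print(trained)
```

In this sketch the filter keeps only verifiably correct samples, so each refit shifts probability mass toward answers that earned a reward of 1. A real language-model setting would replace the probability table with sampled generations and the refit with gradient-based fine-tuning.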
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification
