GASP: Guided Asymmetric Self-Play For Coding LLMs
Swadesh Jana, Cansu Sancaktar, Tomáš Daniš, Georg Martius, Antonio Orvieto, Pavel Kolev

TL;DR
GASP is a guided asymmetric self-play method for coding LLMs: real-data goalpost questions steer the teacher's curriculum, which improves performance on coding benchmarks.
Contribution
The paper proposes Guided Asymmetric Self-Play (GASP), which grounds asymmetric self-play in hard real-data goalpost questions and uses them to build a curriculum for training coding LLMs.
Findings
Improves pass@20 on LiveCodeBench by 2.5% over unguided methods.
Successfully solves hard goalpost questions that baseline models cannot.
Demonstrates effective curriculum learning through guided self-play.
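For readers unfamiliar with the pass@20 metric cited above, the following is a minimal sketch of the widely used unbiased pass@k estimator. This summary does not state how the paper computes pass@20, so the estimator choice and the function name `pass_at_k` are assumptions made for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled solutions, c of which pass all tests.

    Assumption: this is the standard unbiased estimator
    1 - C(n - c, k) / C(n, k); the paper's exact evaluation
    protocol is not given in this summary.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # If fewer than k samples fail, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With n = k = 20, the estimate reduces to whether at least one of the 20 samples is correct, which is why pass@20 is often read as a measure of exploration rather than single-shot accuracy.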
Abstract
Asymmetric self-play has emerged as a promising paradigm for post-training large language models, where a teacher continually generates questions for a student to solve at the edge of the student's learnability. Although these methods promise open-ended data generation bootstrapped from no human data, they suffer from one major problem: not all problems that are hard to solve are interesting or informative to improve the overall capabilities of the model. Current asymmetric self-play methods are goal-agnostic with no real grounding. We propose Guided Asymmetric Self-Play (GASP), where grounding is provided by real-data goalpost questions that are identified to pose a hard exploration challenge to the model. During self-play, the teacher first generates an easier variant of a hard question, and then a harder variant of that easier question, with the goal of gradually closing the gap to…
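The two-step question generation described in the abstract can be sketched as a toy loop. This is only an illustration under stated assumptions, not the authors' implementation: the `Question` type, the scalar `difficulty` field, and the `make_easier` / `make_harder` callables are hypothetical stand-ins for what would be LLM teacher calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Question:
    text: str
    difficulty: float  # hypothetical scalar proxy, for illustration only


Teacher = Callable[[Question], Question]


def guided_round(goalpost: Question,
                 make_easier: Teacher,
                 make_harder: Teacher) -> List[Question]:
    """One guided round: easier variant of the goalpost, then a harder
    variant of that easier question, as described in the abstract."""
    easier = make_easier(goalpost)
    stepping_stone = make_harder(easier)
    return [easier, stepping_stone]


# Toy teachers (assumptions): a real teacher would be an LLM prompt.
def toy_easier(q: Question) -> Question:
    return Question(f"easier({q.text})", q.difficulty * 0.5)


def toy_harder(q: Question) -> Question:
    return Question(f"harder({q.text})", q.difficulty * 1.5)


goalpost = Question("hard real-data question", 1.0)
curriculum = guided_round(goalpost, toy_easier, toy_harder)
```

In this toy setup the two generated questions sit between trivial and the goalpost in difficulty, which is the intuition behind using the goalpost to anchor the curriculum.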
Taxonomy
Topics: Topic Modeling · Intelligent Tutoring Systems and Adaptive Learning · Machine Learning and Algorithms
