Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text

Ximing Lu; David Acuna; Jaehun Jung; Jian Hu; Di Zhang; Shizhe Diao; Yunheng Zou; Shaokun Zhang; Brandon Cui; Mingjie Liu; Hyunwoo Kim; Prithviraj Ammanabrolu; Jan Kautz; Yi Dong; Yejin Choi

arXiv:2601.22975 · cs.AI · February 4, 2026

Open Access · 2 Models · 1 Dataset

TL;DR

Golden Goose turns unverifiable internet text into an effectively unlimited supply of reasoning-rich RLVR tasks by recasting fill-in-the-middle as multiple-choice question answering. The authors report improved reasoning across multiple domains and new state-of-the-art results.

Contribution

The paper presents a novel technique to synthesize large-scale RLVR datasets from unverifiable text, enabling scalable reasoning training for language models.

Findings

1. Revives saturated models with new RLVR data
2. Achieves state-of-the-art results on multiple benchmarks
3. Successfully applies to the cybersecurity domain, which had no prior RLVR data

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has become a cornerstone for unlocking complex reasoning in Large Language Models (LLMs). Yet, scaling up RL is bottlenecked by limited existing verifiable data, where improvements increasingly saturate over prolonged training. To overcome this, we propose Golden Goose, a simple trick to synthesize unlimited RLVR tasks from unverifiable internet text by constructing a multiple-choice question-answering version of the fill-in-the-middle task. Given a source text, we prompt an LLM to identify and mask key reasoning steps, then generate a set of diverse, plausible distractors. This enables us to leverage reasoning-rich unverifiable corpora typically excluded from prior RLVR data construction (e.g., science textbooks) to synthesize GooseReason-0.7M, a large-scale RLVR dataset with over 0.7 million tasks spanning mathematics, programming,…
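The construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: in the paper an LLM selects the reasoning step to mask and writes the distractors, whereas here both are supplied by hand. The names `MCQTask`, `build_task`, `reward`, and the `[MASK]` placeholder are hypothetical, and the exact-letter reward is an assumed simplification of how a multiple-choice answer might be verified.

```python
import random
from dataclasses import dataclass

MASK = "[MASK]"  # hypothetical placeholder token, not taken from the paper
LETTERS = "ABCDEFGHIJ"


@dataclass
class MCQTask:
    prompt: str  # masked passage followed by lettered options
    answer: str  # letter of the option that holds the original span


def build_task(source: str, span: str, distractors: list[str], seed: int = 0) -> MCQTask:
    """Mask `span` in `source` and mix it with distractors as lettered options.

    Assumption: `span` and `distractors` would come from an LLM in the real
    pipeline; here the caller provides them directly.
    """
    if source.count(span) != 1:
        raise ValueError("span must occur exactly once in the source text")
    options = [span, *distractors]
    if len(set(options)) != len(options):
        raise ValueError("options must be distinct")
    if len(options) > len(LETTERS):
        raise ValueError("too many options")

    # Shuffle deterministically so the correct letter is not always 'A'.
    rng = random.Random(seed)
    rng.shuffle(options)

    masked = source.replace(span, MASK, 1)
    option_lines = [f"{LETTERS[i]}. {opt}" for i, opt in enumerate(options)]
    prompt = "\n".join(
        [
            "Fill in the masked step. Answer with a single letter.",
            "",
            masked,
            "",
            *option_lines,
        ]
    )
    return MCQTask(prompt=prompt, answer=LETTERS[options.index(span)])


def reward(task: MCQTask, response: str) -> float:
    """Verifiable reward: 1.0 if the response is the correct letter, else 0.0."""
    return 1.0 if response.strip().upper() == task.answer else 0.0


# Toy example with a hand-picked span and hand-written distractors.
example_source = (
    "The triangle has legs 3 and 4. By the Pythagorean theorem, "
    "the hypotenuse is sqrt(9 + 16) = 5. So the perimeter is 12."
)
example_span = "the hypotenuse is sqrt(9 + 16) = 5"
example_task = build_task(
    example_source,
    example_span,
    distractors=[
        "the hypotenuse is 3 + 4 = 7",
        "the hypotenuse is sqrt(3 + 4)",
        "the hypotenuse is 9 + 16 = 25",
    ],
)
```

Because the correct option is known by construction, the reward is checkable without a domain-specific verifier, which is what lets text such as textbook passages serve as an RLVR source. A real pipeline would likely need a more forgiving answer parser than the exact-letter match assumed here.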

Code & Models

Models

Datasets

nvidia/Nemotron-Research-GooseReason-0.7M (dataset, 473 downloads)

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques