SUPERNOVA: Eliciting General Reasoning in LLMs with Reinforcement Learning on Natural Instructions

Ashima Suvarna; Kendrick Phan; Mehrab Beikzadeh; Hritik Bansal; Saadia Gabriel

arXiv:2604.08477 · cs.AI · April 13, 2026

TL;DR

SUPERNOVA introduces a data curation framework for reinforcement learning with verifiable rewards, significantly enhancing large language models' general reasoning capabilities across diverse tasks.

Contribution

The paper presents a systematic approach to adapting expert-annotated instruction datasets for RLVR, which improves reasoning performance and yields practical data curation insights.

Findings

1. Models trained on SUPERNOVA outperform baselines on reasoning benchmarks.
2. Task source selection significantly impacts reasoning performance.
3. Training on SUPERNOVA yields up to 52.8% improvement on BBEH.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has significantly improved large language model (LLM) reasoning in formal domains such as mathematics and code. Despite these advancements, LLMs still struggle with general reasoning tasks requiring capabilities such as causal inference and temporal understanding. Extending RLVR to general reasoning is fundamentally constrained by the lack of high-quality, verifiable training data that spans diverse reasoning skills. To address this challenge, we propose SUPERNOVA, a data curation framework for RLVR aimed at enhancing general reasoning. Our key insight is that instruction-tuning datasets containing expert-annotated ground truth encode rich reasoning patterns that can be systematically adapted for RLVR. To study this, we conduct 100+ controlled RL experiments to analyze how data design choices impact downstream reasoning performance.…
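
To make the idea of a verifiable reward concrete, here is a minimal Python sketch of how an expert-annotated ground-truth answer from an instruction dataset might be turned into a binary reward signal. It is an illustration under stated assumptions, not the SUPERNOVA implementation: the "Answer:" output convention, the normalization rules, and the function names `normalize`, `extract_final_answer`, and `verifiable_reward` are all hypothetical.

```python
import re
from typing import Optional


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace so that
    trivial formatting differences do not affect the comparison."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def extract_final_answer(response: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None.

    The 'Answer:' marker is an assumed output convention for this sketch.
    """
    matches = re.findall(r"answer:\s*(.+)", response, flags=re.IGNORECASE)
    return matches[-1] if matches else None


def verifiable_reward(response: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the extracted answer matches the ground truth
    after normalization, otherwise 0.0 (including when no answer is found)."""
    answer = extract_final_answer(response)
    if answer is None:
        return 0.0
    return 1.0 if normalize(answer) == normalize(ground_truth) else 0.0
```

Exact match after normalization is the simplest possible checker; tasks with free-form answers would likely need a more tolerant comparison.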

Code & Models

Repositories

asuvarna31/supernova (GitHub)