TL;DR
Cog-DRIFT introduces an adaptive curriculum of reformulated problem variants to enhance learning from hard reasoning tasks in LLMs, significantly improving performance on previously unsolvable problems.
Contribution
The paper presents Cog-DRIFT, a novel framework that constructs and organizes reformulated problem variants into an adaptive curriculum to improve reasoning in LLMs.
Findings
Cog-DRIFT improves performance on hard reasoning problems by over 10%.
The method generalizes well across multiple models and benchmarks.
It improves pass@k and sample efficiency at test time.
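For readers unfamiliar with the metric, pass@k is the probability that at least one of k sampled responses is correct. The sketch below uses the standard unbiased estimator computed from n samples of which c are correct. This summary does not state the paper's exact evaluation protocol, so the function name and the example numbers are illustrative assumptions only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which are correct.

    Computes 1 - C(n - c, k) / C(n, k): one minus the probability that a
    random size-k subset of the n samples contains no correct response.
    """
    if not (0 <= c <= n):
        raise ValueError("c must satisfy 0 <= c <= n")
    if not (1 <= k <= n):
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset has a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical numbers: 16 samples per problem, 2 of them correct.
    for k in (1, 4, 8):
        print(f"pass@{k} = {pass_at_k(16, 2, k):.3f}")
```

A problem with c = 0 scores 0 for every k, which is exactly the case where no reward signal is available during training.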
Abstract
Reinforcement learning from verifiable rewards (RLVR) has improved the reasoning abilities of LLMs, yet a fundamental limitation remains: models cannot learn from problems that are too difficult to solve under their current policy, as these yield no meaningful reward signal. We propose a simple yet effective solution based on task reformulation. We transform challenging open-ended problems into cognitively simpler variants -- such as multiple-choice and cloze formats -- that preserve the original answer while reducing the effective search space and providing denser learning signals. These reformulations span a spectrum from discriminative to generative tasks, which we exploit to bootstrap learning: models first learn from structured, easier formats, and this knowledge transfers back to improve performance on the original open-ended problems. Building on this insight, we introduce…
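The reformulation idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `Variant`, `to_multiple_choice`, `to_cloze`, and `reward` are hypothetical, the distractors are supplied by hand, and building a cloze hint by masking trailing characters of the answer is an assumption made purely for illustration. What the sketch shows is the property the abstract describes: every variant keeps the original gold answer, while the easier formats shrink the space of plausible responses.

```python
import random
from dataclasses import dataclass


@dataclass
class Variant:
    kind: str    # "multiple_choice" or "cloze" (hypothetical labels)
    prompt: str  # reformulated problem text
    answer: str  # gold answer, identical to the original problem's answer


def to_multiple_choice(question, answer, distractors, rng):
    """Discriminative variant: the gold answer is hidden among distractors."""
    options = list(distractors) + [answer]
    rng.shuffle(options)
    labels = "ABCDEFGH"
    lines = [f"{labels[i]}. {opt}" for i, opt in enumerate(options)]
    prompt = f"{question}\nChoose the correct answer:\n" + "\n".join(lines)
    return Variant("multiple_choice", prompt, answer)


def to_cloze(question, answer, n_hidden=1):
    """Cloze variant (assumed scheme): reveal all but the last n_hidden characters."""
    n = max(0, min(n_hidden, len(answer)))
    hint = answer[: len(answer) - n] + "_" * n
    prompt = f"{question}\nComplete the answer: {hint}"
    return Variant("cloze", prompt, answer)


def reward(variant, response):
    """Verifiable reward: 1.0 on an exact match with the gold answer, else 0.0."""
    return 1.0 if response.strip() == variant.answer else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    question, answer = "What is 12 * 12?", "144"
    mc = to_multiple_choice(question, answer, ["124", "132", "154"], rng)
    cloze = to_cloze(question, answer, n_hidden=1)
    print(mc.prompt)
    print(cloze.prompt)
    print(reward(mc, "144"), reward(cloze, "132"))
```

Because `reward` checks the same gold answer in every format, a model that always fails the open-ended problem can still earn nonzero reward on the easier variants, which is the bootstrapping effect the abstract describes.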
