DARC: Decoupled Asymmetric Reasoning Curriculum for LLM Evolution
Shengda Fan, Xuyan Ye, Yankai Lin

TL;DR
DARC is a two-stage self-play framework for large language models that stabilizes training and improves reasoning performance by decoupling the two roles: the Questioner is first trained to synthesize difficulty-calibrated questions, and the Solver is then trained with asymmetric self-distillation.
Contribution
The paper proposes DARC, a novel decoupled curriculum that enhances LLM self-evolution by addressing optimization instability and bootstrapping errors.
Findings
Achieves a 10.9-point average improvement across nine benchmarks.
Outperforms all baselines without human annotations.
Approaches fully supervised model performance.
Abstract
Self-play with large language models has emerged as a promising paradigm for achieving self-improving artificial intelligence. However, existing self-play frameworks often suffer from optimization instability, due to (i) non-stationary objectives induced by solver-dependent reward feedback for the Questioner, and (ii) bootstrapping errors from self-generated pseudo-labels used to supervise the Solver. To mitigate these challenges, we introduce DARC (Decoupled Asymmetric Reasoning Curriculum), a two-stage framework that stabilizes the self-evolution process. First, we train the Questioner to synthesize difficulty-calibrated questions, conditioned on explicit difficulty levels and external corpora. Second, we train the Solver with an asymmetric self-distillation mechanism, where a document-augmented teacher generates high-quality pseudo-labels to supervise the student Solver that lacks…
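The two stages described in the abstract can be illustrated with a minimal Python sketch of how the training inputs might be assembled. This is not the authors' implementation: the prompt templates, the `Generate` callable, the `DistillExample` container, and the `toy_model` stand-in are all hypothetical names introduced here. The sketch also assumes that the same model plays both teacher and student, and that the student sees only the question while the teacher additionally sees the source document.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: any function that maps a prompt string to generated text.
Generate = Callable[[str], str]


@dataclass
class DistillExample:
    student_prompt: str  # assumed: question only, no document
    target: str          # pseudo-label produced by the document-augmented teacher


def questioner_prompt(document: str, difficulty: str) -> str:
    """Stage 1 input: condition the Questioner on an external document
    and an explicit difficulty level (template is an assumption)."""
    return f"Document:\n{document}\n\nDifficulty: {difficulty}\nWrite one question."


def build_distill_example(question: str, document: str, model: Generate) -> DistillExample:
    """Stage 2 input: the teacher answers with the document in context;
    the student is later trained to give that answer without the document."""
    teacher_prompt = f"Document:\n{document}\n\nQuestion: {question}\nAnswer:"
    pseudo_label = model(teacher_prompt)
    student_prompt = f"Question: {question}\nAnswer:"
    return DistillExample(student_prompt=student_prompt, target=pseudo_label)


def toy_model(prompt: str) -> str:
    """Stand-in for an LLM call: answers correctly only when a document is present."""
    return "Paris" if "Document:" in prompt else "unknown"


if __name__ == "__main__":
    doc = "Paris is the capital of France."
    print(questioner_prompt(doc, "easy"))
    example = build_distill_example("What is the capital of France?", doc, toy_model)
    print(example)
```

The asymmetry lives entirely in the prompts: the teacher and student differ only in whether the document is included, so the resulting (student prompt, teacher answer) pairs can be fed to any standard supervised fine-tuning loop.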
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling · Multimodal Machine Learning Applications
