Accelerating Training Speed of Tiny Recursive Models with Curriculum Guided Adaptive Recursion
Kaleem Ullah Qasim, Jiashu Zhang

TL;DR
This paper introduces CGAR, a curriculum-guided adaptive recursion method that speeds up training of tiny recursive models by 1.71x, reducing computational cost with minimal accuracy loss on reasoning tasks.
Contribution
The paper proposes a novel curriculum learning approach for recursive models, dynamically adjusting recursion depth and supervision importance to improve training efficiency.
Findings
Achieves 1.71x training speedup with minimal accuracy loss
Reduces FLOPs by 41.4% using Progressive Depth Curriculum
Provides 40% gradient variance reduction with Hierarchical Supervision Weighting
Abstract
Background: Recursive reasoning models achieve strong performance through iterative refinement, allowing small networks to match large language models. However, training is computationally expensive, often requiring 36 GPU-hours for Sudoku extreme. Existing models use fixed recursion depth and uniform supervision weighting, leading to inefficient training.
Objectives: We propose CGAR (Curriculum-Guided Adaptive Recursion), applying curriculum learning to architectural depth. CGAR introduces Progressive Depth Curriculum (PDC) to dynamically adjust recursion depth and Hierarchical Supervision Weighting (HSW) to apply exponentially decaying importance to supervision steps.
Methods: PDC implements a three-stage schedule transitioning from shallow (2, 1) to full depth (6, 3) configurations, providing 41.4% FLOPs reduction. HSW applies exponential decay to supervision steps, achieving 40%…
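The two mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. Only the endpoint configurations (2, 1) and (6, 3) and the phrase "exponential decay" come from the abstract. The intermediate configuration (4, 2), the stage boundaries at 30% and 60% of training, the decay rate 0.7, the normalisation of weights, and all function names are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical three-stage table: (training fraction at which the stage ends, config).
# Only (2, 1) and (6, 3) appear in the abstract; (4, 2) and the boundaries
# 0.3 / 0.6 are assumed values, not taken from the paper.
STAGES: List[Tuple[float, Tuple[int, int]]] = [
    (0.3, (2, 1)),
    (0.6, (4, 2)),
    (1.0, (6, 3)),
]


def depth_for_progress(progress: float) -> Tuple[int, int]:
    """Return the recursion configuration for a training progress in [0, 1]."""
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must lie in [0, 1]")
    for stage_end, config in STAGES:
        if progress < stage_end:
            return config
    # progress == 1.0 falls through to the full-depth configuration.
    return STAGES[-1][1]


def supervision_weights(num_steps: int, decay: float = 0.7) -> List[float]:
    """Exponentially decaying weights over supervision steps.

    Normalising the weights to sum to 1 is an assumption made here so the
    weighted loss stays on the same scale as a uniform average.
    """
    raw = [decay ** step for step in range(num_steps)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_loss(step_losses: Sequence[float], decay: float = 0.7) -> float:
    """Combine per-step losses using the decaying weights."""
    weights = supervision_weights(len(step_losses), decay)
    return sum(w * loss for w, loss in zip(weights, step_losses))
```

In a training loop, `depth_for_progress(step / total_steps)` would select the recursion configuration for the current step, and `weighted_loss` would replace a uniform average over supervision steps. Early shallow stages cost fewer FLOPs per step, which is the intuition behind the reported savings; the actual figure depends on the schedule used in the paper.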
Taxonomy
Topics: Embedded Systems Design Techniques · Numerical Methods and Algorithms · Advanced Statistical Modeling Techniques
