TACLer: Tailored Curriculum Reinforcement Learning for Efficient Reasoning
Huiyuan Lai, Malvina Nissim

TL;DR
TACLer introduces a curriculum reinforcement learning framework for large language models that enhances reasoning efficiency and accuracy while significantly reducing computational costs.
Contribution
It presents a novel curriculum RL approach with a hybrid reasoning paradigm, improving learning efficiency and reasoning performance in LLMs.
Findings
Reduces training compute by over 50%
Cuts inference token usage by over 42%
Improves accuracy by over 9% on complex math datasets
Abstract
Large Language Models (LLMs) have shown remarkable performance on complex reasoning tasks, especially when equipped with long chain-of-thought (CoT) reasoning. However, eliciting long CoT typically requires large-scale reinforcement learning (RL) training, while often leading to overthinking with redundant intermediate steps. To improve learning and reasoning efficiency, while preserving or even enhancing performance, we propose TACLer, a model-tailored curriculum reinforcement learning framework that gradually increases the complexity of the data based on the model's proficiency in multi-stage RL training. TACLer features two core components: (i) tailored curriculum learning that determines what knowledge the model lacks and needs to learn in progressive stages; (ii) a hybrid Thinking/NoThinking reasoning paradigm that balances accuracy and efficiency by enabling or disabling the…
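The curriculum idea in the abstract (order training data by how well the current model already solves it) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the pass-rate thresholds, and the rule that a higher pass rate means an earlier stage are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def pass_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of sampled attempts judged correct (0.0 if nothing was sampled)."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def build_stages(
    samples: Dict[str, List[bool]],
    bounds: Sequence[float],
) -> List[List[str]]:
    """Group problem ids into curriculum stages, easiest first.

    `bounds` is a descending list of pass-rate thresholds (hypothetical values).
    A problem lands in stage k, where k is the number of thresholds its
    pass rate falls below, so lower pass rates map to later stages.
    """
    stages: List[List[str]] = [[] for _ in range(len(bounds) + 1)]
    for problem_id in sorted(samples):
        rate = pass_rate(samples[problem_id])
        stage_index = sum(rate < bound for bound in bounds)
        stages[stage_index].append(problem_id)
    return stages


# Toy data: per-problem correctness of four sampled rollouts (made up).
toy_samples = {
    "easy": [True, True, True, True],
    "medium": [True, False, True, False],
    "hard": [False, False, False, False],
}
toy_stages = build_stages(toy_samples, bounds=(0.75, 0.25))
```

In a multi-stage RL setup, each stage's list would be used as the training pool for that stage, and pass rates could be re-estimated with the updated model before the next stage; how TACLer actually handles re-estimation and already-solved problems is not stated in this summary.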
Peer Reviews
Decision · Submitted to ICLR 2026
- The paper is clearly written, and the proposed contribution of designing a curriculum based on the current model's success rate is warranted. The authors also show strong results on the math datasets, and it is interesting how the hybrid reasoning mode seems to yield concise reasoning.
- The main limitation that I see with the paper is novelty. Two works cited in the summary [1, 2] seem to propose the same idea, and the authors do not compare against them as baselines. If the authors tackle this point effectively, I would be willing to revise my score.
The idea of tailoring the curriculum to the model’s own proficiency is a clean and original way to make RL-based reasoning training more efficient. The hybrid Thinking/NoThinking setup is also elegant. The experiments are well-executed and comprehensive, showing consistent gains across multiple reasoning benchmarks. Overall, it’s a thoughtful and well-structured paper that pushes the discussion on efficient reasoning.
There seem to be multiple related works that aren't referenced. Could the authors comment on the differences from these listed works? https://arxiv.org/pdf/2505.14970 https://arxiv.org/pdf/2506.06632
# Strengths
The paper makes meaningful contributions in both methodology and practical impact.

**Originality**: TACLer introduces a novel model-adaptive curriculum learning approach that dynamically adjusts training difficulty based on the model's actual pass rate rather than arbitrary metrics (e.g., input length), effectively addressing the high truncation rates (>40%) observed in prior work.

**Quality and Significance**: The framework achieves substantial practical improvements, reducing train
## 1. Writing Quality Issue (Minor)
**Line 48**: Repetitive phrasing - "applying techniques such as such as length-based rewards" contains duplicate "such as". This should be corrected to "applying techniques such as length-based rewards".

## 2. Insufficient Explanation of Efficiency Gains in Thinking Mode
**Section 4.2 & 3.3**: The paper reports significant response length reduction (-42.7%) in Thinking mode (Table 1), but fails to explain the mechanism behind this improvement. The training me
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)
