TACLer: Tailored Curriculum Reinforcement Learning for Efficient Reasoning

Huiyuan Lai; Malvina Nissim

arXiv:2601.21711 · cs.CL · January 30, 2026

TL;DR

TACLer introduces a curriculum reinforcement learning framework for large language models that enhances reasoning efficiency and accuracy while significantly reducing computational costs.

Contribution

It presents a novel curriculum RL approach with a hybrid reasoning paradigm, improving learning efficiency and reasoning performance in LLMs.

Findings

- Reduces training compute by over 50%
- Cuts inference token usage by over 42%
- Improves accuracy by over 9% on complex math datasets

Abstract

Large Language Models (LLMs) have shown remarkable performance on complex reasoning tasks, especially when equipped with long chain-of-thought (CoT) reasoning. However, eliciting long CoT typically requires large-scale reinforcement learning (RL) training, while often leading to overthinking with redundant intermediate steps. To improve learning and reasoning efficiency, while preserving or even enhancing performance, we propose TACLer, a model-tailored curriculum reinforcement learning framework that gradually increases the complexity of the data based on the model's proficiency in multi-stage RL training. TACLer features two core components: (i) tailored curriculum learning that determines what knowledge the model lacks and needs to learn in progressive stages; (ii) a hybrid Thinking/NoThinking reasoning paradigm that balances accuracy and efficiency by enabling or disabling the…
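The tailored curriculum described in the abstract can be illustrated with a short sketch. It is not the authors' implementation; it assumes that proficiency is measured as each problem's pass rate over k sampled answers (the signal Reviewers 01 and 03 describe), that each stage trains on problems the current model has not yet mastered (pass rate below 1), ordered from easiest to hardest, and that pass rates are re-measured with the current policy before every stage. `Problem`, `estimate_pass_rates`, `select_stage`, `run_curriculum`, `solve`, `train_on`, and the default thresholds are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class Problem:
    pid: str      # hypothetical problem identifier
    prompt: str
    answer: str   # reference answer used to score sampled outputs


def estimate_pass_rates(
    problems: Sequence[Problem],
    solve: Callable[[Problem], str],  # hypothetical: samples one answer from the current policy
    k: int = 8,
) -> Dict[str, float]:
    """Return the fraction of k sampled answers that match the reference, per problem."""
    if k <= 0:
        raise ValueError("k must be positive")
    rates: Dict[str, float] = {}
    for p in problems:
        correct = sum(solve(p).strip() == p.answer.strip() for _ in range(k))
        rates[p.pid] = correct / k
    return rates


def select_stage(
    problems: Sequence[Problem],
    rates: Dict[str, float],
    lo: float = 0.0,  # hypothetical lower bound on pass rate
    hi: float = 1.0,  # problems at or above hi count as mastered and are skipped
) -> List[Problem]:
    """Keep not-yet-mastered problems, ordered from easiest (highest pass rate) to hardest."""
    pool = [p for p in problems if lo <= rates[p.pid] < hi]
    # sorted() is stable, so ties keep their original dataset order.
    return sorted(pool, key=lambda p: rates[p.pid], reverse=True)


def run_curriculum(
    problems: Sequence[Problem],
    solve: Callable[[Problem], str],
    train_on: Callable[[List[Problem]], None],  # hypothetical hook: one RL training stage
    n_stages: int = 3,
    k: int = 8,
) -> List[List[str]]:
    """Alternate between re-measuring proficiency and training; return the pool ids per stage."""
    history: List[List[str]] = []
    for _ in range(n_stages):
        rates = estimate_pass_rates(problems, solve, k)  # uses the *current* policy
        pool = select_stage(problems, rates)
        if not pool:  # everything mastered: stop early
            break
        history.append([p.pid for p in pool])
        train_on(pool)
    return history
```

Re-measuring pass rates before each stage is what makes this sketch model-tailored: a problem drops out once the policy solves it reliably, so later stages concentrate on what the model still gets wrong. The text above does not say whether TACLer filters, orders, or weights problems in exactly this way.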

Peer Reviews

Decision: Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 4

Strengths

- The paper is clearly written, and the proposed contribution of designing a curriculum based on the current model's success rate is warranted. The authors also show strong results on the math datasets, and it is interesting how the hybrid reasoning mode seems to yield concise reasoning.

Weaknesses

- The main limitation I see with the paper is novelty. Two works cited in the summary [1, 2] seem to propose the same idea, and the paper does not compare against these baselines. If the authors address this point effectively, I would be willing to revise my score.

Reviewer 02 · Rating 4 · Confidence 3

Strengths

The idea of tailoring the curriculum to the model’s own proficiency is a clean and original way to make RL-based reasoning training more efficient. The hybrid Thinking/NoThinking setup is also elegant. The experiments are well-executed and comprehensive, showing consistent gains across multiple reasoning benchmarks. Overall, it’s a thoughtful and well-structured paper that pushes the discussion on efficient reasoning.

Weaknesses

There seem to be multiple related works that are not referenced. Could the authors comment on how their approach differs from these works? https://arxiv.org/pdf/2505.14970 https://arxiv.org/pdf/2506.06632

Reviewer 03 · Rating 4 · Confidence 3

Strengths

The paper makes meaningful contributions in both methodology and practical impact.

**Originality**: TACLer introduces a novel model-adaptive curriculum learning approach that dynamically adjusts training difficulty based on the model's actual pass rate rather than arbitrary metrics (e.g., input length), effectively addressing the high truncation rates (>40%) observed in prior work.

**Quality and Significance**: The framework achieves substantial practical improvements, reducing train
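Reviewer 03 credits the pass-rate curriculum with addressing the high truncation rates (>40%) seen in prior work. The review does not define the metric; as a minimal sketch, assuming "truncation rate" means the fraction of sampled responses that hit the generation-length budget before finishing, it can be computed as below. `truncation_rate` and its arguments are hypothetical names.

```python
from typing import Sequence


def truncation_rate(lengths: Sequence[int], finished: Sequence[bool], max_len: int) -> float:
    """Fraction of rollouts cut off by the length budget.

    Assumption: a rollout is truncated if it reached max_len tokens
    without emitting an end-of-sequence token (finished=False).
    """
    if len(lengths) != len(finished):
        raise ValueError("lengths and finished must have the same size")
    if not lengths:
        return 0.0
    truncated = sum(1 for n, done in zip(lengths, finished) if n >= max_len and not done)
    return truncated / len(lengths)
```

Under this assumption, a truncated rollout typically stops before producing a final answer and so earns no correctness reward; a rate above 0.4 would leave a large share of each training batch without a useful signal.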

Weaknesses

## 1. Writing Quality Issue (Minor)

**Line 48**: Repetitive phrasing. "applying techniques such as such as length-based rewards" contains a duplicate "such as"; it should read "applying techniques such as length-based rewards".

## 2. Insufficient Explanation of Efficiency Gains in Thinking Mode

**Section 4.2 & 3.3**: The paper reports a significant response length reduction (-42.7%) in Thinking mode (Table 1) but does not explain the mechanism behind this improvement. The training me

Code & Models

Models

laihuiyuan/TACLer (Hugging Face model, 4 downloads)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)