RuCL: Stratified Rubric-Based Curriculum Learning for Multimodal Large Language Model Reasoning

Yukun Chen; Jiaming Li; Longze Chen; Ze Gong; Jingpeng Li; Zhen Qin; Hengyu Chang; Ancheng Xu; Zhihao Yang; Hamid Alinejad-Rokny; Qiang Qu; Bo Zheng; Min Yang

arXiv:2602.21628·cs.CL·March 4, 2026

TL;DR

RuCL introduces a curriculum learning framework for multimodal large language models that uses stratified rubrics and dynamic reward weighting to improve reasoning capabilities efficiently.

Contribution

The paper proposes a stratified rubric-based curriculum learning method that improves reasoning in multimodal LLMs by moving the curriculum from data selection to reward design and by stratifying rubrics according to the model's competence.

Findings

01. Achieves +7.83% average improvement over baseline models.

02. Reaches a state-of-the-art accuracy of 60.06% on visual reasoning benchmarks.

03. Demonstrates effective guidance from perception to logical reasoning.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a prevailing paradigm for enhancing reasoning in Multimodal Large Language Models (MLLMs). However, relying solely on outcome supervision risks reward hacking, where models learn spurious reasoning patterns to satisfy final answer checks. While recent rubric-based approaches offer fine-grained supervision signals, they suffer from high computational costs of instance-level generation and inefficient training dynamics caused by treating all rubrics as equally learnable. In this paper, we propose Stratified Rubric-based Curriculum Learning (RuCL), a novel framework that reformulates curriculum learning by shifting the focus from data selection to reward design. RuCL generates generalized rubrics for broad applicability and stratifies them based on the model's competence. By dynamically adjusting rubric weights during…
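The abstract is cut off before it describes how rubric weights are adjusted, so the following is only an illustrative sketch of the general idea (stratify rubrics by competence, then shift reward weight between strata as training progresses), not the authors' method. Everything in it is an assumption made for illustration: the `Rubric` class, the use of a pass rate as the competence proxy, the 0.5 threshold, the linear schedule, and the `rubric_coef` mixing factor.

```python
from dataclasses import dataclass


@dataclass
class Rubric:
    name: str
    # Assumed competence proxy: fraction of sampled responses that
    # currently satisfy this rubric (hypothetical, not from the paper).
    pass_rate: float


def stratify(rubrics, threshold=0.5):
    """Split rubrics into a foundational stratum (mostly satisfied already)
    and an advanced stratum (rarely satisfied). Threshold is hypothetical."""
    foundational = [r for r in rubrics if r.pass_rate >= threshold]
    advanced = [r for r in rubrics if r.pass_rate < threshold]
    return foundational, advanced


def rubric_weights(rubrics, progress, threshold=0.5):
    """Assumed linear schedule: total weight moves from the foundational
    stratum to the advanced stratum as progress goes from 0 to 1."""
    if not rubrics:
        return {}
    progress = min(max(progress, 0.0), 1.0)
    foundational, advanced = stratify(rubrics, threshold)
    if not foundational or not advanced:
        # Only one stratum exists, so spread the weight uniformly.
        return {r.name: 1.0 / len(rubrics) for r in rubrics}
    weights = {r.name: (1.0 - progress) / len(foundational) for r in foundational}
    weights.update({r.name: progress / len(advanced) for r in advanced})
    return weights


def shaped_reward(outcome_correct, rubric_scores, weights, rubric_coef=0.5):
    """Combine the verifiable outcome reward with a weighted rubric term."""
    rubric_term = sum(w * rubric_scores.get(name, 0.0) for name, w in weights.items())
    return float(outcome_correct) + rubric_coef * rubric_term


rubrics = [Rubric("perception", 0.8), Rubric("logic", 0.2)]
early = rubric_weights(rubrics, progress=0.0)
late = rubric_weights(rubrics, progress=1.0)
reward = shaped_reward(True, {"perception": 1.0, "logic": 0.0}, early)
```

Under these assumptions, early training rewards only the rubric the model can already satisfy ("perception"), and late training rewards only the harder one ("logic"), which mirrors the perception-to-reasoning progression the listing describes. The weights always sum to 1, so the rubric term stays on a fixed scale relative to the outcome reward.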

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)