VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning
Ruifeng Yuan, Chenghao Xiao, Sicong Leng, Jianyu Wang, Long Li, Weiwen Xu, Hou Pong Chan, Deli Zhao, Tingyang Xu, Zhongyu Wei, Hao Zhang, Yu Rong

TL;DR
VL-Cogito is a multimodal reasoning model trained with a multi-stage curriculum reinforcement learning framework that guides it through tasks of gradually increasing difficulty, improving its reasoning across diverse multimodal contexts.
Contribution
The paper presents VL-Cogito, a novel multimodal reasoning model trained with a progressive curriculum RL framework featuring dynamic difficulty adjustment and reasoning path regulation.
Findings
VL-Cogito outperforms existing models on multimodal benchmarks.
The multi-stage curriculum improves reasoning stability and accuracy.
The framework's dynamic mechanisms (difficulty adjustment and reasoning path regulation) balance reasoning efficiency and correctness.
Abstract
Reinforcement learning has proven its effectiveness in enhancing the reasoning capabilities of large language models. Recent research efforts have progressively extended this paradigm to multimodal reasoning tasks. Due to the inherent complexity and diversity of multimodal tasks, especially in semantic content and problem formulations, existing models often exhibit unstable performance across various domains and difficulty levels. To address these limitations, we propose VL-Cogito, an advanced multimodal reasoning model trained via a novel multi-stage Progressive Curriculum Reinforcement Learning (PCuRL) framework. PCuRL systematically guides the model through tasks of gradually increasing difficulty, substantially improving its reasoning abilities across diverse multimodal contexts. The framework introduces two key innovations: (1) an online difficulty soft weighting mechanism,…
Taxonomy
Topics: Innovative Teaching and Learning Methods · Multi-Agent Systems and Negotiation · Natural Language Processing Techniques
