Dual Tuning for Reasoning Efficacy-Driven Data Curation in Multimodal LLM Training
Ruobing Zheng, Tianqi Li, Jianing Li, Qingpei Guo, Yi Yuan, Jingdong Chen

TL;DR
This paper introduces Dual Tuning, a framework for selecting training data that enhances reasoning capabilities in multimodal LLMs, optimizing post-training strategies based on task and data analysis.
Contribution
It proposes a principled, data-driven approach to determine when and how reasoning training benefits multimodal LLMs, improving data curation and training efficiency.
Findings
Dual Tuning effectively identifies beneficial data for reasoning training.
The framework guides data selection for different training modes.
Analysis reveals how reinforcement learning and thinking patterns influence reasoning gains.
Abstract
Reasoning post-training improves Large Language Models (LLMs) on complex tasks such as mathematics and coding, but its benefits across diverse multimodal tasks remain uncertain. The trend of releasing parallel "Instruct" and "Thinking" models by leading teams is both resource-intensive and user-unfriendly. Prior work finds that the gains from reasoning training are influenced by multiple factors, such as base model capabilities, task characteristics, and Chain-of-Thought (CoT) data quality. However, principled criteria for determining when reasoning post-training is beneficial and which data should support it are still lacking. In this paper, we propose Dual Tuning, a reasoning efficacy-driven data curation framework for multimodal LLM training. Given a target task and a base model, Dual Tuning jointly evaluates whether the training data is beneficial and whether reasoning training…
