Beyond Perceptual Shortcuts: Causal-Inspired Debiasing Optimization for Generalizable Video Reasoning in Lightweight MLLMs

Jingze Wu; Quan Zhang; Hongfei Suo; Zeqiang Cai; Hongbo Chen

arXiv:2605.01324 · cs.CV · May 6, 2026

TL;DR

This paper introduces VideoThinker, a causal-inspired debiasing framework for lightweight video reasoning models that improves generalization by actively reducing perceptual shortcut biases.

Contribution

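It proposes a two-stage debiasing method: a Bias Aware Training stage that builds a dedicated bias model to capture shortcut behaviors, followed by Causal Debiasing Policy Optimization (CDPO), which fine-tunes the primary model. The aim is to strengthen reasoning in lightweight models without extensive fine-tuning.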

Findings

01. VideoThinker-R1 achieves state-of-the-art efficiency in video reasoning.

02. It surpasses larger models on multiple benchmarks with minimal training data.

03. The approach effectively reduces perceptual bias and improves generalization.

Abstract

Although reinforcement learning (RL) has significantly advanced reasoning capabilities in large multimodal language models (MLLMs), its efficacy remains limited for lightweight models essential for edge deployments. To address this issue, we leverage causal analysis and experiment to reveal the underlying phenomenon of perceptual bias, demonstrating that RL-based fine-tuning compels lightweight models to preferentially adopt perceptual shortcuts induced by data biases, rather than developing genuine reasoning abilities. Motivated by this insight, we propose VideoThinker, a causal-inspired framework that cultivates robust reasoning in lightweight models through a two-stage debiasing process. First, the Bias Aware Training stage forges a dedicated "bias model" to embody these shortcut behaviors. Then, the Causal Debiasing Policy Optimization (CDPO) algorithm fine-tunes the primary model,…

Code & Models

Repositories

falonss703/VideoThinker (GitHub)

Models

Falconss1/VideoThinker-R1-3B (Hugging Face model; 36 downloads, 1 like)
