Visual Para-Thinker: Divide-and-Conquer Reasoning for Visual Comprehension

Haoran Xu; Hongyu Wang; Jiaze Li; Shunpeng Chen; Zizhao Tong; Jianzhong Ju; Zhenbo Luo; Jian Luan

arXiv:2602.13310 · cs.CV · May 8, 2026

TL;DR

This paper introduces Visual Para-Thinker, a parallel reasoning framework for multimodal large language models (MLLMs) that applies divide-and-conquer strategies to visual comprehension and reports improved performance on the V*, CountBench, RefCOCO, and HallusionBench benchmarks.

Contribution

The paper extends parallel reasoning to the visual domain, integrating Pa-Attention and LPRoPE to keep reasoning paths independent and diverse, together with a native multimodal implementation.

Findings

01

Achieves state-of-the-art results on the V*, CountBench, RefCOCO, and HallusionBench benchmarks.

02

Demonstrates that parallel reasoning improves visual comprehension and reasoning diversity.

03

Validates the effectiveness of the proposed framework through empirical experiments.

Abstract

Existing LLM test-time scaling laws emphasize the emergence of self-reflective behaviors through extended reasoning length. Nevertheless, this vertical scaling strategy often encounters plateaus in exploration as the model becomes locked into a specific thinking pattern. By shifting from depth to parallelism, parallel thinking mitigates the narrowing of exploration. However, the extension of this paradigm to the visual domain remains an open research question. In this paper, we first examine the role of visual partitioning in parallelized reasoning and subsequently propose two distinct strategies. Building on this analysis, we introduce Visual Para-Thinker, the first parallel reasoning framework for MLLMs. To maintain path independence and promote diversity in reasoning, our approach integrates Pa-Attention alongside LPRoPE. Leveraging the vLLM framework, we have developed a native…

Code & Models

Repositories

xuhaoran1/Visual-Para-Thinker (GitHub)
