Increasing Computation Resolves Conflicts in Vision Language Models

Bingyang Wang; Yijiang Li; Yitong Qiao; Maijunxian Wang; Tianwei Zhao; Yucheng Sun; Binyue Deng; Hokin Deng; Nuno Vasconcelos; Dezhi Luo

arXiv:2505.18969 · cs.NE · March 2, 2026

Open Access

TL;DR

This study shows that vision-language models exhibit human-like cognitive control on visual conflict tasks: conflict resolution improves as model size increases, suggesting emergent adaptive flexibility.

Contribution

The paper introduces a benchmark for cognitive control in VLMs (4,410 tasks across seven conflict paradigms, tested on 47 models) and reports that larger models develop conflict resolution abilities akin to human behavior.

Findings

01 Larger models resolve conflicts more effectively than smaller ones.

02 VLMs exhibit human-like conflict behavior, including dropping below chance on high-conflict trials.

03 Parameter count correlates with conflict resolution capacity.

Abstract

Cognitive control, the ability to coordinate competing information sources in pursuit of goals, is fundamental to intelligent behavior. We systematically investigate whether Vision Language Models (VLMs) exhibit cognitive control and how computational resources modulate conflict resolution. We construct a benchmark of 4,410 tasks across seven conflict paradigms (Stroop, Flanker, and five realistic variants) spanning multiple difficulty levels and visual complexities, testing 47 VLMs with rigorous experimental control. We find that VLMs exhibit robust congruency effects across all tasks, with larger models systematically resolving conflicts more effectively than smaller models. Critically, VLMs reproduce the fine-grained demand-resource relationship observed in human temporal dynamics: larger models drop below chance on incongruent high-conflict trials while smaller models fail to…
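The excerpt above does not say how the congruency effect is scored. As an illustrative sketch only, one common convention is accuracy on congruent trials minus accuracy on incongruent trials. The function name, the trial format, and the toy data below are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def congruency_effect(trials):
    """Accuracy on congruent trials minus accuracy on incongruent trials.

    `trials` is an iterable of (condition, is_correct) pairs, where condition
    is "congruent" or "incongruent". This format is an assumption made for
    illustration, not the benchmark's actual schema.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for condition, is_correct in trials:
        total[condition] += 1
        correct[condition] += int(is_correct)
    accuracy = {c: correct[c] / total[c] for c in total}
    return accuracy["congruent"] - accuracy["incongruent"]


# Hypothetical toy data: 3 of 4 congruent trials correct, 1 of 4 incongruent.
toy_trials = [
    ("congruent", True), ("congruent", True),
    ("congruent", True), ("congruent", False),
    ("incongruent", True), ("incongruent", False),
    ("incongruent", False), ("incongruent", False),
]
effect = congruency_effect(toy_trials)
print(f"congruency effect: {effect:.2f}")
```

Under this convention, a positive value means the model is more accurate when the competing cues agree than when they conflict.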

Taxonomy

Topics: Robotics and Automated Systems