Increasing Computation Resolves Conflicts in Vision Language Models
Bingyang Wang, Yijiang Li, Yitong Qiao, Maijunxian Wang, Tianwei Zhao, Yucheng Sun, Binyue Deng, Hokin Deng, Nuno Vasconcelos, Dezhi Luo

TL;DR
This study shows that larger vision-language models exhibit human-like cognitive control: they resolve conflicts in visual tasks more effectively as model size increases, indicating emergent adaptive flexibility.
Contribution
It introduces a benchmark of 4,410 tasks across seven conflict paradigms for assessing cognitive control in VLMs, and shows that larger models develop conflict-resolution abilities that resemble human behavior.
Findings
Larger models resolve conflicts more effectively than smaller ones.
VLMs exhibit human-like conflict behavior, including larger models dropping below chance on incongruent high-conflict trials.
Parameter count correlates with conflict resolution capacity.
Abstract
Cognitive control, the ability to coordinate competing information sources in pursuit of goals, is fundamental to intelligent behavior. We systematically investigate whether Vision Language Models (VLMs) exhibit cognitive control and how computational resources modulate conflict resolution. We construct a benchmark of 4,410 tasks across seven conflict paradigms (Stroop, Flanker, and five realistic variants) spanning multiple difficulty levels and visual complexities, testing 47 VLMs with rigorous experimental control. We find that VLMs exhibit robust congruency effects across all tasks, with larger models systematically resolving conflicts more effectively than smaller models. Critically, VLMs reproduce the fine-grained demand-resource relationship observed in human temporal dynamics: larger models drop below chance on incongruent high-conflict trials while smaller models fail to…
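The congruency effect mentioned in the abstract is commonly quantified as the accuracy gap between congruent trials (task-relevant and distracting cues agree) and incongruent trials (they conflict). The following is a minimal Python sketch, assuming binary correct/incorrect scoring and a fixed number of answer options per trial; the paper's exact metric is not stated here. `Trial`, `congruency_effect`, and `below_chance` are hypothetical names, and the toy trials are illustrative only, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    congruent: bool  # True if the task-relevant and distracting cues agree
    correct: bool    # True if the model's answer matched the target


def accuracy(trials: list[Trial]) -> float:
    """Fraction of correct answers; NaN for an empty list."""
    if not trials:
        return float("nan")
    return sum(t.correct for t in trials) / len(trials)


def congruency_effect(trials: list[Trial]) -> float:
    """Accuracy on congruent trials minus accuracy on incongruent trials."""
    congruent = [t for t in trials if t.congruent]
    incongruent = [t for t in trials if not t.congruent]
    return accuracy(congruent) - accuracy(incongruent)


def below_chance(trials: list[Trial], n_options: int) -> bool:
    """True if incongruent-trial accuracy falls below chance (1 / n_options)."""
    incongruent = [t for t in trials if not t.congruent]
    return accuracy(incongruent) < 1.0 / n_options


# Hypothetical toy trials: 4 congruent (all correct), 4 incongruent (1 correct).
toy = [Trial(True, True)] * 4 + [Trial(False, True)] + [Trial(False, False)] * 3

print(congruency_effect(toy))           # 0.75
print(below_chance(toy, n_options=2))   # True: 0.25 < 0.5
print(below_chance(toy, n_options=4))   # False: 0.25 is not below 0.25
```

A positive congruency effect means conflict hurts performance; accuracy below 1 / n_options on incongruent trials means the distractor is actively pulling answers toward the wrong option rather than merely adding noise.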
Taxonomy
Topics: Robotics and Automated Systems
