Dual Thinking and Logical Processing -- Are Multi-modal Large Language Models Closing the Gap with Human Vision?
Kailas Dayanandan, Nikhil Kumar, Anand Sinha, Brejesh Lall

TL;DR
This paper investigates the dual thinking framework in human vision and AI models, introducing a novel adversarial dataset and analyzing how multi-modal large language models compare to human visual processing, especially in logical inference tasks.
Contribution
The paper introduces a new adversarial dataset for studying dual thinking in vision and provides empirical evidence on the differences between intuitive and logical processing in humans and AI models.
Findings
MLLMs and VLMs improve error correction in intuitive processing but lag in logical reasoning.
Segmentation models show errors similar to intuitive human processing and lack sub-structure understanding.
Early stopping in visual processing can lead to missing relevant information.
Abstract
The dual thinking framework considers fast, intuitive, and slower logical processing. The perception of dual thinking in vision requires images where inferences from intuitive and logical processing differ, and the latter is under-explored in current studies. We introduce a novel adversarial dataset to provide evidence for the dual thinking framework in human vision, which also facilitates the study of the qualitative behavior of deep learning models. Our psychophysical studies show the presence of multiple inferences in rapid succession, and analysis of errors shows that the early stopping of visual processing can result in missing relevant information. MLLMs (Multi-modal Large Language Models) and VLMs (Vision Language Models) have made significant progress in correcting errors in intuitive processing in human vision and showed enhanced performance on images requiring logical…
Taxonomy
Topics: Adversarial Robustness in Machine Learning
Methods: Early Stopping
