MindSet: Vision. A toolbox for testing DNNs on key psychological experiments
Valerio Biscione, Milton L. Montero, Marin Dujmovic, Gaurav Malhotra, Dong Yin, Guillermo Puebla, Federico Adolfi, Rachel F. Heaton, John E. Hummel, Benjamin D. Evans, Karim Habashy, Jeffrey S. Bowers

TL;DR
MindSet: Vision is a comprehensive toolbox with manipulated image datasets and testing scripts designed to evaluate deep neural networks against 30 key psychological findings in human vision, facilitating hypothesis testing and model development.
Contribution
Introduces a versatile toolbox with datasets and code for testing DNNs on psychological visual perception benchmarks, enabling systematic hypothesis testing.
Findings
DNNs struggle with certain manipulated stimuli compared to humans.
The toolbox reveals gaps in the ability of current DNNs to replicate human visual perception.
Configurable datasets allow for extensive testing of model robustness.
Abstract
Multiple benchmarks have been developed to assess the alignment between deep neural networks (DNNs) and human vision. In almost all cases these benchmarks are observational, in the sense that they are composed of behavioural and brain responses to naturalistic images that have not been manipulated to test hypotheses regarding how DNNs or humans perceive and identify objects. Here we introduce the toolbox MindSet: Vision, consisting of a collection of image datasets and related scripts designed to test DNNs on 30 psychological findings. In all experimental conditions, the stimuli are systematically manipulated to test specific hypotheses regarding human visual perception and object recognition. In addition to providing pre-generated datasets of images, we provide code to regenerate these datasets, offering many configurable parameters which greatly extend the versatility of the datasets for…
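The abstract describes stimuli that are systematically manipulated through configurable parameters. A minimal sketch of that idea follows, under stated assumptions: a toy stimulus (a white bar on a black background) is regenerated while one parameter varies and the rest stay fixed, and each variant is compared with the unmanipulated image by cosine similarity. `BarConfig`, `render`, `embed`, and `similarity_sweep` are hypothetical names, not the MindSet: Vision API, and `embed` merely flattens pixels where a real test would use a DNN's internal activations.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BarConfig:
    """Hypothetical stimulus parameters (not MindSet: Vision's actual config keys)."""
    size: int = 16     # image is size x size pixels
    length: int = 6    # bar length in pixels
    row: int = 8       # row of the horizontal bar
    col: int = 2       # leftmost column of the bar
    fg: float = 1.0    # white bar
    bg: float = 0.0    # on a black background


def render(cfg: BarConfig) -> list[list[float]]:
    """Render a single horizontal bar on a uniform background."""
    if cfg.col + cfg.length > cfg.size:
        raise ValueError("bar does not fit in the image")
    img = [[cfg.bg] * cfg.size for _ in range(cfg.size)]
    for c in range(cfg.col, cfg.col + cfg.length):
        img[cfg.row][c] = cfg.fg
    return img


def embed(img: list[list[float]]) -> list[float]:
    """Placeholder for a DNN's activations: here just the flattened pixels."""
    return [p for row in img for p in row]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_sweep(base: BarConfig, param: str, values: list) -> dict:
    """Vary one parameter, hold the rest fixed, and record similarity to the base image."""
    ref = embed(render(base))
    return {v: cosine(ref, embed(render(replace(base, **{param: v})))) for v in values}
```

For example, `similarity_sweep(BarConfig(), "col", [2, 4, 8])` shifts the bar horizontally; because raw pixels are not translation invariant, similarity falls from 1 for the unshifted bar to 0 once the shifted bar no longer overlaps the original. Swapping `embed` for a network's activations would turn the same loop into a rough probe of whether that network treats the shifted bars as alike.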
Peer Reviews
Submitted to ICLR 2025
I really like the emphasis on perturbation, which is an essential missing ingredient in popular benchmarks like BrainScore. This indeed appears to be the main (and IMO a powerful) motivation for the paper: to push computational vision and neuroscience into a new and more effective regime for modeling brain data. The benchmark itself comprises many different interesting experiments taken from psychology and neuroscience. I appreciate the goal of searching for a single model that can expl…
The biggest problem with this paper is that it does not make a clear contribution. None of the experiments are novel. There's no actual benchmarking done on models. It's a scattershot approach, which is probably the only way this could be done, but there's no control across the different benchmarks, i.e., a single set of visual primitives used to test low- and mid-level vision and visual illusions vs. another set for shape/object recognition. I don't know what to take from this paper other than i…
I really appreciate the effort that went into creating this toolbox. I think it could greatly benefit the community, especially since many recent tests have focused increasingly on performance in neural alignment, often overlooking functional alignment beyond accuracy alone. The presentation is well-executed, and I particularly enjoyed the failure modes discussed in the supplementary material. My only suggestion would be to consider moving some of these examples into the main text to further hig…
Perhaps my only question is this: the dataset seems to consist largely of synthetic black-and-white images, which makes it more controllable, but I am not sure whether this would also constrain the kinds of models and training routines that can be used in the test, because most models are trained on images with complicated backgrounds, such as naturalistic scenes.
* The paper is extremely well written. Even though I have a background at the intersection of human and machine vision and am quite familiar with nearly all the works cited, I found that the authors have done a great job of articulating why this dataset is important for machine vision and computer vision, and hence why it is a proper fit for ICLR.
* The paper covers a lot of great literature at the intersection of machine vision and human vision, in addition to papers that try to link both phenomena. I thi…
The greatest weakness I see in this paper is that there are few to no evaluations. There are barely two figures evaluating ResNet-152, and the paper leaves me with the bittersweet feeling of wanting to know how well ResNet-152 did compared with the average human across the whole collection of images in the toolbox.
Taxonomy
Topics: Scientific Research and Philosophical Inquiry · Robotics and Automated Systems
