Finding Shared Decodable Concepts and their Negations in the Brain
Cory Efird, Alex Murphy, Joel Zylberberg, Alona Fyshe

TL;DR
This paper introduces a novel method combining multimodal neural networks and clustering to identify shared decodable visual concepts and their negations in the human brain, revealing both known and new semantic representations.
Contribution
It presents a new approach using CLIP embeddings and adapted clustering to discover shared brain representations of visual concepts across individuals.
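To make the clustering idea concrete, here is a minimal sketch of the general pattern: pool decoder weight vectors from several participants, run standard DBSCAN with cosine distance, and keep clusters that contain vectors from more than one participant. This is plain DBSCAN, not the paper's adapted variant, which this summary does not describe. The function names, the toy vectors, the `eps` and `min_samples` values, and the `min_participants` filter are all illustrative assumptions.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def dbscan(points, eps, min_samples):
    """Standard DBSCAN. Returns one label per point; -1 marks noise."""
    n = len(points)
    # Precompute eps-neighborhoods (each point is its own neighbor).
    neighbors = [
        [j for j in range(n) if cosine_distance(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # only core points expand
    return labels

def shared_clusters(labels, participants, min_participants):
    """Hypothetical filter: clusters with members from enough participants."""
    members = {}
    for label, pid in zip(labels, participants):
        if label != -1:
            members.setdefault(label, set()).add(pid)
    return sorted(c for c, pids in members.items() if len(pids) >= min_participants)

# Toy 2-d stand-ins for weight vectors (assumed data, not from the paper).
weights = [[1.0, 0.0], [0.99, 0.05], [0.98, -0.04],
           [0.0, 1.0], [0.03, 0.99], [-1.0, -1.0]]
participants = ["s1", "s2", "s3", "s1", "s1", "s2"]

labels = dbscan(weights, eps=0.01, min_samples=2)
print(labels)
print(shared_clusters(labels, participants, min_participants=2))
```

In this toy setup, a tight cluster made only of one participant's vectors is filtered out, which is the sense in which the remaining clusters are "shared". Changing `eps` changes how many clusters appear, so the value has to be chosen with care.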
Findings
Identified shared decodable concepts in brain regions for faces, places, and bodies.
Discovered new brain areas tuned for legs, hands, and numerosity.
Differentiated visual features like color and shape in food images.
Abstract
Prior work has offered evidence for functional localization in the brain; different anatomical regions preferentially activate for certain types of visual input. For example, the fusiform face area preferentially activates for visual stimuli that include a face. However, the spectrum of visual semantics is extensive, and only a few semantically tuned patches of cortex have so far been identified in the human brain. Using a multimodal (natural language and image) neural network architecture (CLIP), we train a highly accurate contrastive model that maps brain responses during naturalistic image viewing to CLIP embeddings. We then use a novel adaptation of the DBSCAN clustering algorithm to cluster the parameters of these participant-specific contrastive models. This reveals what we call Shared Decodable Concepts (SDCs): clusters in CLIP space that are decodable from common sets of voxels…
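The contrastive mapping from brain responses to CLIP embeddings mentioned in the abstract can be sketched as follows. This is a minimal illustration, assuming a linear decoder and a symmetric InfoNCE-style loss of the kind CLIP itself is trained with; the abstract does not specify the paper's exact decoder or objective. The `decode` function, the `temperature` value, and the toy data are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def decode(voxels, weights):
    """Hypothetical linear decoder: one weight row per embedding dimension."""
    return [sum(w * x for w, x in zip(row, voxels)) for row in weights]

def mean_diagonal_cross_entropy(logits):
    """Average of -log softmax(row_i)[i]: row i should select column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def contrastive_loss(predicted, targets, temperature=0.1):
    """Symmetric InfoNCE-style loss between decoded and target embeddings."""
    p = [l2_normalize(v) for v in predicted]
    t = [l2_normalize(v) for v in targets]
    # logits[i][j]: scaled cosine similarity of prediction i and target j.
    logits = [[sum(a * b for a, b in zip(pi, tj)) / temperature for tj in t]
              for pi in p]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (mean_diagonal_cross_entropy(logits)
                  + mean_diagonal_cross_entropy(transposed))

# Toy batch: two "images" with 2-d stand-ins for CLIP embeddings (assumed data).
targets = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]      # assumed decoder weights
voxel_batch = [[2.0, 0.0], [0.0, 3.0]]   # assumed voxel responses
aligned = [decode(v, identity) for v in voxel_batch]
swapped = list(reversed(aligned))        # predictions paired with the wrong image

print(contrastive_loss(aligned, targets) < contrastive_loss(swapped, targets))
```

The loss is low when each decoded brain response is closer to the embedding of the image that was actually viewed than to the embeddings of the other images in the batch, and high when the pairing is wrong.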
Peer Reviews
Decision: ICLR 2025 Poster
- The novel selectivities discovered in the study are quite interesting and potentially good hypotheses for future brain imaging studies.
- The perspective that both active and suppressed neural responses are useful for identifying brain regions tied to distinct visual concepts is somewhat unique and not something I've seen in prior studies.
- This method circumvents the need to align subjects' data, while still being able to extract shared representational structure across participants.
- More methodological details are needed. Why was DBSCAN clustering used over other clustering algorithms? How do the authors select the epsilon, which ultimately determines the number of clusters?
- The authors should acknowledge that there is a lot of human bias involved in interpreting the concepts based on the activated images. For example, in 4.1, the authors interpret the negative representative images to indicate images that lack clearly visible faces. However, many other i
- The idea of clustering different subjects' parameter/weight vectors as a means of studying what is common (or different) across individuals' brains is a promising direction that merits more attention and exploration in the field. It's a clever way to overcome the challenges of aligning different brains, while respecting the nuances of individuals' representational signatures.
- Beyond the well-studied forms of category selectivity that pop out in the SDC analysis, some of the observations are
- In general, the paper does a poor job citing other relevant literature, missing some critical papers on category selectivity in the visual system, as well as on recent efforts to use CLIP models for data-driven interpretability of visual feature tuning. Given the topic of the paper, a total of only 20 citations is insufficient to cover recent work in this area. For example, the authors should connect their observations of body-related selectivity to other works such as: https://
Strengths of the paper are that it tackles an important problem, applies it to a beautiful data set, and reports interpretable results that either serve to establish the validity of the method (via replication of established findings) or report intriguing new results. Another strength is the emphasis on the lowest responses of functional clusters, something that is widely appreciated in the field but rarely if ever explicitly discussed. A third strength is the care that was taken to re-calculate
The main weaknesses of the paper concern the failure to situate the paper in the context of relevant prior work. Some of the results are described as novel without reference to related findings published previously. For example, the claim of selectivity for different aspects of the body should acknowledge a large prior literature on this topic, e.g.: Orlov, T., Makin, T. R., & Zohary, E. (2010). Topographic representation of the human body in the occipitotemporal cortex. Neuron, 68(3), 586-600.
Taxonomy
Topics: Language, Metaphor, and Cognition
Methods: Contrastive Language-Image Pre-training
