TL;DR
This paper investigates culture-sensitive neurons in vision-language models, identifying neurons that influence performance on culturally grounded visual questions and introducing a new method for their detection.
Contribution
The study introduces a novel margin-based selector, ConAct, for identifying culture-sensitive neurons and analyzes their distribution across model layers.
Findings
Ablating culture-sensitive neurons disproportionately harms performance on questions about the corresponding culture, while having limited effect on questions about other cultures.
The ConAct method outperforms existing probability- and entropy-based methods in neuron identification.
Culture-sensitive neurons tend to cluster in specific decoder layers depending on the model.
Abstract
Despite their impressive performance, vision-language models (VLMs) still struggle on culturally situated inputs. To understand how VLMs process culturally grounded information, we study the presence of culture-sensitive neurons, i.e., neurons whose activations show preferential sensitivity to inputs associated with particular cultural contexts. We examine whether such neurons are important for culturally diverse visual question answering and where they are located. Using the CVQA benchmark, we identify neurons of culture selectivity and perform diagnostic tests by deactivating the neurons flagged by various identification methods. Experiments on three VLMs across 25 cultural groups demonstrate the existence of neurons whose ablation disproportionately harms performance on questions about the corresponding cultures, while having limited effects on others. Moreover, we introduce a new…
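The deactivation diagnostic described in the abstract can be illustrated with a toy sketch. This is not the authors' code and it does not implement ConAct. It assumes a single hidden layer with a linear readout. All names (`ablate`, `accuracy_by_group`, `ablation_effect`) and the toy data are hypothetical, chosen only to show the idea of zeroing a set of flagged neurons and comparing per-culture accuracy before and after.

```python
import numpy as np

def ablate(hidden, neuron_ids):
    """Return a copy of `hidden` (examples x neurons) with the chosen neurons zeroed."""
    out = hidden.copy()
    out[:, list(neuron_ids)] = 0.0
    return out

def accuracy_by_group(logits, labels, groups):
    """Accuracy of argmax predictions, reported separately for each group label."""
    preds = logits.argmax(axis=1)
    result = {}
    for g in np.unique(groups).tolist():
        mask = groups == g
        result[g] = float((preds[mask] == labels[mask]).mean())
    return result

def ablation_effect(hidden, w_out, labels, groups, neuron_ids):
    """Per-group accuracy drop (before minus after) caused by zeroing `neuron_ids`."""
    before = accuracy_by_group(hidden @ w_out, labels, groups)
    after = accuracy_by_group(ablate(hidden, neuron_ids) @ w_out, labels, groups)
    return {g: before[g] - after[g] for g in before}

# Toy data (assumed): group "A" answers depend on neuron 0, group "B" on neuron 1.
hidden = np.array([[2.0, 0.0, 1.0],
                   [2.0, 0.0, 1.0],
                   [0.0, 2.0, 1.0],
                   [0.0, 2.0, 1.0]])
w_out = np.array([[0.0, 1.0],   # neuron 0 votes for class 1
                  [0.0, 1.0],   # neuron 1 votes for class 1
                  [0.5, 0.0]])  # neuron 2 votes weakly for class 0
labels = np.array([1, 1, 1, 1])
groups = np.array(["A", "A", "B", "B"])

# Zero neuron 0, the one hypothetically flagged as selective for group "A".
drop = ablation_effect(hidden, w_out, labels, groups, neuron_ids=[0])
print(drop)
```

In this toy setup, zeroing neuron 0 removes all accuracy on group "A" and leaves group "B" untouched, which is the kind of disproportionate, culture-specific drop the diagnostic looks for. In a real VLM the ablation would be applied to decoder-layer activations during inference rather than to a stored matrix.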