Interpreting V1 Population Activity via Image-Neural Latent Representation Alignment
Xin Wang, Zhuangzhi Gao, Hongyi Qin, Zhongli Wu, Feixiang Zhou, He Zhao

TL;DR
DINA is an interpretable framework that aligns visual stimuli with V1 neural responses in a shared latent space, enabling accurate decoding and insights into visual processing mechanisms.
Contribution
The paper introduces DINA, a dual-tower architecture that jointly aligns visual stimuli and neural responses, providing both decoding accuracy and interpretability of neural computations.
Findings
Decoding performance relies on coarse visual structures rather than semantic details.
Alignable feature maps emerge from multiple spatial regions, capturing shape and texture.
Sparse subsets of neurons primarily reconstruct these feature maps.
Abstract
Understanding the neural mechanisms underlying visual computation has long been a central challenge in neuroscience. Recent alignment-based approaches have improved the accuracy of decoding visual stimuli from brain activity, yet they provide limited insight into the neural computations that give rise to these improvements. To address this gap, we propose Dual-Tower Image-Neural Alignment (DINA), an interpretable contrastive framework for analyzing population-level visual computations in primary visual cortex (V1). DINA jointly trains a biologically motivated dual-tower architecture that aligns visual stimuli and corresponding V1 population responses in a shared latent space at the level of intermediate feature maps, enabling both accurate decoding and direct access to interpretable feature maps. Evaluated on large-scale two-photon calcium imaging data from mouse V1, DINA achieves…
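The abstract describes DINA only as a contrastive, dual-tower alignment of images and V1 responses; the exact loss, temperature, and tower designs are not given here. As a rough illustration of what "contrastive alignment in a shared latent space" usually means, the sketch below assumes a CLIP-style symmetric InfoNCE loss on flattened per-trial embeddings, plus nearest-neighbor retrieval as a stand-in for decoding. The function names (`symmetric_info_nce`, `retrieve_stimulus`) and the `temperature` value are hypothetical and are not taken from the paper, which aligns intermediate feature maps rather than single vectors.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_info_nce(image_emb, neural_emb, temperature=0.07):
    """Hypothetical CLIP-style contrastive loss between two towers (assumption).

    image_emb:  (batch, dim) outputs of an image tower.
    neural_emb: (batch, dim) outputs of a V1-response tower.
    Row i of both arrays is assumed to come from the same stimulus presentation.
    """
    z_img = l2_normalize(image_emb)
    z_neu = l2_normalize(neural_emb)
    logits = z_img @ z_neu.T / temperature  # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])
    # Matched pairs lie on the diagonal; average the image->neural and
    # neural->image cross-entropy terms.
    loss_i2n = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_n2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2n + loss_n2i)

def retrieve_stimulus(neural_emb, image_bank):
    # Decode by retrieval: for each neural embedding, return the index of the
    # most similar candidate image embedding in the shared latent space.
    sims = l2_normalize(neural_emb) @ l2_normalize(image_bank).T
    return sims.argmax(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.normal(size=(16, 32))
    neu = img + 0.1 * rng.normal(size=(16, 32))  # toy "aligned" responses
    print(symmetric_info_nce(img, neu))                      # small loss
    print(symmetric_info_nce(img, np.roll(neu, 1, axis=0)))  # mismatched pairs: larger loss
    print(retrieve_stimulus(neu, img))                       # ideally 0..15 in order
```

With toy embeddings that are already aligned, the loss on correctly paired rows is lower than on deliberately mismatched rows, and retrieval recovers each trial's own stimulus. In a real dual-tower model, gradients of this loss would train both towers jointly so that paired images and responses land close together.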
