TL;DR
This paper uses a Vision Transformer (ViT) autoencoder with fast weights to model rapid learning of global image context in the visual cortex. It shows that familiarity training increases early-layer sensitivity to global context and broadens the scope of self-attention.
Contribution
The paper introduces a ViT-based autoencoder with LoRA-based (low-rank adaptation) fast weights to simulate rapid contextual learning, bridging neural circuit models and deep learning.
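As a rough illustration of what "LoRA-based fast weights" can mean, the sketch below adds a trainable low-rank update B @ A to a frozen weight matrix W, which is the standard low-rank adaptation (LoRA) form. Everything specific here is an assumption made for illustration and is not taken from the paper: the class name `LoRALinear`, the rank, the scaling `alpha / rank`, the initialization, and where in the ViT such a layer would sit.

```python
import numpy as np


class LoRALinear:
    """Linear map with frozen slow weights W plus a low-rank fast-weight update.

    Hypothetical sketch: names and hyperparameters are illustrative assumptions,
    not the paper's implementation.
    """

    def __init__(self, d_in, d_out, rank=4, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Slow weights: would stay frozen during familiarity training.
        self.W = rng.normal(0.0, 0.02, size=(d_out, d_in))
        # Fast weights: only A and B would be updated.
        self.A = rng.normal(0.0, 0.02, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))  # zero init, so the update starts at 0
        self.scale = alpha / rank

    def delta(self):
        """Low-rank update added to W; its rank is at most `rank`."""
        return self.scale * (self.B @ self.A)

    def __call__(self, x):
        # x has shape (batch, d_in); the result has shape (batch, d_out).
        return x @ (self.W + self.delta()).T


layer = LoRALinear(d_in=8, d_out=6, rank=2)
x = np.ones((3, 8))
y = layer(x)  # equals x @ layer.W.T until B is changed from zero
```

Because B starts at zero, the layer initially behaves exactly like the frozen slow-weight layer, and any change produced by familiarity training is confined to a subspace of rank at most `rank`.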
Findings
Familiarity training aligns early and top-layer representations.
Fast weights amplify the effects of familiarity training.
Self-attention scope broadens with familiarity training.
Abstract
Recent neurophysiological studies have revealed that the early visual cortex can rapidly learn global image context, as evidenced by a sparsification of population responses and a reduction in mean activity when exposed to familiar versus novel image contexts. This phenomenon has been attributed primarily to local recurrent interactions, rather than changes in feedforward or feedback pathways, supported by both empirical findings and circuit-level modeling. Recurrent neural circuits capable of simulating these effects have been shown to reshape the geometry of neural manifolds, enhancing robustness and invariance to irrelevant variations. In this study, we employ a Vision Transformer (ViT)-based autoencoder to investigate, from a functional perspective, how familiarity training can induce sensitivity to global context in the early layers of a deep neural network. We hypothesize that…