Transformers self-organize like newborn visual systems when trained in prenatal worlds
Lalit Pandey, Samantha M. W. Wood, Justin N. Wood

TL;DR
Transformers trained on simulated prenatal visual data develop structures similar to newborn brains, indicating shared learning principles between artificial models and biological development.
Contribution
This study demonstrates that transformers can spontaneously develop brain-like visual structures when trained on prenatal-like data, bridging AI and neuroscience.
Findings
Transformers develop edge sensitivity in early layers.
Later layers become shape-sensitive.
Receptive fields grow across layers during training.
Abstract
Do transformers learn like brains? A key challenge in addressing this question is that transformers and brains are trained on fundamentally different data. Brains are initially "trained" on prenatal sensory experiences (e.g., retinal waves), whereas transformers are typically trained on large datasets that are not biologically plausible. We reasoned that if transformers learn like brains, then they should develop the same structure as newborn brains when exposed to the same prenatal data. To test this prediction, we simulated prenatal visual input using a retinal wave generator. Then, using self-supervised temporal learning, we trained transformers to adapt to those retinal waves. During training, the transformers spontaneously developed the same structure as newborn visual systems: (1) early layers became sensitive to edges, (2) later layers became sensitive to shapes, and (3) the…
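The abstract does not describe how its retinal wave generator works. To give some intuition for what "simulated retinal waves" can look like, here is a minimal sketch of a generic excitable-medium cellular automaton. Resting cells fire when a neighbor fires or when spontaneously seeded, then stay refractory for a few steps, so activity spreads outward as a travelling wave front. The function names (`step`, `generate_waves`) and all parameters (`refractory`, `seed_prob`) are hypothetical illustrations and are not the authors' generator.

```python
import random

REST, ACTIVE = 0, 1  # any state >= 2 is a refractory step


def step(grid, refractory=4, seed_prob=0.002, rng=random):
    """Advance a toy excitable-medium model of retinal waves by one time step."""
    h, w = len(grid), len(grid[0])
    new = [[REST] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = grid[i][j]
            if s == ACTIVE:
                new[i][j] = 2  # a cell that just fired becomes refractory
            elif s >= 2:
                # count through `refractory` steps, then return to rest
                new[i][j] = s + 1 if s < 1 + refractory else REST
            else:
                neighbors = ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                excited = any(
                    0 <= a < h and 0 <= b < w and grid[a][b] == ACTIVE
                    for a, b in neighbors
                )
                # resting cells fire if a neighbor fired, or spontaneously
                if excited or rng.random() < seed_prob:
                    new[i][j] = ACTIVE
    return new


def generate_waves(height=32, width=32, n_frames=50, seed=0, **kwargs):
    """Return n_frames binary frames (1.0 = active cell) of toy wave activity."""
    rng = random.Random(seed)
    grid = [[REST] * width for _ in range(height)]
    frames = []
    for _ in range(n_frames):
        grid = step(grid, rng=rng, **kwargs)
        frames.append([[1.0 if s == ACTIVE else 0.0 for s in row] for row in grid])
    return frames
```

Each frame could serve as one input image in a training sequence. Generators used in practice are calibrated to the spatial and temporal statistics of recorded waves, which this toy does not attempt.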
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
- Original study.
- Carefully done analyses that support the claims of the study.
- Clear writing.
- Results with some significance to the field of neuroscience.
- Very good literature review.
- The model is trained with backprop, which limits its biological plausibility. Alternatives have been proposed elsewhere; see, e.g., https://openreview.net/forum?id=lQBsLfAWhj
- In the methods, I have not seen a description of the unsupervised temporal learning method. Please add one.
1. The paper is well structured and easy to follow from motivation through methods and results. The narrative is coherent, and the figures are well integrated with the text.
2. The plots and visualizations are clean, intuitive, and easy to interpret, which makes the main findings immediately accessible even to readers outside the specific subfield.
3. The idea itself is conceptually compelling and has the potential to inspire new directions at the intersection of neuroscience and machine learning…
1. The work closely mirrors the approach of [Ligeralde et al. (2024)], differing mainly in the replacement of CNNs with ViTs and the adoption of the ViT-CoT contrastive temporal loss from [Pandey et al.]. Several prior studies have already trained models on simulated retinal wave data, demonstrating the emergence of V1-like receptive fields. As a result, the scientific contribution here feels incremental (an architecture substitution rather than a fundamentally new idea or learning principle, or even a str…
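For readers unfamiliar with contrastive temporal objectives such as the ViT-CoT loss named in this review, here is a minimal sketch of a generic InfoNCE-style temporal loss. It assumes that embeddings of temporally adjacent frames from the same clip form positive pairs and that frames from other clips in the batch act as negatives. It is not the exact ViT-CoT formulation, and `temporal_info_nce` and `temperature` are illustrative names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)


def temporal_info_nce(z_t, z_next, temperature=0.1):
    """InfoNCE loss where z_t[i] and z_next[i] embed frames t and t+1 of clip i.

    The positive for z_t[i] is z_next[i]; every z_next[j] with j != i is a negative.
    """
    n = len(z_t)
    total = 0.0
    for i in range(n):
        logits = [cosine(z_t[i], z_next[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / n
```

Minimizing this loss pulls the embeddings of consecutive frames together while pushing apart frames from different clips, which is why such objectives depend on input that changes smoothly over time.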
- The paper is well written and easy to follow.
- The neuroscientific motivation is strong. In the absence of better non-invasive devices for recording the brain activity of prenatal babies, probing a computational model of the brain is the obvious way to go.
- Line 098: “did not use spatiotemporal retinal waves”. This is not quite true. Ligeralde et al. (2024) used spatiotemporal retinal wave data as well as retinal wave data collected in neurophysiological experiments.
- I have some concerns about the experiment on temporally shuffled retinal wave data. Essentially, if you destroy the smoothly varying temporal structure, then there is no hope of learning the conventional image-to-image similarity metric with any temporally contrastive method. In…
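The reviewer's point about shuffling can be illustrated with a toy calculation. Assume a drifting Gaussian bump stands in for a wave front (a simplification, not the paper's stimuli). Consecutive frames are then highly correlated, but after a random temporal shuffle they mostly are not, so a temporal contrastive objective no longer has similar "positive" pairs to align. All names and parameters here are hypothetical.

```python
import math
import random


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def drifting_bump(n_frames=40, width=64, sigma=3.0, start=12):
    """Frames of a 1-D Gaussian bump that moves one pixel per frame."""
    return [
        [math.exp(-((x - (start + t)) ** 2) / (2 * sigma**2)) for x in range(width)]
        for t in range(n_frames)
    ]


def mean_adjacent_corr(frames):
    """Average correlation between each frame and the next one."""
    corrs = [pearson(a, b) for a, b in zip(frames, frames[1:])]
    return sum(corrs) / len(corrs)


smooth = drifting_bump()
shuffled = smooth[:]
random.Random(0).shuffle(shuffled)  # same frames, temporal order destroyed
print(f"ordered:  {mean_adjacent_corr(smooth):.3f}")
print(f"shuffled: {mean_adjacent_corr(shuffled):.3f}")
```

Both sequences contain exactly the same frames; only the ordering differs, which isolates the temporal structure that the shuffled-data control removes.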
Understanding the structure-function relevance of retinal waves is an important question, as much prior research has theorized that they help shape the functional organization of the cortex in utero. If one were to build a biophysically plausible model of the visual system to analyze developmental similarities, then an investigation into some form of activity-dependent organization of the model units before eye opening seems necessary. The paper does a good job of presenting the motivation…
I am a bit confused by the term "self-organization" as used throughout the paper. When I think about self-organization, functional organization, or simply organization, I dissociate it from representational similarity. The methods used in the paper, such as those for edge and shape selectivity, either simply compare the representational similarities of the model layers with those hardcoded for orientation tuning, or discriminate between color- and shape-based preferences. How do the RSMs from d…
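To make the RSM comparison described in this review concrete, here is a minimal sketch of one common recipe (an assumption; the paper's exact procedure may differ). It builds an RSM from a layer's responses to oriented stimuli, builds a hardcoded orientation-tuning RSM in which similarity is cos(2Δθ), and compares the two with a Spearman correlation over their upper triangles. The von Mises unit population standing in for a model layer, the cos(2Δθ) template, and all function names are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def rsm(responses):
    """RSM: pairwise Pearson correlations between per-stimulus response vectors."""
    return [[pearson(a, b) for b in responses] for a in responses]


def orientation_template(thetas):
    """Hardcoded orientation-tuning RSM; cos(2*dtheta) because orientation has period pi."""
    return [[math.cos(2 * (a - b)) for b in thetas] for a in thetas]


def upper_triangle(m, decimals=9):
    """Entries above the diagonal, rounded so float noise does not break ties."""
    n = len(m)
    return [round(m[i][j], decimals) for i in range(n) for j in range(i + 1, n)]


def average_ranks(xs):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda k: xs[k])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rsm(a, b):
    """Spearman correlation between the upper triangles of two RSMs."""
    return pearson(average_ranks(upper_triangle(a)), average_ranks(upper_triangle(b)))


# Hypothetical "layer": 16 von Mises orientation-tuned units probed with 8 orientations.
thetas = [k * math.pi / 8 for k in range(8)]
prefs = [j * math.pi / 16 for j in range(16)]
layer = [[math.exp(2.0 * math.cos(2 * (t - p))) for p in prefs] for t in thetas]
score = spearman_rsm(rsm(layer), orientation_template(thetas))
print(f"Spearman(layer RSM, orientation template) = {score:.3f}")
```

A high score here only says that the layer's similarity structure matches the template's ordering of stimulus pairs, which is the distinction the reviewer draws between representational similarity and organization of the units themselves.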
Taxonomy
Topics: Visual perception and processing mechanisms · Neural dynamics and brain function · Tactile and Sensory Interactions
