Better, But Not Sufficient: Testing Video ANNs Against Macaque IT Dynamics
Matteo Dunnhofer, Christian Micheloni, Kohitij Kar

TL;DR
This study compares macaque IT neural responses recorded during naturalistic video viewing with static, recurrent, and video-based ANN models, finding that current models mainly capture appearance-bound dynamics and highlighting the need for models that encode biological temporal invariances.
Contribution
It demonstrates that existing video-trained ANNs inadequately model IT's dynamic computations, emphasizing the necessity for new objectives that incorporate biological temporal invariances.
Findings
Video models modestly improve neural predictivity at later stages.
ANN models fail to generalize across appearance-free variants.
Current models mainly capture appearance-bound rather than invariant dynamics.
Abstract
Feedforward artificial neural networks (ANNs) trained on static images remain the dominant models of the primate ventral visual stream, yet they are intrinsically limited to static computations. The primate world is dynamic, and in the macaque ventral visual pathway, the inferior temporal (IT) cortex not only supports object recognition but also encodes object motion velocity during naturalistic video viewing. Do IT's temporal responses reflect nothing more than time-unfolded feedforward transformations, framewise features with shallow temporal pooling, or do they embody richer dynamic computations? We tested this by comparing macaque IT responses recorded during naturalistic videos against static, recurrent, and video-based ANN models. Video models provided modest improvements in neural predictivity, particularly at later response stages, raising the question of what kind of…
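Neural predictivity is commonly estimated with a generic recipe: fit a regularized linear mapping from model features to recorded responses, then score the predictions on held-out data. The abstract does not state the authors' exact procedure, so the sketch below assumes that recipe. It uses synthetic data to illustrate one hypothesis the abstract names: framewise features with shallow temporal pooling.

In the example, a hypothetical neuron responds linearly to features averaged over the last four frames. Its responses are predicted much better from pooled features than from per-frame features. All helper names (`causal_pool`, `ridge_fit`, `predictivity`) and all data are illustrative assumptions. They are not the paper's method or results.

```python
import math
import random

# Hypothetical sketch, not the paper's pipeline: generic ridge-based neural
# predictivity on synthetic data, written in pure Python for self-containment.


def causal_pool(frames, window):
    """Average each feature over the current frame and up to window-1 preceding frames."""
    pooled = []
    for t in range(len(frames)):
        chunk = frames[max(0, t - window + 1):t + 1]
        pooled.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return pooled


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ridge_fit(X, y, lam):
    """Solve (X^T X + lam*I) w = X^T y; the last column of X is a bias and is not penalized."""
    d = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (lam if i == j and i < d - 1 else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(d)]
    # Forward elimination with partial pivoting.
    for c in range(d):
        p = max(range(c, d), key=lambda row: abs(A[row][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(c + 1, d):
            f = A[r][c] / A[c][c]
            for k in range(c, d):
                A[r][k] -= f * A[c][k]
            b[r] -= f * b[c]
    # Back substitution.
    w = [0.0] * d
    for c in reversed(range(d)):
        w[c] = (b[c] - sum(A[c][k] * w[k] for k in range(c + 1, d))) / A[c][c]
    return w


def predictivity(features, responses, lam=1.0):
    """Fit on the first half of time bins; return Pearson r on the held-out second half."""
    X = [list(f) + [1.0] for f in features]
    half = len(X) // 2
    w = ridge_fit(X[:half], responses[:half], lam)
    pred = [sum(a * b for a, b in zip(row, w)) for row in X[half:]]
    return pearson(pred, responses[half:])


# Synthetic example: a hypothetical "neuron" driven by features pooled over 4 frames.
rng = random.Random(0)
frames = [[rng.random() for _ in range(3)] for _ in range(40)]
pooled = causal_pool(frames, window=4)
responses = [2.0 * p[0] - p[1] + 0.5 for p in pooled]

r_framewise = predictivity(causal_pool(frames, window=1), responses, lam=1e-6)
r_pooled = predictivity(pooled, responses, lam=1e-6)
print(f"framewise r = {r_framewise:.3f}, pooled r = {r_pooled:.3f}")
```

Real analyses of this kind typically differ from the sketch in several ways. They usually:

- fit many recording sites,
- hold out whole stimuli rather than a temporal half,
- choose the regularization strength by cross-validation, and
- correct scores for the neural noise ceiling.

The half-split here only keeps the example short.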
Taxonomy
Topics: Face Recognition and Perception · Action Observation and Synchronization · Visual perception and processing mechanisms
