Invariant recognition drives neural representations of action sequences
Andrea Tacchetti, Leyla Isik, Tomaso Poggio

TL;DR
This paper argues that neural representations of action sequences in the human visual cortex are shaped by the demands of invariant recognition: models that better support invariant action recognition align more closely with neural data.
Contribution
It shows that improving invariant action recognition in models enhances their similarity to human neural representations of actions.
Findings
Spatiotemporal CNNs effectively categorize actions in videos.
Model modifications that boost invariance improve neural data alignment.
A model's invariant recognition performance predicts how closely its representations match neural data.
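To make "alignment with neural data" concrete, here is a minimal sketch of one common way to compare a model's representation with neural recordings: build a representational dissimilarity matrix (RDM) for each and correlate their off-diagonal entries. This summary does not state which measure the authors used, so treat this as an illustrative assumption. The function names `rdm` and `rdm_alignment` are hypothetical, and all data are synthetic.

```python
import numpy as np


def rdm(features):
    """Representational dissimilarity matrix: 1 - Pearson correlation
    between the response patterns (rows) of each pair of stimuli."""
    return 1.0 - np.corrcoef(features)


def rdm_alignment(features_a, features_b):
    """Pearson correlation between the upper triangles of two RDMs.
    Both inputs must have one row per stimulus, in the same order."""
    rdm_a, rdm_b = rdm(features_a), rdm(features_b)
    iu = np.triu_indices_from(rdm_a, k=1)  # skip the diagonal and duplicates
    return float(np.corrcoef(rdm_a[iu], rdm_b[iu])[0, 1])


# Synthetic stand-ins (assumptions, not data from the paper).
rng = np.random.default_rng(0)
n_videos = 20
neural = rng.normal(size=(n_videos, 50))                      # hypothetical neural responses
model_close = neural + 0.1 * rng.normal(size=neural.shape)    # features resembling the neural patterns
model_random = rng.normal(size=(n_videos, 30))                # unrelated features

score_close = rdm_alignment(neural, model_close)
score_random = rdm_alignment(neural, model_random)
print(f"similar model: {score_close:.2f}, unrelated model: {score_random:.2f}")
```

Under this measure, a model whose features preserve the pairwise structure of the neural responses scores near 1, while unrelated features score near 0. The two feature spaces need not have the same dimensionality, since only stimulus-by-stimulus dissimilarities are compared.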
Abstract
Recognizing the actions of others from visual stimuli is a crucial aspect of human visual perception that allows individuals to respond to social cues. Humans are able to identify similar behaviors and discriminate between distinct actions despite transformations, like changes in viewpoint or actor, that substantially alter the visual appearance of a scene. This ability to generalize across complex transformations is a hallmark of human visual intelligence. Advances in understanding motion perception at the neural level have not always translated into precise accounts of the computational principles underlying the representations our visual cortex evolved or learned to compute. Here we test the hypothesis that invariant action discrimination might fill this gap. Recently, the study of artificial systems for static object perception has produced models, CNNs, that achieve human-level…
Taxonomy
Topics: Face Recognition and Perception · Action Observation and Synchronization · Human Pose and Action Recognition
