Modelling Human Visual Motion Processing with Trainable Motion Energy Sensing and a Self-attention Network
Zitang Sun, Yen-Ju Chen, Yung-hao Yang, Shin'ya Nishida

TL;DR
This paper introduces a biologically inspired, trainable model combining motion energy sensing and self-attention to emulate human visual motion processing, outperforming traditional computer vision models on complex natural scenes.
Contribution
It presents a novel two-stage model that integrates biological plausibility with deep learning, capturing key aspects of human motion perception and outperforming existing models on benchmark tests.
Findings
Model responses resemble mammalian neural activity in motion pooling and speed tuning.
On the Sintel benchmark, the model's predictions match human responses more closely than they match the ground-truth flow.
It outperforms state-of-the-art CV models in complex natural scene motion prediction.
Abstract
Visual motion processing is essential for humans to perceive and interact with dynamic environments. Despite extensive research in cognitive neuroscience, image-computable models that can extract informative motion flow from natural scenes in a manner consistent with human visual processing have yet to be established. Meanwhile, recent advancements in computer vision (CV), propelled by deep learning, have led to significant progress in optical flow estimation, a task closely related to motion perception. Here we propose an image-computable model of human motion perception by bridging the gap between biological and CV models. Specifically, we introduce a novel two-stage approach that combines trainable motion energy sensing with a recurrent self-attention network for adaptive motion integration and segregation. This model architecture aims to capture the computations in V1-MT, the core…
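To make the "motion energy sensing" stage named in the abstract more concrete, here is a minimal sketch of the classic, non-trainable motion energy computation in one spatial dimension plus time. It is not the authors' implementation: the function names (`gabor_pair`, `motion_energy`, `drifting_grating`), the filter size, the frequencies, and the Gaussian width are all illustrative assumptions. In a trainable variant, filter parameters of this kind would be learned rather than fixed.

```python
import math

def gabor_pair(size, fx, ft, direction, sigma):
    """Build an (even, odd) quadrature pair of space-time Gabor filters.

    Filters are nested lists indexed [t][x]. direction=+1 tunes the pair to
    rightward motion, -1 to leftward motion. All parameters are illustrative.
    """
    half = size // 2
    even, odd = [], []
    for t in range(-half, half + 1):
        row_even, row_odd = [], []
        for x in range(-half, half + 1):
            envelope = math.exp(-(x * x + t * t) / (2 * sigma ** 2))
            phase = 2 * math.pi * (fx * x - direction * ft * t)
            row_even.append(envelope * math.cos(phase))
            row_odd.append(envelope * math.sin(phase))
        even.append(row_even)
        odd.append(row_odd)
    return even, odd

def motion_energy(patch, fx, ft, direction, sigma=3.0):
    """Phase-invariant energy: squared even response plus squared odd response."""
    size = len(patch)
    even, odd = gabor_pair(size, fx, ft, direction, sigma)
    resp_even = sum(even[t][x] * patch[t][x]
                    for t in range(size) for x in range(size))
    resp_odd = sum(odd[t][x] * patch[t][x]
                   for t in range(size) for x in range(size))
    return resp_even ** 2 + resp_odd ** 2

def drifting_grating(size, fx, ft, direction):
    """Space-time patch [t][x] of a sinusoid drifting in the given direction."""
    half = size // 2
    return [[math.cos(2 * math.pi * (fx * x - direction * ft * t))
             for x in range(-half, half + 1)]
            for t in range(-half, half + 1)]

# A rightward-drifting grating should excite the rightward-tuned unit
# far more strongly than the leftward-tuned one.
stim = drifting_grating(15, 0.1, 0.1, +1)
e_right = motion_energy(stim, 0.1, 0.1, +1)
e_left = motion_energy(stim, 0.1, 0.1, -1)
opponent = e_right - e_left  # positive value signals rightward motion
print(e_right > e_left, opponent > 0)
```

Squaring and summing the two quadrature responses makes the output insensitive to the grating's phase, which is why such units are often used as a model of direction-selective complex cells. The integration and segregation stage described in the abstract would operate on a bank of such responses and is not sketched here.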
Taxonomy
Topics: Visual perception and processing mechanisms · Neural dynamics and brain function · Advanced Vision and Imaging
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
