Neural Encoding and Decoding with Deep Learning for Dynamic Natural Vision
Haiguang Wen, Junxing Shi, Yizhen Zhang, Kun-Han Lu, Jiayue Cao, Zhongming Liu

TL;DR
This paper demonstrates that deep convolutional neural networks can predict and decode human brain responses to natural movies, revealing detailed visual and semantic representations across cortical areas.
Contribution
It extends CNN-based encoding and decoding models to dynamic natural vision, showing their effectiveness in predicting and interpreting brain activity during movie viewing.
Findings
The CNN predicts responses in both the ventral and dorsal streams, the latter to a lesser degree.
Single-voxel responses are visualized as the specific pixel patterns that drive them.
fMRI signals are decoded for visual reconstruction and semantic categorization.
Abstract
Convolutional neural networks (CNNs) driven by image recognition have been shown to explain cortical responses to static pictures in ventral-stream areas. Here, we further showed that such a CNN could reliably predict and decode functional magnetic resonance imaging data from humans watching natural movies, despite its lack of any mechanism to account for temporal dynamics or feedback processing. Using separate data, encoding and decoding models were developed and evaluated for describing the bi-directional relationships between the CNN and the brain. Through the encoding models, the CNN-predicted areas covered not only the ventral stream but also the dorsal stream, albeit to a lesser degree; single-voxel response was visualized as the specific pixel pattern that drove the response, revealing the distinct representation of individual cortical location; cortical activation was…
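To make the idea of an encoding model concrete, here is a minimal sketch in Python that maps a feature matrix to voxel responses with a linear model and scores it on held-out data. Everything in it is an assumption for illustration: the data are synthetic, the features stand in for CNN unit activations, ridge regression stands in for whatever regression the authors used, and the names `fit_ridge` and `columnwise_corr` are hypothetical. It is not the paper's pipeline.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha*I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)

def columnwise_corr(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    num = (A * B).sum(axis=0)
    den = np.sqrt((A ** 2).sum(axis=0) * (B ** 2).sum(axis=0))
    return num / den

# Synthetic stand-ins (assumption): rows are time points, X columns play the
# role of CNN features, Y columns play the role of voxel time series.
rng = np.random.default_rng(0)
n_train, n_test, n_features, n_voxels = 400, 100, 20, 5
true_W = rng.normal(size=(n_features, n_voxels))
X_train = rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_features))
Y_train = X_train @ true_W + 0.5 * rng.normal(size=(n_train, n_voxels))
Y_test = X_test @ true_W + 0.5 * rng.normal(size=(n_test, n_voxels))

# Fit on the training split, then score each voxel on the held-out split.
W = fit_ridge(X_train, Y_train, alpha=1.0)
r = columnwise_corr(X_test @ W, Y_test)
print("held-out correlation per voxel:", np.round(r, 3))
```

Fitting and scoring on separate splits mirrors the abstract's statement that the models were developed and evaluated using separate data. A decoding model would run the same regression in the opposite direction, from voxel responses to features.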
