Reconstruction-guided attention improves the robustness and shape processing of neural networks
Seoyoung Ahn, Hossein Adeli, Gregory J. Zelinsky

TL;DR
This paper introduces a reconstruction-guided attention model in which top-down feedback from an object reconstruction improves the robustness and shape processing of neural network object recognition, especially under challenging image corruptions.
Contribution
It presents an iterative encoder-decoder network that uses its own object reconstruction as top-down attentional feedback, and reports improved robustness and interpretability on out-of-distribution digit recognition tasks.
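The feedback loop described above can be illustrated with a toy sketch. This is not the authors' architecture: the fixed class prototypes, the dot-product "encoder", the prototype-mixing "decoder", and the names `encode`, `decode`, and `recognize` are all hypothetical stand-ins for the learned encoder-decoder network. It only shows the general idea of turning a reconstruction into a spatial attention mask that re-weights the input on each iteration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def encode(image, prototypes):
    """Toy 'encoder' (assumption): class probabilities from dot-product similarity."""
    return softmax([sum(p * x for p, x in zip(proto, image)) for proto in prototypes])


def decode(probs, prototypes):
    """Toy 'decoder' (assumption): reconstruction as a probability-weighted mix of prototypes."""
    n = len(prototypes[0])
    return [sum(w * proto[i] for w, proto in zip(probs, prototypes)) for i in range(n)]


def recognize(image, prototypes, steps=3):
    """Iterate: classify, reconstruct, then use the reconstruction as a spatial mask."""
    attended = list(image)
    history = []
    for _ in range(steps):
        probs = encode(attended, prototypes)
        recon = decode(probs, prototypes)
        peak = max(recon) or 1.0  # avoid dividing by zero for an all-zero reconstruction
        mask = [r / peak for r in recon]  # attention weights in [0, 1]
        # Feedback is always applied to the original input, not the previous masked input.
        attended = [x * m for x, m in zip(image, mask)]
        history.append(probs)
    return history, attended


# Two hypothetical 6-pixel "shapes"; the input is shape 0 plus clutter on shape 1's pixels.
prototypes = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]
corrupted = [1.0, 1.0, 1.0, 0.8, 0.8, 0.0]
history, attended = recognize(corrupted, prototypes)
print([round(p[0], 3) for p in history])  # confidence in shape 0 at each iteration
print([round(x, 3) for x in attended])    # clutter pixels are attenuated by the mask
```

In this toy setting the mask suppresses pixels that the current reconstruction does not explain, so the confidence in the correct class grows across iterations. A real model would replace both the prototypes and the similarity measure with learned networks.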
Findings
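Outperforms, on average, the compared models (including feedforward CNNs and adversarially trained networks) on MNIST-C, which applies 15 types of transformation and corruption to handwritten digits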
Shows robustness to blur, noise, and occlusion
Reveals roles of spatial and feature-based attention in recognition
Abstract
Many visual phenomena suggest that humans use top-down generative or reconstructive processes to create visual percepts (e.g., imagery, object completion, pareidolia), but little is known about the role reconstruction plays in robust object recognition. We built an iterative encoder-decoder network that generates an object reconstruction and used it as top-down attentional feedback to route the most relevant spatial and feature information to feed-forward object recognition processes. We tested this model using the challenging out-of-distribution digit recognition dataset, MNIST-C, where 15 different types of transformation and corruption are applied to handwritten digit images. Our model showed strong generalization performance against various image perturbations, on average outperforming all other models including feedforward CNNs and adversarially trained networks. Our model is…
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Cell Image Analysis Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
