PhD Thesis: Exploring the role of (self-)attention in cognitive and computer vision architecture

Mohit Vaishnav

arXiv:2306.14650·cs.AI·June 29, 2023

TL;DR

This thesis explores how self-attention and memory mechanisms improve reasoning in vision tasks, proposing new models and architectures that enhance performance, efficiency, and generalization in complex visual reasoning.

Contribution

The thesis refines the taxonomy of visual reasoning tasks, extends Transformer models with memory, and proposes GAMR, a cognitive architecture that outperforms existing models in sample efficiency, robustness, and compositionality.

Findings

01. Self-attention enhances feature extraction in visual reasoning.

02. GAMR outperforms other models in sample efficiency and robustness.

03. GAMR achieves zero-shot generalization on new reasoning tasks.

Abstract

We investigate the role of attention and memory in complex reasoning tasks. We analyze Transformer-based self-attention as a model and extend it with memory. By studying the Synthetic Visual Reasoning Test (SVRT), we refine the taxonomy of reasoning tasks. Augmenting ResNet50 with self-attention, we enhance its feature maps using feature-based and spatial attention, which lets the model solve challenging visual reasoning tasks efficiently. Our findings contribute to understanding the attentional needs of SVRT tasks. Additionally, we propose GAMR, a cognitive architecture combining attention and memory, inspired by active vision theory. GAMR outperforms other architectures in sample efficiency, robustness, and compositionality, and shows zero-shot generalization on new reasoning tasks.
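To make the idea of enhancing feature maps with self-attention concrete, here is a minimal sketch in plain Python. It is not the thesis's implementation. It assumes the feature map has already been flattened into a list of per-location vectors, uses identity projections instead of learned query, key, and value weights, and adds the attended features back through a residual sum. The names `self_attention`, `enhance`, and `feature_map` are hypothetical and chosen for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention over spatial locations.

    features: list of N vectors (one per location), each of length C.
    Assumption: queries, keys, and values are the features themselves
    (identity projections); a trained model would learn these projections.
    Returns (attended_features, attention_weights).
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    attended, weights = [], []
    for query in features:
        # Similarity of this location to every location, scaled by sqrt(C).
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in features]
        row = softmax(scores)
        # Weighted average of all location vectors.
        mixed = [sum(row[j] * features[j][c] for j in range(len(features)))
                 for c in range(dim)]
        attended.append(mixed)
        weights.append(row)
    return attended, weights


def enhance(features):
    """Residual combination: original features plus attended features."""
    attended, _ = self_attention(features)
    return [[f + a for f, a in zip(orig, att)]
            for orig, att in zip(features, attended)]


# Toy stand-in for a flattened 2x2 feature map with 2 channels.
feature_map = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
enhanced = enhance(feature_map)
```

Each row of the attention weights sums to one, so every attended vector is a weighted average of the input vectors. In a real pipeline the toy list would be replaced by the flattened output of a convolutional backbone such as ResNet50.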

Taxonomy

Topics: Visual Attention and Saliency Detection