Recurrent Attention Models with Object-centric Capsule Representation for Multi-object Recognition

Hossein Adeli; Seoyoung Ahn; Gregory Zelinsky

arXiv:2110.04954·cs.CV·October 12, 2021

TL;DR

This paper introduces a recurrent attention model that uses object-centric capsule representations to integrate attention and recognition for multi-object recognition, trained with only classification labels as supervision.

Contribution

The paper presents an encoder-decoder model that uses capsule networks to form an object-centric hidden representation and iterative glimpse attention to integrate attention selection with object recognition.

Findings

01 Effective recognition of overlapping objects and cluttered scenes

02 Model learns to move its glimpse window adaptively

03 Achieves high accuracy with only classification supervision

Abstract

The visual system processes a scene using a sequence of selective glimpses, each driven by spatial and object-based attention. These glimpses reflect what is relevant to the ongoing task and are selected through recurrent processing and recognition of the objects in the scene. In contrast, most models treat attention selection and recognition as separate stages in a feedforward process. Here we show that using capsule networks to create an object-centric hidden representation in an encoder-decoder model with iterative glimpse attention yields effective integration of attention and recognition. We evaluate our model on three multi-object recognition tasks: highly overlapping digits, digits among distracting clutter, and house numbers. We show that it learns to effectively move its glimpse window and to recognize and reconstruct the objects, all with only the classification as supervision. Our…
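The glimpse-by-glimpse processing described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`extract_glimpse`, `write_glimpse`, `run_glimpses`) are hypothetical, the glimpse locations are hard-coded rather than learned, and copying each patch onto a canvas is only a stand-in for the model's recognition and reconstruction steps. It shows just the data flow of reading a sequence of fixed-size windows from an image.

```python
def extract_glimpse(image, center, size):
    """Crop a size x size window whose top-left corner is center - size // 2.

    Pixels that fall outside the image are filled with 0.0 (zero padding).
    """
    height, width = len(image), len(image[0])
    r0 = center[0] - size // 2
    c0 = center[1] - size // 2
    return [
        [
            image[r][c] if 0 <= r < height and 0 <= c < width else 0.0
            for c in range(c0, c0 + size)
        ]
        for r in range(r0, r0 + size)
    ]


def write_glimpse(canvas, patch, center):
    """Copy a patch onto the canvas at the location it was read from.

    Assumption: a plain copy stands in for a learned decoder.
    """
    size = len(patch)
    r0 = center[0] - size // 2
    c0 = center[1] - size // 2
    for i, row in enumerate(patch):
        for j, value in enumerate(row):
            r, c = r0 + i, c0 + j
            if 0 <= r < len(canvas) and 0 <= c < len(canvas[0]):
                canvas[r][c] = value


def run_glimpses(image, locations, size):
    """Visit each location in turn and accumulate what was seen on a canvas.

    Assumption: locations are given by hand; in the paper the model learns
    where to move its glimpse window.
    """
    canvas = [[0.0 for _ in row] for row in image]
    for center in locations:
        patch = extract_glimpse(image, center, size)
        write_glimpse(canvas, patch, center)
    return canvas


if __name__ == "__main__":
    image = [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16],
    ]
    # Two 2x2 glimpses: top-left quadrant, then bottom-right quadrant.
    for row in run_glimpses(image, [(1, 1), (3, 3)], size=2):
        print(row)
```

After the two glimpses, the canvas holds only the two visited quadrants and stays zero elsewhere, which makes it easy to see how a sequence of small windows gradually covers the relevant parts of a scene.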

Code & Models

Repositories

hosseinadeli/ocra (PyTorch, official)

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques