Recurrent Spatial Transformer Networks

Søren Kaae Sønderby, Casper Kaae Sønderby, Lars Maaløe, Ole Winther

arXiv:1509.05329·cs.CV·September 18, 2015·37 cites

TL;DR

This paper introduces a recurrent spatial transformer network (RNN-SPN) that classifies digits in cluttered MNIST sequences by adaptively attending to regions of interest. It achieves a single-digit error of 1.5%, compared with 2.9% for a convolutional network and 2.0% for a convolutional network with SPN layers.

Contribution

The integration of spatial transformer networks into a recurrent framework enables adaptive down-sampling and attention to regions of interest, enhancing classification accuracy.

Findings

01 RNN-SPN achieves a single-digit error of 1.5% on cluttered MNIST sequences.

02 The model down-samples the input images adaptively without deteriorating performance.

03 RNN-SPN outperforms both a convolutional network (2.9% error) and a convolutional network with SPN layers (2.0% error).

Abstract

We integrate the recently proposed spatial transformer network (SPN) [Jaderberg et al., 2015] into a recurrent neural network (RNN) to form an RNN-SPN model. We use the RNN-SPN to classify digits in cluttered MNIST sequences. The proposed model achieves a single-digit error of 1.5%, compared to 2.9% for a convolutional network and 2.0% for a convolutional network with SPN layers. The SPN outputs a zoomed, rotated and skewed version of the input image. We investigate different down-sampling factors (the ratio of pixels in the input and output) for the SPN and show that the RNN-SPN model is able to down-sample the input images without deteriorating performance. The down-sampling in RNN-SPN can be thought of as adaptive down-sampling that minimizes the information loss in the regions of interest. We attribute the superior performance of the RNN-SPN to the fact that it can attend to a sequence of…
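The zoom, rotate, skew and down-sample behaviour described in the abstract can be illustrated with a minimal NumPy sketch of an affine spatial transformer. This is not the authors' implementation: the function names (`affine_grid`, `bilinear_sample`, `spatial_transform`), the normalized [-1, 1] coordinate convention, and the interpretation of `downsample` as a per-side divisor of the output size are all assumptions made for illustration. In the actual model the 2x3 matrix `theta` would be predicted by a network at each step rather than supplied by hand.

```python
import numpy as np

def affine_grid(theta, out_h, out_w):
    """Map every output pixel to a source location in normalized [-1, 1] coordinates.

    theta is a 2x3 affine matrix (hypothetical stand-in for a predicted transform).
    """
    ys = np.linspace(-1.0, 1.0, out_h)
    xs = np.linspace(-1.0, 1.0, out_w)
    grid_x, grid_y = np.meshgrid(xs, ys)            # each of shape (out_h, out_w)
    ones = np.ones_like(grid_x)
    target = np.stack([grid_x.ravel(), grid_y.ravel(), ones.ravel()])  # (3, N)
    source = theta @ target                         # (2, N)
    return source[0].reshape(out_h, out_w), source[1].reshape(out_h, out_w)

def bilinear_sample(image, src_x, src_y):
    """Read the image at fractional source locations using bilinear interpolation."""
    h, w = image.shape
    # Convert normalized coordinates to pixel coordinates.
    x = (src_x + 1.0) * (w - 1) / 2.0
    y = (src_y + 1.0) * (h - 1) / 2.0
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    wx = x - x0                                     # horizontal interpolation weight
    wy = y - y0                                     # vertical interpolation weight
    # Clamp indices so samples outside the image repeat the border pixels.
    x0c, x1c = np.clip(x0, 0, w - 1), np.clip(x0 + 1, 0, w - 1)
    y0c, y1c = np.clip(y0, 0, h - 1), np.clip(y0 + 1, 0, h - 1)
    top = (1 - wx) * image[y0c, x0c] + wx * image[y0c, x1c]
    bottom = (1 - wx) * image[y1c, x0c] + wx * image[y1c, x1c]
    return (1 - wy) * top + wy * bottom

def spatial_transform(image, theta, downsample=1):
    """Apply the affine transform and emit an output with each side divided by `downsample`."""
    h, w = image.shape
    out_h, out_w = h // downsample, w // downsample
    src_x, src_y = affine_grid(theta, out_h, out_w)
    return bilinear_sample(image, src_x, src_y)

# Usage: zoom in on the central half of an 8x8 image while halving each side.
image = np.arange(64, dtype=float).reshape(8, 8)
zoom = np.array([[0.5, 0.0, 0.0],
                 [0.0, 0.5, 0.0]])
patch = spatial_transform(image, zoom, downsample=2)  # shape (4, 4)
```

With the identity matrix and `downsample=1` the sketch reproduces the input. With a zooming `theta` and `downsample > 1`, the smaller output covers only the attended region, which is the sense in which the down-sampling is adaptive: output pixels are spent on the region of interest rather than spread evenly over the whole cluttered image.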

Taxonomy

Topics: Image Enhancement Techniques · Solar Radiation and Photovoltaics · Visual Attention and Saliency Detection

Methods: Spatial Transformer