Recurrent Spatial Transformer Networks
Søren Kaae Sønderby, Casper Kaae Sønderby, Lars Maaløe, Ole Winther

TL;DR
This paper introduces a recurrent spatial transformer network (RNN-SPN) that improves digit classification in cluttered MNIST sequences by adaptively focusing on regions of interest, achieving a lower error rate than the convolutional baselines it is compared against.
Contribution
The integration of spatial transformer networks into a recurrent framework enables adaptive down-sampling and attention to regions of interest, enhancing classification accuracy.
Findings
RNN-SPN achieves a single-digit error of 1.5% on cluttered MNIST sequences.
The model can down-sample the input images adaptively without deteriorating performance.
It outperforms both a convolutional network (2.9% error) and a convolutional network with SPN layers (2.0% error).
Abstract
We integrate the recently proposed spatial transformer network (SPN) [Jaderberg et al., 2015] into a recurrent neural network (RNN) to form an RNN-SPN model. We use the RNN-SPN to classify digits in cluttered MNIST sequences. The proposed model achieves a single-digit error of 1.5%, compared to 2.9% for a convolutional network and 2.0% for convolutional networks with SPN layers. The SPN outputs a zoomed, rotated and skewed version of the input image. We investigate different down-sampling factors (the ratio of pixels in the input and output) for the SPN and show that the RNN-SPN model is able to down-sample the input images without deteriorating performance. The down-sampling in RNN-SPN can be thought of as adaptive down-sampling that minimizes the information loss in the regions of interest. We attribute the superior performance of the RNN-SPN to the fact that it can attend to a sequence of…
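The zoom and down-sampling behaviour described in the abstract can be illustrated with a small NumPy sketch of affine grid sampling. This is not the authors' implementation: the function names (`affine_grid`, `bilinear_sample`, `spatial_transform`), the edge-clamping border rule, and the hand-picked `theta` matrices are assumptions made for illustration. In the actual model the transformation parameters would be predicted by a network rather than fixed by hand.

```python
import numpy as np

def affine_grid(theta, out_h, out_w):
    """Map every output pixel to source coordinates in [-1, 1].

    theta is a 2x3 affine matrix (hypothetical, hand-chosen here).
    """
    ys = np.linspace(-1.0, 1.0, out_h)
    xs = np.linspace(-1.0, 1.0, out_w)
    gx, gy = np.meshgrid(xs, ys)                       # each (out_h, out_w)
    target = np.stack([gx.ravel(), gy.ravel(), np.ones(gx.size)])  # (3, N)
    source = theta @ target                            # (2, N)
    return source[0].reshape(out_h, out_w), source[1].reshape(out_h, out_w)

def bilinear_sample(image, sx, sy):
    """Bilinearly interpolate a 2-D image at normalized coordinates.

    Out-of-range coordinates are clamped to the border (an assumption).
    """
    h, w = image.shape
    x = (sx + 1.0) * (w - 1) / 2.0                     # to pixel units
    y = (sy + 1.0) * (h - 1) / 2.0
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    wx = x - x0                                        # interpolation weights
    wy = y - y0
    x0c, x1c = np.clip(x0, 0, w - 1), np.clip(x0 + 1, 0, w - 1)
    y0c, y1c = np.clip(y0, 0, h - 1), np.clip(y0 + 1, 0, h - 1)
    top = image[y0c, x0c] * (1 - wx) + image[y0c, x1c] * wx
    bottom = image[y1c, x0c] * (1 - wx) + image[y1c, x1c] * wx
    return top * (1 - wy) + bottom * wy

def spatial_transform(image, theta, out_shape):
    """Sample an affine-transformed view of image at a chosen output size."""
    sx, sy = affine_grid(np.asarray(theta, dtype=float), *out_shape)
    return bilinear_sample(image, sx, sy)

image = np.arange(16, dtype=float).reshape(4, 4)
identity = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0]]
# Zoom: scale 0.5 and shift left, so the grid covers the left half only.
zoom_left = [[0.5, 0.0, -0.5],
             [0.0, 1.0, 0.0]]

same = spatial_transform(image, identity, (4, 4))      # reproduces the input
small = spatial_transform(image, identity, (2, 2))     # 4x fewer pixels
crop = spatial_transform(image, zoom_left, (2, 2))     # 2x2 view of left half
```

With the identity transform, shrinking the output grid down-samples the whole image uniformly. With a zooming `theta`, the same small output grid is spent on a sub-region, which is the sense in which the down-sampling can be adaptive: fewer output pixels, but concentrated where the transform points.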
Taxonomy
Topics: Image Enhancement Techniques · Solar Radiation and Photovoltaics · Visual Attention and Saliency Detection
Methods: Spatial Transformer
