A Reinforcement Learning Approach for Sequential Spatial Transformer Networks

Fatemeh Azimi; Federico Raue; Joern Hees; Andreas Dengel

arXiv:2106.14295 · cs.LG · June 29, 2021

TL;DR

This paper introduces a reinforcement learning-based method for sequential spatial transformations in neural networks, allowing direct optimization of classification accuracy and improving performance on cluttered image datasets.

Contribution

The paper combines Spatial Transformer Networks (STNs) with reinforcement learning. This removes the requirement that the sampling module be differentiable and allows flexible objective functions (for example, directly maximizing classification accuracy).
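
To illustrate why a non-differentiable objective such as accuracy becomes usable, here is a minimal policy-gradient (REINFORCE) sketch on a toy single-step problem. This summary does not state which RL algorithm the paper uses; the `accuracy_reward` stand-in, the hidden "correct" action, the running-mean baseline and all hyperparameters are hypothetical. The reward is a plain 0/1 signal, and only the log-probability of the sampled action is differentiated.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample(probs, rng):
    """Draw an index from the categorical distribution `probs`."""
    u = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return i
    return len(probs) - 1

def accuracy_reward(action, correct_action=2):
    # Hypothetical stand-in for "is the classifier right on the transformed
    # image?": a 0/1 reward with no gradient with respect to the action.
    return 1.0 if action == correct_action else 0.0

def reinforce(num_actions=4, steps=2000, lr=0.1, seed=0):
    """REINFORCE on a single-state, single-step problem with a 0/1 reward."""
    rng = random.Random(seed)
    logits = [0.0] * num_actions
    baseline = 0.0  # running mean of rewards, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        a = sample(probs, rng)
        advantage = accuracy_reward(a) - baseline
        # Gradient of log softmax(logits)[a] w.r.t. logits is onehot(a) - probs.
        for i in range(num_actions):
            grad = (1.0 if i == a else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
        baseline = 0.9 * baseline + 0.1 * accuracy_reward(a)
    return softmax(logits)

probs = reinforce()
print([round(p, 3) for p in probs])
```

The policy concentrates its probability on the action that earns the reward, even though the reward itself is never differentiated.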

Findings

1. Outperforms the standard STN on cluttered MNIST and Fashion-MNIST datasets.

2. Allows accuracy to be optimized directly rather than only minimizing classification error.

3. Demonstrates the effectiveness of RL for learning spatial transformations.

Abstract

Spatial Transformer Networks (STN) can generate geometric transformations which modify input images to improve the classifier's performance. In this work, we combine the idea of STN with Reinforcement Learning (RL). To this end, we break the affine transformation down into a sequence of simple and discrete transformations. We formulate the task as a Markovian Decision Process (MDP) and use RL to solve this sequential decision-making problem. STN architectures learn the transformation parameters by minimizing the classification error and backpropagating the gradients through a sub-differentiable sampling module. In our method, we are not bound to the differentiability of the sampling modules. Moreover, we have freedom in designing the objective rather than only minimizing the error; e.g., we can directly set the target as maximizing the accuracy. We design multiple experiments to verify…
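
The abstract describes breaking an affine transformation into a sequence of simple, discrete transformations. A minimal sketch of that idea, assuming a hypothetical set of eight actions (shifts, rotations and scalings with made-up step sizes `STEP_T`, `STEP_R`, `STEP_S`): composing the actions chosen during an episode yields the 2x3 affine matrix that a standard STN would regress in one shot. The action set, step sizes and left-multiplication order are illustrative assumptions, not the paper's configuration, and the policy that picks the actions is not shown.

```python
import math

# Hypothetical step sizes: translation in normalized coordinates,
# rotation in radians, and a multiplicative scale factor.
STEP_T = 0.1
STEP_R = math.radians(10)
STEP_S = 1.1

def translation(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def scaling(k):
    return [[k, 0.0, 0.0], [0.0, k, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical discrete action set: each action is one small, fixed,
# invertible step expressed as a 3x3 homogeneous matrix.
ACTIONS = {
    "shift_x_pos": translation(STEP_T, 0.0),
    "shift_x_neg": translation(-STEP_T, 0.0),
    "shift_y_pos": translation(0.0, STEP_T),
    "shift_y_neg": translation(0.0, -STEP_T),
    "rot_pos": rotation(STEP_R),
    "rot_neg": rotation(-STEP_R),
    "scale_up": scaling(STEP_S),
    "scale_down": scaling(1.0 / STEP_S),
}

def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def compose(action_names):
    """Accumulate the actions of one episode, starting from the identity;
    each new step is left-multiplied onto the transform so far."""
    theta = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for name in action_names:
        theta = matmul(ACTIONS[name], theta)
    return [row[:] for row in theta[:2]]  # 2x3 affine matrix, STN-style

episode = ["shift_x_pos", "shift_x_pos", "rot_pos", "rot_neg"]
print(compose(episode))
```

Because each action is a small, invertible step, a later action can undo an earlier one (here the two rotations cancel, leaving a pure x-translation). How many steps an episode has and whether there is an explicit stop action are left open in this sketch.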


Taxonomy

Topics: Video Surveillance and Tracking Methods · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Dropout · Layer Normalization · Label Smoothing