A Reinforcement Learning Approach for Sequential Spatial Transformer Networks
Fatemeh Azimi, Federico Raue, Joern Hees, Andreas Dengel

TL;DR
This paper introduces a reinforcement learning-based method for sequential spatial transformations in neural networks, allowing direct optimization of classification accuracy and improving performance on cluttered image datasets.
Contribution
It combines Spatial Transformer Networks with reinforcement learning, enabling non-differentiable transformations and flexible objective functions for better image classification.
Findings
Outperforms traditional STN on cluttered MNIST and Fashion-MNIST datasets.
Allows the objective to be designed freely, for example maximizing accuracy directly rather than only minimizing the classification error.
Demonstrates effectiveness of RL in learning spatial transformations.
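To make the "direct optimization of accuracy" point concrete, here is a minimal sketch of how a non-differentiable 0/1 accuracy reward can still drive learning of a policy over discrete actions. The summary does not say which RL algorithm the authors use, so a plain REINFORCE-style update is swapped in here purely for illustration. All function names are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of action logits."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def accuracy_reward(predicted_label, true_label):
    """Non-differentiable reward: 1 if the classifier is correct, else 0."""
    return 1.0 if predicted_label == true_label else 0.0

def reinforce_logit_gradient(logits, action, reward, baseline=0.0):
    """Gradient of (reward - baseline) * log pi(action) w.r.t. the logits.

    Assumption: a softmax policy over discrete actions. The reward only
    scales the gradient, so it never needs to be differentiable itself.
    """
    probs = softmax(logits)
    one_hot = np.zeros_like(probs)
    one_hot[action] = 1.0
    return (reward - baseline) * (one_hot - probs)

# Usage: uniform policy over 4 actions, action 2 led to a correct prediction.
grad = reinforce_logit_gradient(np.zeros(4), action=2,
                                reward=accuracy_reward(7, 7))
```

A gradient ascent step along `grad` raises the probability of the rewarded action and lowers the others. When the reward is 0 and no baseline is used, the gradient vanishes.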
Abstract
Spatial Transformer Networks (STN) can generate geometric transformations which modify input images to improve the classifier's performance. In this work, we combine the idea of STN with Reinforcement Learning (RL). To this end, we break the affine transformation down into a sequence of simple and discrete transformations. We formulate the task as a Markovian Decision Process (MDP) and use RL to solve this sequential decision-making problem. STN architectures learn the transformation parameters by minimizing the classification error and backpropagating the gradients through a sub-differentiable sampling module. In our method, we are not bound to the differentiability of the sampling modules. Moreover, we have freedom in designing the objective rather than only minimizing the error; e.g., we can directly set the target as maximizing the accuracy. We design multiple experiments to verify…
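The decomposition of an affine transformation into a sequence of simple, discrete steps can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not list the actual action set or step sizes, so the actions, `STEP`, and `ANGLE` below are hypothetical.

```python
import numpy as np

STEP = 0.1                 # hypothetical translation/scale step
ANGLE = np.deg2rad(10.0)   # hypothetical rotation step

def _translate(tx, ty):
    return np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])

def _rotate(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Hypothetical discrete action set: each action is one simple 3x3
# homogeneous affine matrix.
ACTIONS = {
    "identity": np.eye(3),
    "shift_left": _translate(-STEP, 0.0),
    "shift_right": _translate(STEP, 0.0),
    "shift_up": _translate(0.0, -STEP),
    "shift_down": _translate(0.0, STEP),
    "zoom_in": np.diag([1.0 - STEP, 1.0 - STEP, 1.0]),
    "zoom_out": np.diag([1.0 + STEP, 1.0 + STEP, 1.0]),
    "rotate_ccw": _rotate(ANGLE),
    "rotate_cw": _rotate(-ANGLE),
}

def compose(action_names):
    """Apply the actions in order and return the accumulated affine matrix.

    In MDP terms, the accumulated matrix is (part of) the state and each
    action name is one decision taken by the agent.
    """
    T = np.eye(3)
    for name in action_names:
        T = ACTIONS[name] @ T  # later actions act on the already-transformed frame
    return T

# Usage: three decisions yield one overall affine transformation.
T = compose(["zoom_in", "shift_right", "rotate_ccw"])
```

Because every step is picked from a finite set rather than regressed as continuous parameters, the agent's choices do not need to be differentiable with respect to the image sampler.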
Taxonomy
Topics: Video Surveillance and Tracking Methods · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Dropout · Layer Normalization · Label Smoothing
