The problems with using STNs to align CNN feature maps
Lukas Finnveden, Ylva Jansson, Tony Lindeberg

TL;DR
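This paper examines why Spatial Transformer Networks (STNs) cannot, in general, align the CNN feature maps of a transformed image and its original. It gives a theoretical argument, shows that this inability is coupled with decreased classification accuracy, and advocates sharing parameters between the classification and localization networks instead.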
Contribution
The paper provides a theoretical analysis of STNs' inability to align feature maps and suggests sharing parameters between classification and localization networks as a solution.
Findings
STNs cannot generally align feature maps of transformed images.
This inability is coupled with decreased classification accuracy.
Sharing parameters between the classification and localization networks is advocated as a way to exploit more complex features from deeper layers.
Abstract
Spatial transformer networks (STNs) were designed to enable CNNs to learn invariance to image transformations. STNs were originally proposed to transform CNN feature maps as well as input images. This enables the use of more complex features when predicting transformation parameters. However, since STNs perform a purely spatial transformation, they do not, in the general case, have the ability to align the feature maps of a transformed image and its original. We present a theoretical argument for this and investigate the practical implications, showing that this inability is coupled with decreased classification accuracy. We advocate taking advantage of more complex features in deeper layers by instead sharing parameters between the classification and the localisation network.
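The alignment claim in the abstract can be illustrated with a small numerical sketch. This is not the authors' code or experimental setup. It assumes a single-channel image, a 90-degree rotation as the image transformation, circular boundary handling, and two hand-picked filters (`horizontal_diff` and `laplacian` are hypothetical names chosen for illustration). A purely spatial inverse rotation realigns the feature map of a rotation-symmetric filter, but not that of a directional filter, because the directional response itself changes when the image is rotated.

```python
import numpy as np


def horizontal_diff(img):
    """Directional feature: circular finite difference along the column axis."""
    return np.roll(img, -1, axis=1) - img


def laplacian(img):
    """Rotation-symmetric feature: circular 4-neighbour Laplacian."""
    return (
        np.roll(img, 1, axis=0) + np.roll(img, -1, axis=0)
        + np.roll(img, 1, axis=1) + np.roll(img, -1, axis=1)
        - 4 * img
    )


def realign(feature_map):
    """Purely spatial inverse of the 90-degree rotation, as an STN would apply."""
    return np.rot90(feature_map, k=-1)


rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
rotated = np.rot90(image)  # transformed input: 90 degrees counterclockwise

# Compute features of the rotated image, rotate the feature map back,
# and compare with the features of the original image.
dir_aligned = np.allclose(realign(horizontal_diff(rotated)), horizontal_diff(image))
iso_aligned = np.allclose(realign(laplacian(rotated)), laplacian(image))

print("directional filter aligned:", dir_aligned)
print("rotation-symmetric filter aligned:", iso_aligned)
```

In this toy setting, rotating the directional feature map back yields the vertical difference of the original image rather than the horizontal one, so aligning it would also require changing which filter response is read, something a spatial transformation alone cannot do.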
Taxonomy
Topics: Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax
