Volumetric Transformer Networks
Seungryong Kim, Sabine Süsstrunk, Mathieu Salzmann

TL;DR
This paper introduces Volumetric Transformer Networks (VTN), a learnable module that predicts channel-wise warping fields to enhance spatial invariance in CNN features, improving fine-grained recognition and image retrieval.
Contribution
The paper proposes VTN, a novel encoder-decoder module that predicts channel-specific warping fields and models feature dependencies, addressing limitations of uniform warping in CNNs.
Findings
VTN improves accuracy on fine-grained recognition tasks.
VTN enhances instance-level image retrieval performance.
The proposed loss function, defined between the warped features of pairs of instances, improves localization ability.
Abstract
Existing techniques to encode spatial invariance within deep convolutional neural networks (CNNs) apply the same warping field to all the feature channels. This does not account for the fact that the individual feature channels can represent different semantic parts, which can undergo different spatial transformations w.r.t. a canonical configuration. To overcome this limitation, we introduce a learnable module, the volumetric transformer network (VTN), that predicts channel-wise warping fields so as to reconfigure intermediate CNN features spatially and channel-wisely. We design our VTN as an encoder-decoder network, with modules dedicated to letting the information flow across the feature channels, to account for the dependencies between the semantic parts. We further propose a loss function defined between the warped features of pairs of instances, which improves the localization…
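To make the distinction between a shared warping field and channel-wise warping fields concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the displacement fields are set by hand rather than predicted by an encoder-decoder network, and the names `bilinear_sample`, `warp_channelwise`, and `warp_uniform` are hypothetical helpers introduced only for illustration.

```python
import numpy as np

def bilinear_sample(channel, ys, xs):
    """Sample a 2D array at float coordinates with bilinear interpolation.
    Coordinates outside the array are clamped to the border."""
    h, w = channel.shape
    ys = np.clip(ys, 0, h - 1)
    xs = np.clip(xs, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = ys - y0
    wx = xs - x0
    top = (1 - wx) * channel[y0, x0] + wx * channel[y0, x1]
    bottom = (1 - wx) * channel[y1, x0] + wx * channel[y1, x1]
    return (1 - wy) * top + wy * bottom

def warp_channelwise(features, flows):
    """Warp each channel with its own displacement field.
    features: (C, H, W); flows: (C, H, W, 2) holding (dy, dx) per pixel.
    Assumption: flows are given; in VTN they would be predicted by the module."""
    c, h, w = features.shape
    grid_y, grid_x = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    out = np.empty_like(features, dtype=float)
    for k in range(c):
        out[k] = bilinear_sample(features[k],
                                 grid_y + flows[k, ..., 0],
                                 grid_x + flows[k, ..., 1])
    return out

def warp_uniform(features, flow):
    """Baseline: the same (H, W, 2) displacement field applied to every channel."""
    flows = np.broadcast_to(flow, (features.shape[0],) + flow.shape)
    return warp_channelwise(features, flows)

# Toy example: two channels whose activations peak at different locations.
features = np.zeros((2, 4, 4))
features[0, 1, 1] = 1.0  # e.g. one semantic part
features[1, 2, 2] = 1.0  # e.g. another part, displaced by one pixel

# Hand-set per-channel fields: leave channel 0 alone, shift channel 1.
flows = np.zeros((2, 4, 4, 2))
flows[1, ..., 0] = 1.0  # dy
flows[1, ..., 1] = 1.0  # dx

aligned = warp_channelwise(features, flows)
```

In this toy case both channels peak at the same location after the channel-wise warp. No single shared field passed to `warp_uniform` could achieve this, because the two channels need different displacements.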
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
