Recurrent Transformer Networks for Semantic Correspondence
Seungryong Kim, Stephen Lin, Sangryul Jeon, Dongbo Min, and Kwanghoon Sohn

TL;DR
Recurrent transformer networks (RTNs) iteratively estimate spatial transformations to improve dense semantic correspondence between images, achieving state-of-the-art results through recursive refinement and weakly-supervised training.
Contribution
Introduces RTNs, which directly estimate the transformations between an image pair rather than normalizing each image independently with spatial transformer networks, and proposes a weakly-supervised training technique based on a classification loss.
Findings
Achieved state-of-the-art performance on semantic correspondence benchmarks.
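Demonstrated improved accuracy by directly estimating transformations between an image pair rather than normalizing each image independently.
Presented an effective weakly-supervised training method for RTNs, based on a classification loss.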
Abstract
We present recurrent transformer networks (RTNs) for obtaining dense correspondences between semantically similar images. Our networks accomplish this through an iterative process of estimating spatial transformations between the input images and using these transformations to generate aligned convolutional activations. By directly estimating the transformations between an image pair, rather than employing spatial transformer networks to independently normalize each individual image, we show that greater accuracy can be achieved. This process is conducted in a recursive manner to refine both the transformation estimates and the feature representations. In addition, a technique is presented for weakly-supervised training of RTNs that is based on a proposed classification loss. With RTNs, state-of-the-art performance is attained on several benchmarks for semantic correspondence.
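The iterative process described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: RTNs use learned networks to estimate the transformations, while this example assumes per-pixel integer translations and substitutes a hard best-match search over a small window for the learned estimator. The names `positional_features` and `refine_translation_field`, the window `radius`, and the smooth synthetic features are all hypothetical and chosen only for illustration.

```python
import numpy as np


def positional_features(height, width, shift=(0, 0), period=64):
    """Toy smooth features (assumption): low-frequency sin/cos of pixel coordinates.

    `shift` moves the content so that the feature seen at source (y, x)
    appears at (y + shift[0], x + shift[1]).
    """
    ys, xs = np.mgrid[0:height, 0:width]
    ay = 2.0 * np.pi * (ys - shift[0]) / period
    ax = 2.0 * np.pi * (xs - shift[1]) / period
    return np.stack([np.sin(ay), np.cos(ay), np.sin(ax), np.cos(ax)], axis=-1)


def l2_normalize(feat, eps=1e-8):
    """Scale each feature vector to unit length so dot products are cosine similarities."""
    return feat / (np.linalg.norm(feat, axis=-1, keepdims=True) + eps)


def refine_translation_field(src_feat, tgt_feat, n_iters=3, radius=2):
    """Recursively refine a per-pixel translation field from source to target.

    src_feat, tgt_feat: arrays of shape (H, W, C).
    Returns an integer array of shape (H, W, 2) holding (dy, dx) per source pixel.
    """
    height, width, _ = src_feat.shape
    src = l2_normalize(src_feat)
    tgt = l2_normalize(tgt_feat)
    ys, xs = np.mgrid[0:height, 0:width]

    # Candidate residual offsets, smallest first, so ties keep the current estimate.
    offsets = [(dy, dx)
               for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)]
    offsets.sort(key=lambda o: abs(o[0]) + abs(o[1]))

    flow = np.zeros((height, width, 2), dtype=int)
    for _ in range(n_iters):
        best_score = np.full((height, width), -np.inf)
        best_residual = np.zeros((height, width, 2), dtype=int)
        for dy, dx in offsets:
            # Sample target features at the currently aligned position plus a residual.
            ty = np.clip(ys + flow[..., 0] + dy, 0, height - 1)
            tx = np.clip(xs + flow[..., 1] + dx, 0, width - 1)
            score = np.sum(src * tgt[ty, tx], axis=-1)
            better = score > best_score
            best_score = np.where(better, score, best_score)
            best_residual[better] = (dy, dx)
        # Accumulate the residual; the next pass searches around the new alignment.
        flow = flow + best_residual
    return flow


src = positional_features(16, 16)
tgt = positional_features(16, 16, shift=(3, 2))
flow = refine_translation_field(src, tgt, n_iters=3, radius=2)
```

In this toy setting the true displacement (3, 2) lies outside a single search window of radius 2, so one pass can only move part of the way. Later passes search around the updated alignment and recover the full displacement for interior pixels, which is the intuition behind refining the estimates recursively. Pixels near the image border are affected by coordinate clipping and may not converge.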
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Spatial Transformer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam
