Recurrent Transformer Networks for Semantic Correspondence

Seungryong Kim, Stephen Lin, Sangryul Jeon, Dongbo Min, and Kwanghoon Sohn

arXiv:1810.12155 · cs.CV · October 30, 2018 · 50 cites

TL;DR

Recurrent transformer networks (RTNs) iteratively estimate spatial transformations to improve dense semantic correspondence between images, achieving state-of-the-art results through recursive refinement and weakly-supervised training.

Contribution

Introduces RTNs, which directly estimate the transformations between an image pair rather than normalizing each image independently, together with a weakly-supervised training technique based on a classification loss.

Findings

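01. Achieved state-of-the-art performance on several semantic correspondence benchmarks.

02. Demonstrated that directly estimating the transformations between an image pair is more accurate than using spatial transformer networks to normalize each image independently.

03. Presented an effective weakly-supervised training method for RTNs, based on a proposed classification loss.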

Abstract

We present recurrent transformer networks (RTNs) for obtaining dense correspondences between semantically similar images. Our networks accomplish this through an iterative process of estimating spatial transformations between the input images and using these transformations to generate aligned convolutional activations. By directly estimating the transformations between an image pair, rather than employing spatial transformer networks to independently normalize each individual image, we show that greater accuracy can be achieved. This process is conducted in a recursive manner to refine both the transformation estimates and the feature representations. In addition, a technique is presented for weakly-supervised training of RTNs that is based on a proposed classification loss. With RTNs, state-of-the-art performance is attained on several benchmarks for semantic correspondence.
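The iterative process described in the abstract (align features with the current transformation, estimate a correction, repeat) can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a single global integer translation instead of the dense transformation fields RTNs estimate, and it replaces the learned geometry estimator with a brute-force correlation search over a small window. All function names (`shift`, `estimate_residual`, `recurrent_alignment`, `gaussian_blob`) are hypothetical.

```python
import numpy as np

def shift(feat, dy, dx):
    """Translate a 2-D feature map by integer offsets (wrap-around borders)."""
    return np.roll(feat, shift=(dy, dx), axis=(0, 1))

def estimate_residual(src, tgt_aligned, radius=1):
    """Assumed stand-in for a learned geometry estimator: return the small
    offset in [-radius, radius]^2 that maximises feature correlation."""
    best, best_score = (0, 0), -np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            score = np.sum(src * shift(tgt_aligned, dy, dx))
            if score > best_score:
                best, best_score = (dy, dx), score
    return best

def recurrent_alignment(src, tgt, iterations=5, radius=1):
    """Refine a translation estimate by repeatedly re-aligning the target
    features and adding the residual found on the aligned features."""
    t = np.zeros(2, dtype=int)
    for _ in range(iterations):
        aligned = shift(tgt, t[0], t[1])            # align with current estimate
        residual = estimate_residual(src, aligned, radius)
        t = t + np.array(residual)                  # accumulate the correction
    return int(t[0]), int(t[1])

def gaussian_blob(size=32, center=(16, 16), sigma=3.0):
    """Smooth synthetic 'feature map' so that correlation varies smoothly."""
    ys, xs = np.mgrid[0:size, 0:size]
    return np.exp(-((ys - center[0]) ** 2 + (xs - center[1]) ** 2) / (2 * sigma ** 2))

# Toy usage: the target is the source displaced by (-3, +2) pixels, so the
# transformation that re-aligns it is (+3, -2).
src = gaussian_blob()
tgt = shift(src, -3, 2)
estimate = recurrent_alignment(src, tgt)
print(estimate)
```

Each iteration can only move the estimate by one pixel per axis, so the full displacement is reached only through repeated refinement, which is the point of the recursive formulation. In this toy setting the search window plays the role of the estimator's limited range; a learned model would predict the residual directly from the aligned features.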

Code & Models

Repositories

seungryong/RTNs

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Spatial Transformer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam