TransMatcher: Deep Image Matching Through Transformers for Generalizable Person Re-identification

Shengcai Liao; Ling Shao

arXiv:2105.14432·cs.CV·December 8, 2021·38 cites


TL;DR

TransMatcher introduces a novel Transformer-based approach for person re-identification, emphasizing a simplified decoder that enhances matching performance and generalizability across datasets.

Contribution

The paper proposes a new simplified decoder for Transformers tailored for image matching, significantly improving generalizable person re-identification performance.

Findings

01. Achieves state-of-the-art results with up to 6.1% Rank-1 improvement

02. Demonstrates the effectiveness of the simplified decoder for image matching

03. Shows better generalization across multiple datasets

Abstract

Transformers have recently gained increasing attention in computer vision. However, existing studies mostly use Transformers for feature representation learning, e.g. for image classification and dense predictions, and the generalizability of Transformers is unknown. In this work, we further investigate the possibility of applying Transformers for image matching and metric learning given pairs of images. We find that the Vision Transformer (ViT) and the vanilla Transformer with decoders are not adequate for image matching due to their lack of image-to-image attention. Thus, we further design two naive solutions, i.e. query-gallery concatenation in ViT, and query-gallery cross-attention in the vanilla Transformer. The latter improves the performance, but it is still limited. This implies that the attention mechanism in Transformers is primarily designed for global feature aggregation,…
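To make the idea of query-gallery cross-attention more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it assumes a single attention head, no learned projections, and tiny hand-written token features, and the helper names (`softmax`, `cross_attention`) and the choice of which image attends to which are illustrative assumptions only.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(tokens_a, tokens_b):
    """Let every token of image A attend over all tokens of image B.

    Assumption: single head, identity projections, scaled dot-product scores.
    Returns the attended features for A and the attention weights.
    """
    d = len(tokens_b[0])
    scale = 1.0 / math.sqrt(d)
    attended, weights = [], []
    for a in tokens_a:
        # Similarity of this A token to every B token.
        scores = [scale * sum(x * y for x, y in zip(a, b)) for b in tokens_b]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of B tokens: an aggregated (global) feature for this A token.
        attended.append(
            [sum(w_j * b[k] for w_j, b in zip(w, tokens_b)) for k in range(d)]
        )
    return attended, weights


# Hypothetical 2-D token features for a query image and a gallery image.
query = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gallery = [[1.0, 0.0], [0.5, 0.5]]

attended, attn = cross_attention(gallery, query)
```

Each output row is a softmax-weighted average of the other image's tokens, which is one way to see why the abstract describes attention as being geared toward feature aggregation rather than explicit matching.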


Code & Models

Videos

TransMatcher: Deep Image Matching Through Transformers for Generalizable Person Re-identification (SlidesLive)

Taxonomy

Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Human Pose and Action Recognition

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Vision Transformer · Max Pooling · Label Smoothing · Layer Normalization · Byte Pair Encoding