AAformer: Auto-Aligned Transformer for Person Re-Identification

Kuan Zhu; Haiyun Guo; Shiliang Zhang; Yaowei Wang; Jing Liu; Jinqiao Wang; Ming Tang

arXiv:2104.00921 · cs.CV · June 26, 2024

TL;DR

The paper introduces AAformer, a transformer-based model that automatically locates both human parts and identifiable nonhuman parts (e.g., a knapsack) at the patch level and extracts fine-grained features from them for person re-identification, outperforming existing CNN-based methods.

Contribution

The paper proposes the auto-aligned transformer (AAformer), which adds learnable part tokens ([PART]s) to the transformer and an auto-alignment mechanism that uses optimal transport to group image patches into parts.
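
The summary above names optimal transport but not the solver, so the following is only a minimal sketch of how patches could be grouped into parts, assuming an entropy-regularised (Sinkhorn) formulation with uniform marginals. The function names `sinkhorn_plan` and `hard_assign`, the parameters `eps` and `n_iters`, and the similarity matrix `scores` are hypothetical and not taken from the paper.

```python
import math


def sinkhorn_plan(scores, eps=0.1, n_iters=50):
    """Sketch of entropy-regularised optimal transport (Sinkhorn iterations).

    scores[i][k] is an assumed similarity between patch i and part k.
    Assumption: uniform marginals, so each patch carries mass 1/N and each
    part should receive mass 1/K. Returns an N x K transport plan.
    """
    n, k = len(scores), len(scores[0])
    # Gibbs kernel; subtracting the global maximum avoids overflow in exp().
    top = max(max(row) for row in scores)
    kernel = [[math.exp((s - top) / eps) for s in row] for row in scores]
    v = [1.0] * k
    for _ in range(n_iters):
        # Rescale rows so every patch sends out mass 1/N.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(k))
             for i in range(n)]
        # Rescale columns so every part receives mass 1/K.
        v = [(1.0 / k) / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(k)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(k)] for i in range(n)]


def hard_assign(plan):
    """Pick, for each patch, the part that receives most of its mass."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in plan]
```

With `scores = [[0.9, 0.8], [0.9, 0.8], [0.9, 0.1], [0.9, 0.2]]`, a plain per-patch argmax sends every patch to part 0, whereas the balanced plan moves the first two patches to part 1. Whether the paper uses this balancing constraint in the same way is an assumption here.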

Findings

01 AAformer outperforms state-of-the-art methods in person re-ID.

02 The part tokens effectively capture fine-grained features.

03 Auto-alignment improves patch grouping accuracy.

Abstract

In person re-identification (re-ID), extracting part-level features from person images has been verified to be crucial to offer fine-grained information. Most of the existing CNN-based methods only locate the human parts coarsely, or rely on pretrained human parsing models and fail in locating the identifiable nonhuman parts (e.g., knapsack). In this article, we introduce an alignment scheme in transformer architecture for the first time and propose the auto-aligned transformer (AAformer) to automatically locate both the human parts and nonhuman ones at patch level. We introduce the "Part tokens ([PART]s)", which are learnable vectors, to extract part features in the transformer. A [PART] only interacts with a local subset of patches in self-attention and learns to be the part representation. To adaptively group the image patches into different subsets, we design the auto-alignment.…
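
The abstract states that a [PART] interacts only with a local subset of patches in self-attention. As a rough illustration, this behaviour can be sketched as attention restricted by a mask. The sketch assumes a single head, no learned projections, and a precomputed patch-to-part assignment. The names `part_attention_mask` and `attend` are hypothetical, and for simplicity the [PART] token does not attend to itself here.

```python
import math


def part_attention_mask(assignment, num_parts):
    """mask[k][i] is True when part token k may attend to patch i.

    assignment[i] is the (assumed, precomputed) part index of patch i.
    """
    return [[a == k for a in assignment] for k in range(num_parts)]


def attend(query, keys, values, allowed):
    """Scaled dot-product attention for one query over the allowed keys only."""
    idx = [i for i, ok in enumerate(allowed) if ok]
    if not idx:
        raise ValueError("query has no patches to attend to")
    scale = math.sqrt(len(query))
    logits = [sum(q * x for q, x in zip(query, keys[i])) / scale for i in idx]
    # Softmax over the allowed patches; masked patches get zero weight.
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, i in zip(weights, idx):
        for j, val in enumerate(values[i]):
            out[j] += (w / total) * val
    return out
```

Calling `attend(part_token, patches, patches, mask[k])` for each part `k` yields one feature vector per part, built only from the patches assigned to that part.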

Taxonomy

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Attention Is All You Need · Dropout · Byte Pair Encoding · Residual Connection · Layer Normalization · Label Smoothing · Adam