BiLMa: Bidirectional Local-Matching for Text-based Person Re-identification

Takuro Fujii; Shuhei Tarashima

arXiv:2309.04675·cs.CV·September 12, 2023

TL;DR

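This paper introduces BiLMa, a bidirectional local-matching framework for text-based person re-identification (TBPReID). It jointly optimizes Masked Language Modeling (MLM) and Masked Image Modeling (MIM) during training, so that image and text parts are aligned in both directions, which improves retrieval accuracy.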

Contribution

The paper proposes a bidirectional local-matching framework with Semantic MIM, which extends image-text alignment in text-based person re-identification (TBPReID) beyond prior uni-directional (image-to-text) methods.

Findings

01. Achieves state-of-the-art Rank@1 scores on three benchmarks.

02. Improves mAP scores significantly over previous methods.

03. Demonstrates effectiveness of bidirectional local-matching and Semantic MIM.

Abstract

Text-based person re-identification (TBPReID) aims to retrieve person images described by a given textual query. In this task, effectively aligning images and texts both globally and locally is a crucial challenge. Recent works have achieved high performance by solving Masked Language Modeling (MLM) to align image and text parts. However, they perform only uni-directional (i.e., from image to text) local-matching, leaving room for improvement by introducing opposite-directional (i.e., from text to image) local-matching. In this work, we introduce the Bidirectional Local-Matching (BiLMa) framework, which jointly optimizes MLM and Masked Image Modeling (MIM) in TBPReID model training. With this framework, our model is trained so that the labels of randomly masked image and text tokens are predicted from the unmasked tokens. In addition, to narrow the semantic gap between image and text in MIM, we…
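The joint objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `MASK` id, the mask ratios, the equal loss weights, and all function names are assumptions made for illustration, and the discrete labels for image tokens are assumed to come from some tokenizer that is not shown. The sketch only shows the shape of the idea: mask random positions in both modalities, score predictions at the masked positions only, and sum the two losses.

```python
import math
import random

MASK = -1  # hypothetical id standing in for a [MASK] token (assumption)


def random_mask(tokens, ratio, rng):
    """Replace a random subset of tokens with MASK.

    Returns the masked copy and the sorted list of masked positions.
    """
    n_mask = max(1, round(len(tokens) * ratio))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for p in positions:
        masked[p] = MASK
    return masked, positions


def masked_loss(pred_probs, labels, positions):
    """Mean cross-entropy over the masked positions only.

    pred_probs[p] is a probability vector over the label vocabulary
    for position p, as a model would produce from the unmasked tokens.
    """
    total = sum(-math.log(pred_probs[p][labels[p]]) for p in positions)
    return total / len(positions)


def joint_loss(mlm_loss, mim_loss, w_mlm=1.0, w_mim=1.0):
    """Weighted sum of the text-side (MLM) and image-side (MIM) losses.

    The weights are illustrative; the source does not state them.
    """
    return w_mlm * mlm_loss + w_mim * mim_loss
```

In a real model, `pred_probs` would be produced by a cross-modal encoder that sees the unmasked tokens of both modalities; here it is left as an input so the loss bookkeeping stays visible.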

Taxonomy

Topics: Video Surveillance and Tracking Methods · Face recognition and analysis · Vehicle License Plate Recognition

Methods: Mutual Information Machine/Mask Image Modeling · ALIGN