BiLMa: Bidirectional Local-Matching for Text-based Person Re-identification
Takuro Fujii, Shuhei Tarashima

TL;DR
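This paper introduces BiLMa, a bidirectional local-matching framework for text-based person re-identification. It jointly optimizes Masked Language Modeling (MLM) and Masked Image Modeling (MIM) during training so that image and text parts are aligned in both directions, which the authors report improves retrieval accuracy.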
Contribution
The paper proposes a bidirectional local-matching framework, combined with Semantic MIM, that extends image-text alignment in text-based person re-identification (TBPReID) beyond uni-directional (image-to-text) methods.
Findings
Achieves state-of-the-art Rank@1 scores on three benchmarks.
Improves mAP scores significantly over previous methods.
Demonstrates effectiveness of bidirectional local-matching and Semantic MIM.
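The findings above are stated in terms of Rank@1 and mAP. As a rough illustration of what these retrieval metrics measure, the sketch below computes them under their usual definitions. It is not the authors' evaluation code; the function names and the toy rankings are hypothetical.

```python
from typing import List, Sequence, Tuple


def rank_at_k(ranked_ids: Sequence[int], query_id: int, k: int = 1) -> float:
    """Return 1.0 if any of the top-k retrieved identities matches the query identity."""
    return 1.0 if query_id in ranked_ids[:k] else 0.0


def average_precision(ranked_ids: Sequence[int], query_id: int) -> float:
    """Mean of the precision values taken at each rank where a correct match occurs."""
    hits = 0
    precision_sum = 0.0
    for rank, gallery_id in enumerate(ranked_ids, start=1):
        if gallery_id == query_id:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def evaluate(rankings: List[Tuple[int, Sequence[int]]], k: int = 1) -> Tuple[float, float]:
    """Average Rank@k and average precision (mAP) over all text queries."""
    n = len(rankings)
    rank_k = sum(rank_at_k(ids, q, k) for q, ids in rankings) / n
    mean_ap = sum(average_precision(ids, q) for q, ids in rankings) / n
    return rank_k, mean_ap


# Toy data (hypothetical): each entry is (identity of the text query,
# identities of gallery images sorted by descending similarity).
toy_rankings = [
    (7, [7, 3, 7, 5]),  # correct person at ranks 1 and 3
    (2, [4, 2, 9, 2]),  # correct person at ranks 2 and 4
]
rank1, mean_ap = evaluate(toy_rankings, k=1)
print(f"Rank@1 = {rank1:.3f}, mAP = {mean_ap:.3f}")  # Rank@1 = 0.500, mAP = 0.667
```

Rank@1 only checks the top result, while mAP also rewards placing every image of the correct person high in the list, which is why the two can move differently.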
Abstract
Text-based person re-identification (TBPReID) aims to retrieve person images described by a given textual query. In this task, how to effectively align images and texts both globally and locally is a crucial challenge. Recent works have achieved high performance by solving Masked Language Modeling (MLM) to align image and text parts. However, they performed only uni-directional (i.e., from image to text) local-matching, leaving room for improvement by introducing opposite-directional (i.e., from text to image) local-matching. In this work, we introduce the Bidirectional Local-Matching (BiLMa) framework, which jointly optimizes MLM and Masked Image Modeling (MIM) in TBPReID model training. With this framework, our model is trained so that the labels of randomly masked image and text tokens are both predicted from the unmasked tokens. In addition, to narrow the semantic gap between image and text in MIM, we…
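The joint MLM and MIM objective described in the abstract can be illustrated with a minimal sketch. It makes several assumptions that the text above does not confirm: image patches are represented as discrete token ids, the two losses are summed with equal weight, and uniform placeholder logits stand in for the cross-modal model. All names (`MASK_ID`, `mask_tokens`, `bidirectional_loss`) are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

MASK_ID = 0       # hypothetical id reserved for the mask token in both modalities
VOCAB_SIZE = 16   # hypothetical vocabulary size shared by the toy text and image tokens


def mask_tokens(tokens: Sequence[int], ratio: float,
                rng: random.Random) -> Tuple[List[int], List[Optional[int]]]:
    """Replace roughly `ratio` of the tokens with MASK_ID.

    Returns the masked sequence and a label list that holds the original id at
    masked positions and None elsewhere (unmasked positions are not predicted).
    """
    n_mask = min(len(tokens), max(1, round(len(tokens) * ratio)))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK_ID if i in positions else t for i, t in enumerate(tokens)]
    labels = [t if i in positions else None for i, t in enumerate(tokens)]
    return masked, labels


def cross_entropy(logits: Sequence[Sequence[float]],
                  labels: Sequence[Optional[int]]) -> float:
    """Mean cross-entropy over masked positions only."""
    losses = []
    for row, label in zip(logits, labels):
        if label is None:
            continue
        top = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = top + math.log(sum(math.exp(v - top) for v in row))
        losses.append(log_z - row[label])
    return sum(losses) / len(losses) if losses else 0.0


def bidirectional_loss(text_logits, text_labels, image_logits, image_labels) -> float:
    """MLM loss plus MIM loss; equal weighting is an assumption of this sketch."""
    return cross_entropy(text_logits, text_labels) + cross_entropy(image_logits, image_labels)


rng = random.Random(0)
text_tokens = [5, 9, 3, 7, 2, 8]
image_tokens = [11, 4, 6, 1, 10, 12, 13, 14]

masked_text, text_labels = mask_tokens(text_tokens, 0.5, rng)
masked_image, image_labels = mask_tokens(image_tokens, 0.25, rng)

# Placeholder predictions: a real model would compute these logits from the
# unmasked tokens of both modalities; uniform logits are used here instead.
text_logits = [[0.0] * VOCAB_SIZE for _ in masked_text]
image_logits = [[0.0] * VOCAB_SIZE for _ in masked_image]

loss = bidirectional_loss(text_logits, text_labels, image_logits, image_labels)
print(f"joint loss = {loss:.4f}")
```

With uniform logits each term equals log(VOCAB_SIZE), which gives a quick sanity check that only masked positions contribute to the loss.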
Taxonomy
Topics: Video Surveillance and Tracking Methods · Face recognition and analysis · Vehicle License Plate Recognition
Methods: Mutual Information Machine/Mask Image Modeling · ALIGN
