Look Before You Leap: Improving Text-based Person Retrieval by Learning A Consistent Cross-modal Common Manifold

Zijie Wang; Aichun Zhu; Jingyi Xue; Xili Wan; Chao Liu; Tian Wang; Yifeng Li

arXiv:2209.06209·cs.CV·September 15, 2022

TL;DR

This paper introduces LBUL, a novel approach for text-based person retrieval that learns a consistent cross-modal common manifold by considering both visual and textual data distributions, leading to improved retrieval accuracy.

Contribution

The paper proposes LBUL, an algorithm that addresses the cross-modal distribution consensus prediction (CDCP) dilemma by taking the distribution characteristics of both modalities into account before embedding, which improves cross-modal alignment.

Findings

1. LBUL outperforms previous methods on the CUHK-PEDES and RSTPReid datasets.

2. LBUL achieves state-of-the-art retrieval accuracy.

3. Considering both modalities' distributions improves cross-modal alignment.

Abstract

The core problem of text-based person retrieval is how to bridge the heterogeneous gap between multi-modal data. Many previous approaches contrive to learn a latent common manifold mapping paradigm in a cross-modal distribution consensus prediction (CDCP) manner. When mapping features from the distribution of one modality into the common manifold, the feature distribution of the opposite modality is completely invisible. That is to say, how to achieve a cross-modal distribution consensus, so as to embed and align the multi-modal features in a constructed cross-modal common manifold, depends entirely on the experience of the model itself rather than on the actual situation. With such methods, the multi-modal data inevitably cannot be well aligned in the common manifold, which finally leads to sub-optimal retrieval performance. To overcome this CDCP…
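
As background on the retrieval setting the abstract describes, the following is a minimal sketch of ranking gallery images against a text query once both have been embedded in a shared space. It is not the LBUL method: the function names (`cosine`, `rank_gallery`), the person IDs, and the three-dimensional toy embeddings are all hypothetical, and cosine similarity is assumed here only as a common choice of matching score.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_gallery(text_emb, image_embs):
    """Return gallery IDs sorted from most to least similar to the text query.

    text_emb:   embedding of the textual description (hypothetical values)
    image_embs: dict mapping person ID -> image embedding in the same space
    """
    scores = {pid: cosine(text_emb, emb) for pid, emb in image_embs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy embeddings, assumed to already live in a common space.
gallery = {
    "person_a": [0.9, 0.1, 0.0],
    "person_b": [0.1, 0.8, 0.3],
    "person_c": [0.0, 0.2, 0.9],
}
query = [0.2, 0.9, 0.2]  # embedding of a text description

ranking = rank_gallery(query, gallery)
print(ranking)
```

In this setting, retrieval quality depends on how well the two modalities are aligned in the shared space, which is the alignment problem the abstract is concerned with.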

Taxonomy

Methods: ALIGN