Towards Unified Text-based Person Retrieval: A Large-scale Multi-Attribute and Language Search Benchmark
Shuyu Yang, Yinan Zhou, Yaxiong Wang, Yujiao Wu, Li Zhu, Zhedong Zheng

TL;DR
This paper introduces MALS, a large-scale multi-attribute and language search dataset for text-based person retrieval, and proposes APTM, a joint learning framework that leverages generated data to improve retrieval accuracy.
Contribution
The paper presents MALS, a novel large-scale dataset generated using diffusion models, and introduces APTM, a new joint attribute prompt and text matching learning framework for improved person retrieval.
Findings
MALS contains 1,510,330 image-text pairs, about 37.5 times more than CUHK-PEDES, and every image is annotated with 27 attributes.
APTM achieves state-of-the-art results on three benchmarks with substantial accuracy improvements.
Pre-training on MALS with APTM enhances fine-grained person retrieval performance.
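To make the retrieval task concrete, here is a minimal sketch of how a text query can be ranked against a gallery of person images and scored with Recall@K, a metric commonly used for text-based person retrieval. It is not the APTM implementation: the embeddings are hand-made toy vectors, and the names `cosine`, `rank_gallery`, and `recall_at_k` are hypothetical helpers introduced only for illustration. In practice the vectors would come from trained text and image encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_gallery(text_vec, gallery):
    """Return gallery image ids sorted by descending similarity to the text query."""
    scored = [(cosine(text_vec, vec), img_id) for img_id, vec in gallery.items()]
    # Stable sort: images with equal scores keep their gallery order.
    scored.sort(key=lambda s: s[0], reverse=True)
    return [img_id for _, img_id in scored]


def recall_at_k(queries, gallery, k):
    """Fraction of queries whose top-k ranked images include at least one correct image."""
    hits = 0
    for text_vec, relevant_ids in queries:
        top_k = rank_gallery(text_vec, gallery)[:k]
        if any(img_id in relevant_ids for img_id in top_k):
            hits += 1
    return hits / len(queries)


# Toy embeddings (assumption: stand-ins for the outputs of real encoders).
gallery = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.0, 1.0, 0.0],
    "img_c": [0.7, 0.7, 0.0],
}

# Each query pairs a text embedding with the set of gallery ids that match it.
queries = [
    ([0.9, 0.1, 0.0], {"img_a"}),
    ([0.1, 0.9, 0.0], {"img_b"}),
    ([0.0, 0.0, 1.0], {"img_c"}),
]

if __name__ == "__main__":
    for k in (1, 3):
        print(f"Recall@{k}: {recall_at_k(queries, gallery, k):.3f}")
```

In this toy setup the third query is orthogonal to every gallery vector, so it misses at K=1 and is only recovered once K covers the whole gallery. That gap between small and large K is what improvements in fine-grained matching aim to close.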
Abstract
In this paper, we introduce a large Multi-Attribute and Language Search dataset for text-based person retrieval, called MALS, and explore the feasibility of pre-training on attribute recognition and image-text matching simultaneously. In particular, MALS contains 1,510,330 image-text pairs, about 37.5 times more than the prevailing CUHK-PEDES dataset, and all images are annotated with 27 attributes. Given privacy concerns and annotation costs, we leverage off-the-shelf diffusion models to generate the dataset. To verify the feasibility of learning from the generated data, we develop a new joint Attribute Prompt Learning and Text Matching Learning (APTM) framework that exploits the knowledge shared between attributes and text. As the name implies, APTM contains an attribute prompt learning stream and a text matching learning stream. (1) The attribute prompt…
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Mobility and Location-Based Analysis · Data-Driven Disease Surveillance
Methods: Diffusion
