Minimizing the Pretraining Gap: Domain-aligned Text-Based Person Retrieval

Shuyu Yang; Yaxiong Wang; Yongrui Li; Li Zhu; Zhedong Zheng

arXiv:2507.10195·cs.CV·March 31, 2026

TL;DR

This paper introduces a dual-level domain adaptation pipeline for text-based person retrieval. It narrows the domain gap between synthetic pretraining data and real-world data in order to improve retrieval accuracy.

Contribution

The paper proposes a unified pipeline that combines image-level adaptation (Domain-aware Diffusion, DaD) with region-level adaptation (Multi-granularity Relation Alignment, MRA), and reports state-of-the-art results.

Findings

01. Achieved state-of-the-art performance on multiple datasets.

02. Effectively aligned synthetic and real-world image distributions.

03. Improved region-to-sentence correspondence in retrieval tasks.

Abstract

In this work, we focus on text-based person retrieval, which identifies individuals based on textual descriptions. Despite advancements enabled by synthetic data for pretraining, a significant domain gap, due to variations in lighting, color, and viewpoint, limits the effectiveness of the pretrain-finetune paradigm. To overcome this issue, we propose a unified pipeline incorporating domain adaptation at both image and region levels. Our method features two key components: Domain-aware Diffusion (DaD) for image-level adaptation, which aligns image distributions between synthetic and real-world domains, e.g., CUHK-PEDES, and Multi-granularity Relation Alignment (MRA) for region-level adaptation, which aligns visual regions with descriptive sentences, thereby addressing disparities at a finer granularity. This dual-level strategy effectively bridges the domain gap, achieving…
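
To make the retrieval setting concrete, the following is a minimal sketch, not the authors' implementation, of ranking gallery images for one text query with a global image-sentence score plus a region-level score. The function names, the precomputed embeddings, the `alpha` weighting, and the max-over-regions pooling are all assumptions made for illustration; the abstract does not specify how MRA computes its alignment.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors along `axis` to unit length (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def rank_gallery(text_global, text_phrases, img_global, img_regions, alpha=0.5):
    """Rank N gallery images for one text query (hypothetical scoring rule).

    text_global:  (D,)      embedding of the whole description
    text_phrases: (P, D)    embeddings of P phrases or sentences
    img_global:   (N, D)    one embedding per gallery image
    img_regions:  (N, R, D) R region embeddings per gallery image
    alpha:        weight of the global score (assumed, not from the paper)
    """
    t = l2_normalize(text_global)
    p = l2_normalize(text_phrases)
    g = l2_normalize(img_global)
    r = l2_normalize(img_regions)

    # Image-level cosine similarity between each image and the description.
    global_score = g @ t  # (N,)

    # Region-level cosine similarity: (N, R, D) @ (D, P) -> (N, R, P).
    region_sim = r @ p.T
    # For each phrase keep its best-matching region, then average over phrases.
    fine_score = region_sim.max(axis=1).mean(axis=1)  # (N,)

    score = alpha * global_score + (1.0 - alpha) * fine_score
    order = np.argsort(-score)  # indices of images, best match first
    return order, score
```

In this sketch `alpha=1.0` reduces to plain global matching, so the region term can be switched off to see how much a finer-grained score changes the ranking on a given gallery.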

Code & Models

Repositories

Shuyu-XJTU/MRA (GitHub)
