TL;DR
This paper introduces a dual-level domain adaptation pipeline for text-based person retrieval, effectively bridging the domain gap between synthetic and real data to improve retrieval accuracy.
Contribution
It proposes a unified pipeline that combines image-level and region-level domain adaptation, and reports state-of-the-art results.
Findings
Reports state-of-the-art performance on multiple datasets.
Domain-aware Diffusion (DaD) aligns synthetic and real-world image distributions.
Multi-granularity Relation Alignment (MRA) improves region-to-sentence correspondence for retrieval.
Abstract
In this work, we focus on text-based person retrieval, which identifies individuals based on textual descriptions. Despite advancements enabled by synthetic data for pretraining, a significant domain gap, due to variations in lighting, color, and viewpoint, limits the effectiveness of the pretrain-finetune paradigm. To overcome this issue, we propose a unified pipeline incorporating domain adaptation at both image and region levels. Our method features two key components: Domain-aware Diffusion (DaD) for image-level adaptation, which aligns image distributions between synthetic and real-world domains, e.g., CUHK-PEDES, and Multi-granularity Relation Alignment (MRA) for region-level adaptation, which aligns visual regions with descriptive sentences, thereby addressing disparities at a finer granularity. This dual-level strategy effectively bridges the domain gap, achieving…
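To make the retrieval task itself concrete, here is a minimal sketch of how a text query might be matched against a gallery of person images once both are embedded in a shared space. It is not the paper's method: the function name `rank_gallery`, the toy 3-dimensional embeddings, and the use of plain cosine similarity are all assumptions for illustration, and neither DaD nor MRA is implemented here.

```python
import numpy as np

def rank_gallery(text_emb, image_embs):
    """Rank gallery images by cosine similarity to one text embedding.

    Hypothetical helper for illustration only; not taken from the paper.
    Returns (indices sorted best-first, cosine score per gallery image).
    """
    # L2-normalise so that the dot product equals cosine similarity.
    t = text_emb / np.linalg.norm(text_emb)
    g = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    scores = g @ t
    # Negate so that argsort returns the highest score first.
    order = np.argsort(-scores)
    return order, scores

# Toy, hand-written embeddings (assumed values, not real model outputs).
query = np.array([1.0, 0.0, 0.0])
gallery = np.array([
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [0.9, 0.1, 0.0],   # nearly parallel to the query
    [0.5, 0.5, 0.0],   # partially aligned with the query
])

order, scores = rank_gallery(query, gallery)
print(order.tolist())
```

In this framing, pretraining on synthetic data and adapting to the real domain would both aim at producing embeddings for which the correct person ranks first for a given description.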
