Text-Phase Synergy Network with Dual Priors for Unsupervised Cross-Domain Image Retrieval
Jing Yang, Hui Xue, Shipeng Zhu, Pengfei Fang

TL;DR
This paper introduces TPSNet, an unsupervised cross-domain image retrieval (UCDIR) method that combines two priors, CLIP-generated text prompts and domain-invariant phase features, to strengthen semantic guidance and domain alignment.
Contribution
The paper proposes TPSNet, which integrates text and phase priors to improve unsupervised cross-domain image retrieval beyond existing pseudo-label-based methods.
Findings
TPSNet outperforms state-of-the-art methods on UCDIR benchmarks.
CLIP-based class prompts provide more accurate semantic supervision than discrete pseudo-labels derived from clustering.
Domain-invariant phase features effectively bridge domain gaps while preserving semantics.
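The summary does not say how the phase features are computed. As a rough illustration only, the sketch below assumes "phase" means the phase of an image's 2D Fourier transform, a common choice in domain adaptation work, and shows why such a feature can ignore appearance changes: a global contrast change alters the Fourier amplitude but leaves the phase untouched. The function name `phase_only` and the toy image are hypothetical and are not taken from the paper.

```python
import numpy as np


def phase_only(image: np.ndarray) -> np.ndarray:
    """Rebuild an image from its Fourier phase, with every amplitude set to 1.

    Assumption: 'phase' here is the 2D Fourier phase; the paper's actual
    phase extraction may differ.
    """
    spectrum = np.fft.fft2(image)
    phase = np.angle(spectrum)
    # Discard amplitude so that only phase information remains.
    unit_spectrum = np.exp(1j * phase)
    return np.real(np.fft.ifft2(unit_spectrum))


rng = np.random.default_rng(0)
img = rng.random((32, 32))   # toy grayscale "image"
styled = 2.0 * img           # crude stand-in for a domain shift: global contrast change

# The phase-only reconstructions agree even though the pixel values differ.
same_phase_view = np.allclose(phase_only(img), phase_only(styled))
different_pixels = not np.allclose(img, styled)
```

Real domain shifts (for example, photo versus sketch) are far richer than a contrast change, so this only shows the intuition behind using phase as a domain-robust signal, not the method itself.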
Abstract
This paper studies unsupervised cross-domain image retrieval (UCDIR), which aims to retrieve images of the same category across different domains without relying on labeled data. Existing methods typically utilize pseudo-labels, derived from clustering algorithms, as supervisory signals for intra-domain representation learning and cross-domain feature alignment. However, these discrete pseudo-labels often fail to provide accurate and comprehensive semantic guidance. Moreover, the alignment process frequently overlooks the entanglement between domain-specific and semantic information, leading to semantic degradation in the learned representations and ultimately impairing retrieval performance. This paper addresses these limitations by proposing a Text-Phase Synergy Network with Dual Priors (TPSNet). Specifically, we first employ CLIP to generate a set of class-specific prompts per domain,…
