PC$^2$: Pseudo-Classification Based Pseudo-Captioning for Noisy Correspondence Learning in Cross-Modal Retrieval
Yue Duan, Zhangxuan Gu, Zhenzhe Ying, Lei Qi, Changhua Meng, Yinghuan Shi

TL;DR
This paper introduces PC$^2$, a novel framework for cross-modal retrieval that uses pseudo-classification and pseudo-captioning to handle noisy data pairs, improving robustness and semantic understanding in noisy correspondence learning.
Contribution
The paper proposes a new PC$^2$ framework combining pseudo-classification and pseudo-captioning to address noisy correspondence in cross-modal retrieval, along with a new real-world NCL dataset called NoW.
Findings
PC$^2$ outperforms existing methods on simulated NCL datasets.
The NoW dataset provides a realistic benchmark for NCL.
Empirical results demonstrate significant improvements in retrieval accuracy.
Abstract
In the realm of cross-modal retrieval, seamlessly integrating diverse modalities within multimedia remains a formidable challenge, especially given the complexities introduced by noisy correspondence learning (NCL). Such noise often stems from mismatched data pairs, a significant obstacle distinct from traditional noisy labels. This paper introduces the Pseudo-Classification based Pseudo-Captioning (PC$^2$) framework to address this challenge. PC$^2$ offers a threefold strategy. Firstly, it establishes an auxiliary "pseudo-classification" task that interprets captions as categorical labels, steering the model to learn image-text semantic similarity through a non-contrastive mechanism. Secondly, unlike prevailing margin-based techniques, it capitalizes on this pseudo-classification capability to generate pseudo-captions that provide more informative and tangible supervision for…
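The abstract only outlines how pseudo-classification and pseudo-captioning fit together. The sketch below is one possible reading of that idea, not the authors' implementation. It assumes that each caption in the label set is its own class, that a hypothetical linear head (`weights`) scores precomputed image features against those classes, and that a hypothetical confidence `threshold` decides when a pseudo-caption is emitted. All function names and the toy inputs are illustrative.

```python
import math
from typing import List, Optional, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(image_feat: Sequence[float],
             weights: Sequence[Sequence[float]]) -> List[float]:
    # Hypothetical linear head: one weight row per caption "class".
    # Returns a probability for every caption in the label set.
    logits = [sum(w * x for w, x in zip(row, image_feat)) for row in weights]
    return softmax(logits)


def pseudo_classification_loss(probs: Sequence[float], caption_id: int) -> float:
    # Cross-entropy that treats the index of the paired caption as a
    # categorical label. No negative pairs or margin are involved.
    return -math.log(probs[caption_id])


def pseudo_caption(probs: Sequence[float], captions: Sequence[str],
                   threshold: float = 0.5) -> Optional[Tuple[int, str]]:
    # Hypothetical rule: take the most probable caption class as the
    # pseudo-caption, but only when the classifier is confident enough.
    best = max(range(len(probs)), key=lambda k: probs[k])
    if probs[best] < threshold:
        return None
    return best, captions[best]


if __name__ == "__main__":
    captions = ["a dog on grass", "a red car", "two people cooking"]
    weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    image_feat = [0.2, 3.0]  # toy feature vector for an image of a car
    probs = classify(image_feat, weights)

    # Suppose the dataset pairs this image with caption 0 (a mismatch).
    print(pseudo_classification_loss(probs, 0))  # large loss for the noisy pair
    print(pseudo_caption(probs, captions))       # (1, 'a red car')
```

Because the loss is a plain cross-entropy over caption classes, it needs no negative samples or margin, which is one way to read "non-contrastive" here. On the toy inputs, the mismatched pair receives a much higher loss than the correct caption would, and the confident prediction yields a pseudo-caption that differs from the noisy one.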
