Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast

Boqing Zhu; Kele Xu; Changjian Wang; Zheng Qin; Tao Sun; Huaimin Wang; Yuxing Peng

arXiv:2204.14057·cs.SD·May 30, 2022

TL;DR

This paper introduces CMPC, a novel unsupervised learning method that improves voice-face representation by addressing false negatives and weak correlations through semantic clustering and prototype comparison.

Contribution

The paper proposes cross-modal prototype contrastive learning (CMPC), enhancing unsupervised voice-face representation by leveraging semantic clustering and dynamic prototype comparison.

Findings

01. Outperforms state-of-the-art unsupervised methods in voice-face association tasks.

02. Shows significant improvements in low-shot supervision scenarios.

03. Effectively resists false negatives and deviant positives in contrastive learning.

Abstract

We present an approach to learning voice-face representations from talking-face videos without any identity labels. Previous works employ cross-modal instance discrimination tasks to establish the correlation between voice and face. These methods neglect the semantic content of different videos, introducing false-negative pairs as training noise. Furthermore, the positive pairs are constructed based on the natural correlation between audio clips and visual frames. However, this correlation may be weak or inaccurate in a large amount of real-world data, which introduces deviant positives into the contrastive paradigm. To address these issues, we propose cross-modal prototype contrastive learning (CMPC), which takes advantage of contrastive methods while resisting the adverse effects of false negatives and deviant positives. On the one hand, CMPC could learn the intra-class invariance by…
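To make the false-negative problem described in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names (`info_nce`, `prototypes`), the toy embeddings, the temperature, and the cluster assignments are all assumptions made for illustration. It compares an instance-level contrastive loss, where every other video counts as a negative, with a loss computed against per-cluster mean embeddings (prototypes).

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce(query, keys, positive_index, temperature=0.1):
    """Softmax cross-entropy over cosine similarities.

    keys[positive_index] is treated as the positive; all other keys
    are treated as negatives.
    """
    q = normalize(query)
    logits = [dot(q, normalize(k)) / temperature for k in keys]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[positive_index]


def prototypes(embeddings, assignments, num_clusters):
    """Mean embedding per cluster; assignments are assumed to be given."""
    dim = len(embeddings[0])
    sums = [[0.0] * dim for _ in range(num_clusters)]
    counts = [0] * num_clusters
    for emb, c in zip(embeddings, assignments):
        counts[c] += 1
        for i, a in enumerate(emb):
            sums[c][i] += a
    # max(..., 1) avoids division by zero for an empty cluster
    return [[s / max(counts[c], 1) for s in sums[c]] for c in range(num_clusters)]


# Toy data (hypothetical): videos 0 and 1 show the same unlabeled speaker,
# video 2 shows a different speaker.
voices = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0]]
faces = [[1.0, 0.0], [0.95, 0.15], [0.0, 1.0]]
assignments = [0, 0, 1]  # hypothetical output of an unsupervised clustering step

# Instance discrimination: the face from video 1 is a false negative for voice 0.
instance_loss = info_nce(voices[0], faces, positive_index=0)

# Prototype contrast: voice 0 is compared against one prototype per cluster.
face_protos = prototypes(faces, assignments, num_clusters=2)
prototype_loss = info_nce(voices[0], face_protos, positive_index=assignments[0])

print(instance_loss, prototype_loss)
```

In this toy setup the instance-level loss stays high even though voice 0 matches its own face well, because the face from video 1 belongs to the same speaker yet is pushed away as a negative. The prototype-level loss does not pay that penalty. How CMPC actually forms clusters and weights samples is not covered by this sketch.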

Code & Models

Repositories

cocoxili/cmpc (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Face recognition and analysis · Nasal Surgery and Airway Studies

Methods: Contrastive Learning