Exploring Robust Face-Voice Matching in Multilingual Environments

Jiehui Tang; Xiaofei Wang; Zhen Xiao; Jiayi Liu; Xueliang Liu; Richang Hong

arXiv:2407.19875·cs.CV·July 30, 2024

Open Access

TL;DR

This paper introduces a novel face-voice matching method for multilingual environments, utilizing a dual-branch structure, dynamic weighting, data augmentation, and score polarization to improve accuracy across diverse languages and conditions.

Contribution

It proposes a new face-voice association framework with four key components that enhance robustness and generalization in multilingual face-voice matching tasks.

Findings

01 Achieved an equal error rate (EER) of 20.07 on the V2-EH dataset

02 Achieved an EER of 21.76 on the V1-EU dataset

03 Demonstrated significant effectiveness of the proposed methods
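For readers unfamiliar with the metric, the EER is the operating point at which the false acceptance rate equals the false rejection rate, so lower is better. The sketch below is a minimal, generic way to estimate it from a list of pair scores; it is not the challenge's official scoring script. The function name `equal_error_rate` is hypothetical, and the sketch assumes that a higher score means a more likely face-voice match and that labels are 1 for matching pairs and 0 otherwise.

```python
def equal_error_rate(scores, labels):
    """Estimate the EER (as a fraction in [0, 1]) from scores and binary labels.

    Assumption: higher score = more likely match; label 1 = matching pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one matching and one non-matching pair")

    best_gap, eer = float("inf"), 1.0
    # Sweep each observed score as a threshold (accept a pair if score >= t),
    # plus +inf so that the "reject everything" point is also considered.
    for t in sorted(set(scores)) + [float("inf")]:
        far = sum(s >= t for s in neg) / len(neg)  # false acceptance rate
        frr = sum(s < t for s in pos) / len(pos)   # false rejection rate
        gap = abs(far - frr)
        if gap < best_gap:
            # Report the midpoint where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Toy example: one matching pair scores below one non-matching pair.
    print(equal_error_rate([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

Read this way, the reported figures would correspond to roughly 20% and 22% error at the balanced threshold, assuming the values are percentages.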

Abstract

This paper presents Team Xaiofei's innovative approach to exploring Face-Voice Association in Multilingual Environments (FAME) at ACM Multimedia 2024. We focus on the impact of different languages in face-voice matching by building upon Fusion and Orthogonal Projection (FOP), introducing four key components: a dual-branch structure, dynamic sample pair weighting, robust data augmentation, and a score polarization strategy. Our dual-branch structure serves as an auxiliary mechanism to better integrate the modalities and provide more comprehensive information. We also introduce a dynamic weighting mechanism for various sample pairs to optimize learning. Data augmentation techniques are employed to enhance the model's generalization across diverse conditions. Additionally, a score polarization strategy based on age and gender matching confidence clarifies and accentuates the final results. Our methods…
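The abstract does not spell out how score polarization is computed, so the following is only an illustrative sketch of the general idea: when auxiliary age and gender evidence is confident, a matching score is pushed toward an extreme. Everything here is an assumption made for illustration, not the authors' formulation, including the function name `polarize_score`, the thresholds `high` and `low`, the `strength` factor, and the convention that scores and confidences lie in [0, 1].

```python
def polarize_score(score, match_conf, high=0.9, low=0.1, strength=0.5):
    """Push a face-voice matching score toward 0 or 1 (hypothetical scheme).

    score:      base similarity in [0, 1], higher = more likely the same person
    match_conf: assumed confidence in [0, 1] that the age and gender predicted
                from the face agree with those predicted from the voice
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if match_conf >= high:
        # Attributes confidently agree: move part of the way toward 1.
        return score + strength * (1.0 - score)
    if match_conf <= low:
        # Attributes confidently disagree: move part of the way toward 0.
        return score - strength * score
    # Ambiguous attribute evidence: leave the base score untouched.
    return score


if __name__ == "__main__":
    for conf in (0.95, 0.50, 0.05):
        print(conf, polarize_score(0.6, conf))
```

A soft shift is used here rather than a hard clamp so that the base score still orders pairs within each group; whether the paper clamps, shifts, or rescales is not stated in the text above.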

Taxonomy

Topics: Face recognition and analysis

Methods: Focus