Exploring Robust Face-Voice Matching in Multilingual Environments
Jiehui Tang, Xiaofei Wang, Zhen Xiao, Jiayi Liu, Xueliang Liu, Richang Hong

TL;DR
This paper introduces a novel face-voice matching method for multilingual environments, utilizing a dual-branch structure, dynamic weighting, data augmentation, and score polarization to improve accuracy across diverse languages and conditions.
Contribution
It proposes a new face-voice association framework with four key components that enhance robustness and generalization in multilingual face-voice matching tasks.
Findings
Achieved an equal error rate (EER) of 20.07 on the V2-EH dataset
Achieved an EER of 21.76 on the V1-EU dataset
Demonstrated significant effectiveness of the proposed methods
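The EER values reported above are equal error rates: the operating point at which the false-accept rate and the false-reject rate coincide, so lower is better. The following is a minimal sketch of how such a metric can be computed, not the challenge's official scoring script. It assumes that higher scores mean "more likely the same identity" (for distance-based scores, negate them first); the function name `equal_error_rate` and the toy data are illustrative only.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Approximate EER by sweeping every observed score as a threshold.

    scores: similarity scores, higher = more likely a true face-voice match
            (an assumption of this sketch).
    labels: 1/True for genuine pairs, 0/False for impostor pairs.
            Both classes must be present.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_gap, eer = np.inf, 1.0
    for t in np.unique(scores):
        accept = scores >= t
        far = np.mean(accept[~labels])   # impostor pairs wrongly accepted
        frr = np.mean(~accept[labels])   # genuine pairs wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2.0
    return eer

# Toy example: one genuine pair (0.4) scores below one impostor pair (0.6).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
toy_eer = equal_error_rate(toy_scores, toy_labels)
```

On this toy data, the threshold 0.6 gives a false-accept rate and a false-reject rate of 1/4 each, so the sketch returns 0.25. Reported EERs such as those above are usually given on a 0 to 100 scale, which corresponds to multiplying this value by 100.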
Abstract
This paper presents Team Xaiofei's innovative approach to exploring Face-Voice Association in Multilingual Environments (FAME) at ACM Multimedia 2024. We focus on the impact of different languages in face-voice matching by building upon Fusion and Orthogonal Projection (FOP), introducing four key components: a dual-branch structure, dynamic sample pair weighting, robust data augmentation, and a score polarization strategy. Our dual-branch structure serves as an auxiliary mechanism to better integrate and provide more comprehensive information. We also introduce a dynamic weighting mechanism for various sample pairs to optimize learning. Data augmentation techniques are employed to enhance the model's generalization across diverse conditions. Additionally, a score polarization strategy based on age and gender matching confidence clarifies and accentuates the final results. Our methods…
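The abstract does not spell out how score polarization is computed, so the following is only one plausible reading, sketched under stated assumptions: a matching score in [0, 1] is pushed toward 1 when auxiliary age/gender predictions for the face and the voice confidently agree, pushed toward 0 when they confidently disagree, and left unchanged otherwise. The function name `polarize_score` and the parameters `attr_match_conf`, `conf_threshold`, and `strength` are hypothetical and do not come from the paper.

```python
def polarize_score(score, attr_match_conf, conf_threshold=0.9, strength=0.5):
    """Illustrative score polarization (assumed form, not the authors' method).

    score:           face-voice matching score in [0, 1], higher = match.
    attr_match_conf: confidence in [0, 1] that the age/gender attributes
                     predicted from the face and the voice agree.
    conf_threshold:  how confident the attribute check must be to act.
    strength:        fraction of the remaining distance to 0 or 1 to move.
    """
    if attr_match_conf >= conf_threshold:
        # Attributes confidently agree: move the score toward 1.
        return score + (1.0 - score) * strength
    if attr_match_conf <= 1.0 - conf_threshold:
        # Attributes confidently disagree: move the score toward 0.
        return score * (1.0 - strength)
    # Attribute evidence is inconclusive: leave the score untouched.
    return score

agree = polarize_score(0.5, attr_match_conf=0.95)      # 0.75
disagree = polarize_score(0.5, attr_match_conf=0.05)   # 0.25
unsure = polarize_score(0.5, attr_match_conf=0.5)      # 0.5
```

Acting only on high-confidence attribute evidence is a design choice of this sketch: it keeps an unreliable age or gender estimate from overriding the learned matching score.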
Taxonomy
Topics: Face recognition and analysis
Methods: Focus
