PAEFF: Precise Alignment and Enhanced Gated Feature Fusion for Face-Voice Association

Abdul Hannan; Muhammad Arslan Manzoor; Shah Nawaz; Muhammad Irzam Liaqat; Markus Schedl; Mubashir Noman

arXiv:2505.17002·cs.CV·May 29, 2025

TL;DR

This paper introduces PAEFF, a face-voice association method that aligns the face and voice embedding spaces and then fuses them with an enhanced gated fusion, improving association performance on the VoxCeleb dataset.

Contribution

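The paper builds on joint-embedding approaches that apply orthogonality constraints to fused face and voice embeddings, which avoid hand-crafted negative mining and margin parameters. Its contribution is to align the face and voice embedding spaces before fusing them with an enhanced gated fusion.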

Findings

01

Improved accuracy on VoxCeleb dataset

02

Effective alignment of face and voice embeddings

03

Enhanced gated fusion boosts association performance

Abstract

We study the task of learning associations between faces and voices, which has recently been gaining interest in the multimodal community. Existing methods suffer from deliberately crafted negative mining procedures as well as reliance on a distance margin parameter. These issues are addressed by learning a joint embedding space in which orthogonality constraints are applied to the fused embeddings of faces and voices. However, the embedding spaces of faces and voices possess different characteristics and need to be aligned before they are fused. To this end, we propose a method that accurately aligns the embedding spaces and fuses them with an enhanced gated fusion, thereby improving the performance of face-voice association. Extensive experiments on the VoxCeleb dataset reveal the merits of the proposed approach.
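To make the "align, then fuse with a gate" idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The linear projections used for alignment, the sigmoid gate over concatenated embeddings, and all names and dimensions (`align`, `gated_fusion`, `w_face`, `w_voice`, `w_gate`, `shared_dim`) are assumptions chosen for illustration, and the weights are random rather than learned.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def l2_normalize(x, eps=1e-12):
    # Scale each row to (approximately) unit length.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def align(x, w):
    # Assumed alignment step: project into a shared space, then normalize,
    # so face and voice embeddings live on the same unit sphere.
    return l2_normalize(x @ w)


def gated_fusion(face, voice, w_gate, b_gate):
    # Assumed gate: a sigmoid over the concatenated embeddings decides,
    # per dimension, how much of each modality to keep.
    gate = sigmoid(np.concatenate([face, voice], axis=-1) @ w_gate + b_gate)
    return gate * face + (1.0 - gate) * voice


rng = np.random.default_rng(0)
batch, face_dim, voice_dim, shared_dim = 4, 16, 12, 8

# Stand-ins for the outputs of pretrained face and voice encoders.
face_raw = rng.normal(size=(batch, face_dim))
voice_raw = rng.normal(size=(batch, voice_dim))

# Random weights for illustration; in practice these would be trained.
w_face = rng.normal(scale=0.1, size=(face_dim, shared_dim))
w_voice = rng.normal(scale=0.1, size=(voice_dim, shared_dim))
w_gate = rng.normal(scale=0.1, size=(2 * shared_dim, shared_dim))
b_gate = np.zeros(shared_dim)

face = align(face_raw, w_face)
voice = align(voice_raw, w_voice)
fused = gated_fusion(face, voice, w_gate, b_gate)
print(fused.shape)  # (4, 8)
```

Because the gate lies strictly between 0 and 1, each fused coordinate is a convex combination of the corresponding face and voice coordinates. That only makes sense once both modalities share a space, which is one way to read the abstract's argument for aligning before fusing.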

Code & Models

Repositories

hannabdul/paeff (Official)

Taxonomy

Topics: Face recognition and analysis · Emotion and Mood Recognition · Face and Expression Recognition