RFOP: Rethinking Fusion and Orthogonal Projection for Face-Voice Association
Abdul Hannan, Furqan Malik, Hina Jabbar, Syed Suleman Sadiq, Mubashir Noman

TL;DR
This paper presents RFOP, an approach to face-voice association in multilingual settings that revisits fusion and orthogonal projection to focus on the semantic information relevant to both modalities. It ranked 3rd in the FAME 2026 challenge with an EER of 33.1%.
Contribution
The paper introduces RFOP, a new method that enhances face-voice association by focusing on semantic information through fusion and orthogonal projection techniques.
Findings
Achieved 33.1% EER on English-German face-voice data
Ranked 3rd in the FAME 2026 challenge
Effective focus on relevant semantic information improves performance
Abstract
The Face-voice Association in Multilingual Environments (FAME) 2026 challenge investigates the face-voice association task in multilingual scenarios. The challenge introduces English-German face-voice pairs for use in the evaluation phase. To this end, we revisit fusion and orthogonal projection for face-voice association, focusing on the relevant semantic information within the two modalities. Our method performs favorably on the English-German data split and ranked 3rd in the FAME 2026 challenge, achieving an EER of 33.1%.
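The reported metric, EER (equal error rate), is the operating point at which the false acceptance rate equals the false rejection rate, so lower is better. The following is a minimal sketch of how an EER can be estimated from face-voice match scores. It is not the challenge's official scoring script, which may interpolate between thresholds differently. The function name `equal_error_rate`, the score convention (higher means more likely the same identity), and the toy data are assumptions made for illustration.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Estimate the EER from similarity scores.

    Assumed convention: labels are 1 for a matching face-voice pair and
    0 for a non-matching pair; a pair is accepted when score >= threshold.
    Returns (eer, threshold) at the threshold where FAR and FRR are closest.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_pos = (labels == 1).sum()
    n_neg = (labels == 0).sum()

    best_gap, eer, best_thr = np.inf, 1.0, None
    for thr in np.unique(scores):  # candidate thresholds, ascending
        accept = scores >= thr
        far = (accept & (labels == 0)).sum() / n_neg   # false acceptance rate
        frr = (~accept & (labels == 1)).sum() / n_pos  # false rejection rate
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer, best_thr = gap, (far + frr) / 2, thr
    return eer, best_thr


# Toy example (made-up scores, not results from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
eer, thr = equal_error_rate(toy_scores, toy_labels)
print(f"EER = {eer:.2f} at threshold {thr}")  # EER = 0.25 at threshold 0.6
```

Under this convention, the reported EER of 33.1% means that at the balancing threshold roughly one in three matching pairs is rejected and roughly one in three non-matching pairs is accepted.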
Taxonomy
Topics: Face recognition and analysis · Emotion and Mood Recognition · Face Recognition and Perception
