Automatic Quality Assessment for Audio-Visual Verification Systems. The LOVe submission to NIST SRE Challenge 2019

Grigory Antipov; Nicolas Gengembre; Olivier Le Blouch; Gaël Le Lan

arXiv:2008.05889 · eess.AS · August 17, 2020

TL;DR

This paper introduces a universal quality assessment model for audio-visual verification systems, improving multimodal fusion by estimating the quality of face and speaker representations, leading to enhanced verification performance.

Contribution

A novel universal quality assessment model for both face and speaker modalities that improves score-level fusion in multimodal biometric verification.

Findings

01. Improved verification accuracy on the NIST SRE19 Audio-Visual Challenge dataset
02. Effective quality estimation for both the face and speaker modalities
03. Enhanced score-level fusion performance

Abstract

Fusion of scores is a cornerstone of multimodal biometric systems composed of independent unimodal parts. In this work, we focus on quality-dependent fusion for speaker-face verification. To this end, we propose a universal model that can be trained to automatically assess the quality of both the face and speaker modalities. This model estimates the quality of the representations produced by the unimodal systems; these quality estimates are then used to enhance the score-level fusion of the speaker and face verification modules. We demonstrate the improvements brought by this quality-dependent fusion on the recent NIST SRE19 Audio-Visual Challenge dataset.
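The abstract does not specify the exact fusion function. As an illustration only, the Python sketch below shows one common form of quality-dependent score-level fusion, assuming each unimodal system outputs a verification score and the quality model outputs one scalar quality estimate per modality. The class `QualityFusion`, its parameters and the sigmoid gating are hypothetical choices, not the authors' model.

```python
import math
from dataclasses import dataclass


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class QualityFusion:
    """Hypothetical quality-gated linear fusion of two verification scores.

    All parameters are illustrative placeholders (assumption); a real system
    would learn them on held-out trials, e.g. with logistic regression.
    """
    w_speaker: float = 1.0  # base weight of the speaker score
    w_face: float = 1.0     # base weight of the face score
    a_speaker: float = 1.0  # sensitivity of the speaker gate to its quality
    a_face: float = 1.0     # sensitivity of the face gate to its quality
    bias: float = 0.0       # global offset of the fused score

    def fuse(self, s_speaker: float, s_face: float,
             q_speaker: float, q_face: float) -> float:
        # Turn each quality estimate into a gate in (0, 1): a low quality
        # estimate shrinks the contribution of that modality's score.
        g_speaker = sigmoid(self.a_speaker * q_speaker)
        g_face = sigmoid(self.a_face * q_face)
        return (self.w_speaker * g_speaker * s_speaker
                + self.w_face * g_face * s_face
                + self.bias)


if __name__ == "__main__":
    fusion = QualityFusion()
    # Neutral qualities: both gates are 0.5, so this is 0.5 * (2.0 + 4.0).
    print(fusion.fuse(2.0, 4.0, 0.0, 0.0))  # 3.0
    # A poor-quality face sample (low q_face) barely moves the fused score.
    print(fusion.fuse(2.0, -4.0, 3.0, -3.0))
```

Compared with a fixed weighted sum, the gates let the fused score rely more on whichever modality has the higher estimated quality in a given trial, for example on the speaker score when the face is blurred or occluded.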
