Learning to Predict Salient Faces: A Novel Visual-Audio Saliency Model

Yufan Liu; Minglang Qiao; Mai Xu; Bing Li; Weiming Hu; Ali Borji

arXiv:2103.15438 · cs.CV · March 30, 2021


TL;DR

This paper introduces a multi-modal saliency prediction model for multiple-face videos that incorporates visual, audio, and face information, demonstrating improved accuracy over existing methods and aligning more closely with human attention patterns.

Contribution

The paper presents a novel multi-modal saliency model that integrates visual, audio, and face cues, supported by a large-scale eye-tracking database and outperforming existing methods.

Findings

01 Outperforms 11 state-of-the-art saliency models

02 Aligns closely with human multi-modal attention

03 Validates the influence of audio on visual saliency

Abstract

Recently, video streams have occupied a large proportion of Internet traffic, and most of them contain human faces. Hence, it is necessary to predict saliency on multiple-face videos, which can provide attention cues for many content-based applications. However, most multiple-face saliency prediction works consider only visual information and ignore audio, which is inconsistent with naturalistic scenarios. Several behavioral studies have established that sound influences human attention, especially during speech turn-taking in multiple-face videos. In this paper, we thoroughly investigate such influences by establishing a large-scale eye-tracking database of Multiple-face Video in Visual-Audio condition (MVVA). Inspired by the findings of our investigation, we propose a novel multi-modal video saliency model consisting of three branches: visual, audio and face. The visual…


Code & Models

Repositories

MinglangQiao/MVVA-Database
Official
