Leveraging Audio Gestalt to Predict Media Memorability

Lorin Sweeney; Graham Healy; Alan F. Smeaton

arXiv:2012.15635·cs.MM·January 1, 2021

Open Access

TL;DR

This paper introduces a multimodal deep learning approach that leverages audio gestalt to predict video memorability, combining visual, semantic, and auditory features to improve prediction accuracy.

Contribution

It uses audio gestalt to estimate how much the audio modality influences a video's overall memorability, and uses that estimate to choose which combination of visual, semantic, and auditory features feeds a late fusion model.

Findings

1. Audio gestalt effectively influences memorability prediction.
2. Multimodal fusion improves prediction accuracy.
3. The approach outperforms single-modality models.

Abstract

Memorability determines what evanesces into emptiness, and what worms its way into the deepest furrows of our minds. It is the key to curating more meaningful media content as we wade through daily digital torrents. The Predicting Media Memorability task in MediaEval 2020 aims to address the question of media memorability by setting the task of automatically predicting video memorability. Our approach is a multimodal deep learning-based late fusion that combines visual, semantic, and auditory features. We used audio gestalt to estimate the influence of the audio modality on overall video memorability, and accordingly inform which combination of features would best predict a given video's memorability scores.
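The idea in the abstract, using an audio gestalt estimate to decide which features go into a late fusion, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not taken from the paper: the function names `late_fusion` and `predict_memorability`, the `threshold` value of 0.5, the equal weighting, and the example scores are all hypothetical.

```python
def late_fusion(scores, weights=None):
    """Combine per-modality memorability scores with a weighted average.

    `scores` maps a modality name to that model's predicted score.
    Equal weights are an assumption; the paper's weighting is not given here.
    """
    if not scores:
        raise ValueError("at least one modality score is required")
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total


def predict_memorability(per_modality_scores, audio_gestalt, threshold=0.5):
    """Choose which modalities to fuse based on an audio gestalt estimate.

    `audio_gestalt` stands in for an estimate of how much the audio track
    contributes to memorability. The hard threshold is a hypothetical rule.
    """
    if audio_gestalt >= threshold:
        # Audio is judged influential: fuse every available modality.
        used = dict(per_modality_scores)
    else:
        # Audio is judged uninformative: fuse everything except audio.
        used = {k: v for k, v in per_modality_scores.items() if k != "audio"}
    return late_fusion(used)


# Hypothetical per-modality predictions for a single video.
example_scores = {"visual": 0.8, "semantic": 0.9, "audio": 0.4}
with_audio = predict_memorability(example_scores, audio_gestalt=0.7)
without_audio = predict_memorability(example_scores, audio_gestalt=0.2)
```

In this sketch a high audio gestalt value pulls the audio model's score into the average, while a low value leaves the prediction to the visual and semantic models alone. A soft weighting by the gestalt value would be an equally plausible design.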

Taxonomy

Topics: Visual Attention and Saliency Detection · Image and Video Quality Assessment · Image Enhancement Techniques