Leveraging Audio Gestalt to Predict Media Memorability
Lorin Sweeney, Graham Healy, Alan F. Smeaton

TL;DR
This paper introduces a multimodal deep learning approach that leverages audio gestalt to predict video memorability, combining visual, semantic, and auditory features to improve prediction accuracy.
Contribution
It proposes a novel use of audio gestalt in a late fusion model for media memorability prediction, integrating multiple modalities for enhanced performance.
Findings
Audio gestalt is used to estimate how much the audio modality contributes to a video's overall memorability.
Multimodal fusion improves prediction accuracy.
The approach outperforms single-modality models.
Abstract
Memorability determines what evanesces into emptiness, and what worms its way into the deepest furrows of our minds. It is the key to curating more meaningful media content as we wade through daily digital torrents. The Predicting Media Memorability task in MediaEval 2020 aims to address the question of media memorability by setting the task of automatically predicting video memorability. Our approach is a multimodal deep learning-based late fusion that combines visual, semantic, and auditory features. We used audio gestalt to estimate the influence of the audio modality on overall video memorability, and accordingly inform which combination of features would best predict a given video's memorability scores.
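The gating idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the modality keys, the 0.5 threshold, and the equal default weights are all hypothetical choices made for illustration. The sketch assumes each modality already has its own predicted memorability score, and that an audio gestalt score in [0, 1] decides whether the audio prediction enters the late fusion.

```python
from typing import Dict, Optional


def late_fusion(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-modality memorability predictions."""
    total = sum(weights[m] for m in scores)
    if total <= 0:
        raise ValueError("weights of the fused modalities must sum to a positive value")
    return sum(scores[m] * weights[m] for m in scores) / total


def predict_memorability(
    scores: Dict[str, float],
    audio_gestalt: float,
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Fuse modality scores, including audio only when audio gestalt is high."""
    if weights is None:
        # Assumption: equal weights when none are supplied.
        weights = {m: 1.0 for m in scores}
    if audio_gestalt >= threshold:
        used = scores
    else:
        # Low audio gestalt: drop the audio prediction before fusing.
        used = {m: s for m, s in scores.items() if m != "audio"}
    return late_fusion(used, weights)


# Hypothetical per-modality predictions for one video.
example = {"visual": 0.8, "semantic": 0.6, "audio": 0.4}
with_audio = predict_memorability(example, audio_gestalt=0.9)
without_audio = predict_memorability(example, audio_gestalt=0.1)
```

In this sketch the fusion is "late" because each modality is scored independently and only the final predictions are combined. A learned weighting or a different gating rule could replace the fixed threshold without changing the overall structure.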
Taxonomy
Topics: Visual Attention and Saliency Detection · Image and Video Quality Assessment · Image Enhancement Techniques
