The Influence of Audio on Video Memorability with an Audio Gestalt Regulated Video Memorability System
Lorin Sweeney, Graham Healy, Alan F. Smeaton

TL;DR
This paper investigates how audio influences video recognition memorability and introduces a multimodal deep learning system that uses audio gestalt features to improve memorability prediction, achieving top-2 results on the Memento10k dataset.
Contribution
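The paper presents a novel multimodal, late-fusion deep learning system that uses audio gestalt to estimate how much a video's audio influences its memorability, and selectively leverages audio features when predicting video memorability.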
Findings
Evidence suggests that audio can facilitate video recognition memorability, particularly for videos rich in high-level (gestalt) audio features.
The proposed system achieves top-2 results on the Memento10k dataset.
Audio gestalt features effectively capture high-level audio cues that impact memorability.
Abstract
Memories are the tethering threads that tie us to the world, and memorability is the measure of their tensile strength. The threads of memory are spun from fibres of many modalities, obscuring the contribution of a single fibre to a thread's overall tensile strength. Unfurling these fibres is the key to understanding the nature of their interaction, and how we can ultimately create more meaningful media content. In this paper, we examine the influence of audio on video recognition memorability, finding evidence to suggest that it can facilitate overall video recognition memorability rich in high-level (gestalt) audio features. We introduce a novel multimodal deep learning-based late-fusion system that uses audio gestalt to estimate the influence of a given video's audio on its overall short-term recognition memorability, and selectively leverages audio features to make a prediction…
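The selective late-fusion idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `ModalityScores` and `fuse`, the threshold of 0.5, and the audio weight of 0.3 are all hypothetical, chosen only to show how a gestalt score might gate whether an audio-based prediction contributes to the final memorability score.

```python
from dataclasses import dataclass


@dataclass
class ModalityScores:
    """Per-video scores, all assumed to lie in [0, 1] (hypothetical layout)."""
    visual: float   # memorability predicted from visual features
    audio: float    # memorability predicted from audio features
    gestalt: float  # audio gestalt score: how rich the audio is in high-level cues


def fuse(scores: ModalityScores,
         gestalt_threshold: float = 0.5,
         audio_weight: float = 0.3) -> float:
    """Late fusion gated by audio gestalt.

    If the gestalt score is below the threshold, the audio prediction is
    ignored and the visual prediction is returned unchanged. Otherwise the
    two predictions are combined as a weighted average. Both the threshold
    and the weight are illustrative assumptions, not values from the paper.
    """
    if scores.gestalt < gestalt_threshold:
        return scores.visual
    return (1.0 - audio_weight) * scores.visual + audio_weight * scores.audio


if __name__ == "__main__":
    low_gestalt = ModalityScores(visual=0.8, audio=0.4, gestalt=0.2)
    high_gestalt = ModalityScores(visual=0.8, audio=0.4, gestalt=0.9)
    print(fuse(low_gestalt))   # audio ignored
    print(fuse(high_gestalt))  # audio blended in
```

Because the gate acts on the predictions rather than on the raw features, each modality can be modelled independently, which is the usual motivation for late fusion.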
