A dataset for Audio-Visual Sound Event Detection in Movies
Rajat Hebbar, Digbalay Bose, Krishna Somandepalli, Veena Vijai, Shrikanth Narayanan

TL;DR
This paper introduces SAM-S, a large-scale dataset of over 110,000 automatically mined audio events from movies, categorized into 245 sounds, to advance audio-visual sound event detection research.
Contribution
The work presents a novel, automatically annotated dataset from movies using subtitles, along with a taxonomy of sounds and baseline performance benchmarks.
Findings
Baseline audio-only classification achieves 34.76% mAP.
Incorporating visual data improves performance by approximately 5%.
The dataset enables research on audio-visual sound event detection in complex, real-world scenarios.
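For readers unfamiliar with the metric behind the baseline figure, mAP (mean average precision) is commonly computed by ranking clips by score for each sound class, averaging the precision at every true positive, and then averaging across classes. The sketch below illustrates that common definition only; the function names are hypothetical, and the summary does not state which mAP variant or tooling the authors used.

```python
def average_precision(labels, scores):
    """AP for one class: mean of the precision measured at each positive's rank."""
    # Rank clips from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    # A class with no positives has no defined AP; return 0.0 by convention here.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(label_matrix, score_matrix):
    """Mean of per-class AP over classes that have at least one positive clip.

    label_matrix[i][c] is 1 if clip i contains class c, else 0.
    score_matrix[i][c] is the model's score for class c on clip i.
    """
    n_classes = len(label_matrix[0])
    aps = []
    for c in range(n_classes):
        labels = [row[c] for row in label_matrix]
        scores = [row[c] for row in score_matrix]
        if any(labels):  # skip classes absent from the evaluation set (an assumption)
            aps.append(average_precision(labels, scores))
    return sum(aps) / len(aps) if aps else 0.0
```

Skipping classes without positives is one possible convention; other evaluations handle them differently, so numbers are only comparable when the same convention is used.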
Abstract
Audio event detection is a widely studied audio processing task, with applications ranging from self-driving cars to healthcare. In-the-wild datasets such as Audioset have propelled research in this field. However, many efforts typically involve manual annotation and verification, which is expensive to perform at scale. Movies depict various real-life and fictional scenarios, which makes them a rich resource for mining a wide range of audio events. In this work, we present a dataset of audio events called Subtitle-Aligned Movie Sounds (SAM-S). We use publicly-available closed-caption transcripts to automatically mine over 110K audio events from 430 movies. We identify three dimensions to categorize audio events: sound, source, and quality, and present the steps involved to produce a final taxonomy of 245 sounds. We discuss the choices involved in generating the taxonomy, and also highlight…
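As a rough illustration of what mining audio events from closed captions can look like, the sketch below pulls bracketed or parenthesized sound descriptions, with their timestamps, out of SRT-formatted text. It is not the authors' pipeline: the SRT input format, the bracket convention, and the names `SoundEvent` and `mine_sound_events` are all assumptions made for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class SoundEvent:
    start: float       # cue start time in seconds
    end: float         # cue end time in seconds
    description: str   # lowercased caption text found inside brackets


# SRT timing line, e.g. "00:01:10,250 --> 00:01:12,000"
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)
# Assumption: sound descriptions appear as [LIKE THIS] or (like this).
BRACKETED = re.compile(r"[\[(]([^\])]+)[\])]")


def _seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def mine_sound_events(srt_text):
    """Return one SoundEvent per bracketed description in each subtitle cue."""
    events = []
    # Cues in an SRT file are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        timing = None
        text_lines = []
        for line in block.splitlines():
            match = TIMING.search(line)
            if match and timing is None:
                groups = match.groups()
                timing = (_seconds(*groups[:4]), _seconds(*groups[4:]))
            elif timing is not None:
                text_lines.append(line)  # caption text follows the timing line
        if timing is None:
            continue  # malformed cue without a timing line
        for description in BRACKETED.findall(" ".join(text_lines)):
            events.append(SoundEvent(timing[0], timing[1], description.strip().lower()))
    return events
```

Each mined description would still need to be mapped onto a sound taxonomy; that step is specific to the paper and is not attempted here.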
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Diverse Musicological Studies
