Audiovisual Moments in Time: A Large-Scale Annotated Dataset of Audiovisual Actions
Michael Joannou, Pia Rotshtein, Uta Noppeney

TL;DR
This paper introduces AVMIT, a large-scale annotated audiovisual dataset with feature embeddings, demonstrating improved audiovisual event recognition performance using RNNs trained on audiovisual-specific data.
Contribution
The paper presents AVMIT, a new annotated dataset with feature embeddings, and shows that training RNNs on audiovisual-specific data enhances event recognition accuracy.
Findings
Training RNNs on AVMIT data improves accuracy by up to 5.94%.
Audiovisual-specific training outperforms modality-agnostic approaches.
AVMIT provides a valuable resource for audiovisual research.
Abstract
We present Audiovisual Moments in Time (AVMIT), a large-scale dataset of audiovisual action events. In an extensive annotation task, 11 participants labelled a subset of 3-second audiovisual videos from the Moments in Time dataset (MIT). For each trial, participants assessed whether the labelled audiovisual action event was present and whether it was the most prominent feature of the video. The dataset includes the annotation of 57,177 audiovisual videos, each independently evaluated by 3 of 11 trained participants. From this initial collection, we created a curated test set of 16 distinct action classes, with 60 videos each (960 videos). We also offer 2 sets of pre-computed audiovisual feature embeddings, using VGGish/YamNet for audio data and VGG16/EfficientNetB0 for visual data, thereby lowering the barrier to entry for audiovisual DNN research. We explored the advantages of AVMIT…
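The annotation scheme in the abstract (three independent raters per video, each answering two yes/no questions) can be illustrated with a minimal Python sketch. The names Rating and majority_accept are hypothetical, and the acceptance rule used here (at least 2 of 3 raters answering yes to both questions) is an assumption made for illustration, not a criterion stated in the abstract. The last lines only check the test-set arithmetic given above.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Set


class Rating(NamedTuple):
    """One rater's answers for one video (hypothetical record layout)."""
    video_id: str
    event_present: bool    # was the labelled audiovisual event present?
    most_prominent: bool   # was it the most prominent feature of the video?


def majority_accept(ratings: Iterable[Rating], min_votes: int = 2) -> Set[str]:
    """Return ids of videos where at least `min_votes` raters said yes to both questions.

    ASSUMPTION: the 2-of-3 threshold and the "both answers yes" rule are
    illustrative choices, not the filtering rule reported by the authors.
    """
    yes_votes = Counter()
    for r in ratings:
        if r.event_present and r.most_prominent:
            yes_votes[r.video_id] += 1
    return {vid for vid, n in yes_votes.items() if n >= min_votes}


# Test-set size as stated in the abstract: 16 classes with 60 videos each.
NUM_CLASSES = 16
VIDEOS_PER_CLASS = 60
TEST_SET_SIZE = NUM_CLASSES * VIDEOS_PER_CLASS  # 960 videos
```

With three ratings per video, a call such as majority_accept(ratings) keeps only the videos on which a majority of raters agreed; raising min_votes to 3 would require unanimous agreement instead.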
Taxonomy
Topics: Music and Audio Processing · Video Analysis and Summarization · Cinema and Media Studies
