Audio-visual scene classification via contrastive event-object alignment and semantic-based fusion
Yuanbo Hou, Bo Kang, Dick Botteldooren

TL;DR
This paper introduces a multibranch audio-visual scene classification model that leverages contrastive event-object alignment and semantic fusion to improve scene understanding by exploiting fine-grained cross-modal relationships.
Contribution
It proposes a novel contrastive event-object alignment and semantic-based fusion approach for more detailed and accurate audio-visual scene classification.
Findings
Outperforms single-modality models in audio-visual scene classification
Aligns audio events with visual objects at a fine-grained level
Achieves competitive results without extra datasets or data augmentation
Abstract
Previous works on scene classification are mainly based on audio or visual signals, while humans perceive the environmental scenes through multiple senses. Recent studies on audio-visual scene classification separately fine-tune the large-scale audio and image pre-trained models on the target dataset, then either fuse the intermediate representations of the audio model and the visual model, or fuse the coarse-grained decision of both models at the clip level. Such methods ignore the detailed audio events and visual objects in audio-visual scenes (AVS), while humans often identify different scenes through audio events and visual objects within and the congruence between them. To exploit the fine-grained information of audio events and visual objects in AVS, and coordinate the implicit relationship between audio events and visual objects, this paper proposes a multibranch model equipped…
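The abstract is cut off before it describes the alignment objective, so the following is only an illustrative sketch of what a contrastive alignment between audio-event and visual-object embeddings can look like. It uses a symmetric InfoNCE-style loss, a common choice for cross-modal alignment and not necessarily the loss used in the paper. The function names, the temperature value, and the embedding shapes are all assumptions made for the example.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_alignment_loss(audio_emb, visual_emb, temperature=0.1):
    """Symmetric InfoNCE-style loss (assumed here, not taken from the paper).

    Row i of audio_emb and row i of visual_emb are assumed to come from the
    same clip (a positive pair); every other row in the batch is a negative.
    Both inputs have shape (batch, dim).
    """
    a = l2_normalize(audio_emb)
    v = l2_normalize(visual_emb)
    logits = a @ v.T / temperature              # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])
    audio_to_visual = -log_softmax(logits, axis=1)[idx, idx].mean()
    visual_to_audio = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (audio_to_visual + visual_to_audio)


# Toy check with hypothetical embeddings: matched pairs should score a lower
# loss than deliberately mismatched pairs.
rng = np.random.default_rng(0)
audio = rng.normal(size=(4, 8))
matched = contrastive_alignment_loss(audio, audio)
mismatched = contrastive_alignment_loss(audio, audio[::-1])
print(matched < mismatched)
```

In a real system the two inputs would come from the audio-event and visual-object branches of the model rather than from random vectors, and the loss would be minimized jointly with the scene classification objective.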
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Diverse Musicological Studies
