Attentional Graph Convolutional Network for Structure-aware Audio-Visual Scene Classification

Liguang Zhou; Yuhongze Zhou; Xiaonan Qi; Junjie Hu; Tin Lun Lam; Yangsheng Xu

arXiv:2301.00145 · cs.CV · January 3, 2023 · 1 citation

Open Access

TL;DR

This paper introduces an end-to-end attentional graph convolutional network (AGCN) that captures structure-aware features for audio-visual scene classification, emphasizing salient and semantic regions in both modalities.

Contribution

The paper proposes a novel AGCN framework that constructs and utilizes multiple graphs to represent salient and contextual information in audio-visual data for improved scene recognition.
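To make the idea of "constructing graphs over salient regions" concrete, here is a minimal plain-Python sketch of one graph-convolution step over region features. It is an illustration under stated assumptions, not the authors' implementation: the similarity-threshold graph construction in `build_adjacency` (and its `threshold` parameter) is hypothetical, and `gcn_layer` uses the widely used symmetric-normalized GCN propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W) rather than any rule confirmed by the paper.

```python
import math
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: rows of `a` times columns of `b`."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; the small epsilon avoids division by zero."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / (nu * nv + 1e-12)

def build_adjacency(nodes: Matrix, threshold: float = 0.5) -> Matrix:
    """Hypothetical graph construction: connect two region nodes when the
    cosine similarity of their features exceeds `threshold` (no self-loops)."""
    n = len(nodes)
    return [[1.0 if i != j and cosine(nodes[i], nodes[j]) > threshold else 0.0
             for j in range(n)] for i in range(n)]

def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One standard GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, h), w)
    return [[max(0.0, x) for x in row] for row in out]

# Toy example: three 2-D region features; the first two are similar.
nodes = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
adj = build_adjacency(nodes)
out = gcn_layer(adj, nodes, [[1.0, 0.0], [0.0, 1.0]])
print(adj)
print(out)
```

In the toy example the two similar regions become neighbors and their features are averaged by the normalized propagation, while the dissimilar third region keeps its own features. In a trained model the weight matrix `w` would be learned rather than fixed to the identity.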

Findings

01. Achieved promising results on multiple scene recognition datasets.

02. Effectively visualized graphs to highlight salient regions.

03. Demonstrated the importance of structure-aware features in audio-visual understanding.

Abstract

Audio-visual scene understanding is a challenging problem due to the unstructured spatial-temporal relations in audio signals, as well as the spatial layouts of different objects and the varied texture patterns in visual images. Recently, many studies have focused on abstracting features with convolutional neural networks, while the explicit learning of semantically relevant frames of sound signals and visual images has been overlooked. To this end, we present an end-to-end framework, namely the attentional graph convolutional network (AGCN), for structure-aware audio-visual scene representation. First, the sound spectrogram and the input image are processed by a backbone network for feature extraction. Then, to build multi-scale hierarchical information from the input features, we use an attention fusion mechanism to aggregate features from multiple layers of the backbone network. Notably,…
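The abstract does not spell out how the attention fusion weights the backbone layers, so the following is only a generic sketch of attention-based fusion under stated assumptions. It assumes each backbone stage has already been pooled and projected to a common dimension, and it scores each layer by a scaled dot product with a hypothetical learnable `query` vector, then takes a softmax-weighted sum. The names `attention_fuse` and `query` are illustrative and do not come from the paper.

```python
import math
from typing import List, Tuple

Vector = List[float]

def softmax(scores: List[float]) -> List[float]:
    """Softmax over a non-empty list; subtracting the max keeps exp() stable."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(layer_feats: List[Vector], query: Vector) -> Tuple[Vector, List[float]]:
    """Fuse per-layer features into a single vector.

    layer_feats: one pooled feature vector per backbone stage, assumed to be
                 projected to the same dimension as `query` beforehand.
    query: hypothetical learnable vector used to score each layer.
    Returns the fused vector and the attention weights over layers.
    """
    dim = len(query)
    scale = math.sqrt(dim)  # scaled dot-product scoring
    scores = [sum(f * q for f, q in zip(feat, query)) / scale for feat in layer_feats]
    weights = softmax(scores)
    # Weighted sum of layer features, dimension by dimension.
    fused = [sum(w * feat[d] for w, feat in zip(weights, layer_feats)) for d in range(dim)]
    return fused, weights

# Toy example: two layers, with a query that favors the first layer's feature.
fused, weights = attention_fuse([[1.0, 0.0], [0.0, 1.0]], [10.0, 0.0])
print(weights, fused)
```

Because the weights come from a softmax, they are non-negative and sum to one, so the fused vector is a convex combination of the per-layer features; in a trained network the query (or a small scoring network in its place) would be learned jointly with the backbone.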

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing

Methods: Adaptive Graph Convolutional Neural Networks