MPN: Multimodal Parallel Network for Audio-Visual Event Localization
Jiashuo Yu, Ying Cheng, Rui Feng

TL;DR
This paper introduces MPN, a multimodal network that localizes audio-visual events in videos by combining global semantic context with fine-grained segment-level information, and reports state-of-the-art results on the AVE dataset.
Contribution
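The paper proposes a Multimodal Parallel Network (MPN) with two parallel subnetworks: a classification subnetwork built on a Multimodal Co-attention Module (MCM) that captures global context, and a localization subnetwork built on a Multimodal Bottleneck Attention Module (MBAM) that extracts fine-grained segment-level content.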
Findings
Achieves state-of-the-art performance on the AVE dataset.
Effective in both fully supervised and weakly supervised settings.
Reports improved event classification and boundary localization over prior methods.
Abstract
Audio-visual event localization aims to localize an event that is both audible and visible in the wild, a widespread audio-visual scene analysis task for unconstrained videos. To address this task, we propose a Multimodal Parallel Network (MPN), which perceives global semantics and unmixed local information in parallel. Specifically, our MPN framework consists of a classification subnetwork that predicts event categories and a localization subnetwork that predicts event boundaries. The classification subnetwork is built on the Multimodal Co-attention Module (MCM) and obtains global contexts. The localization subnetwork consists of the Multimodal Bottleneck Attention Module (MBAM), which is designed to extract fine-grained segment-level contents. Extensive experiments demonstrate that our framework achieves state-of-the-art performance in both fully supervised and weakly…
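To make the two-branch idea concrete, here is a minimal sketch of how the outputs of a classification branch and a localization branch might be combined into segment-level predictions. It is not the authors' implementation: the fusion rule, the 0.5 threshold, the `background` label, the class names, and the function names `fuse_predictions` and `event_boundaries` are all assumptions made for illustration, and the MCM and MBAM modules themselves are not modeled.

```python
from typing import List, Sequence, Tuple

BACKGROUND = "background"  # hypothetical label for segments without an event


def fuse_predictions(
    class_probs: Sequence[float],
    class_names: Sequence[str],
    segment_scores: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[str]:
    """Label each segment with the video-level event or with background."""
    if len(class_probs) != len(class_names):
        raise ValueError("class_probs and class_names must have equal length")
    # Classification branch output: one event category for the whole video.
    best = max(range(len(class_probs)), key=lambda i: class_probs[i])
    event = class_names[best]
    # Localization branch output: one event-relevance score per segment.
    return [event if s >= threshold else BACKGROUND for s in segment_scores]


def event_boundaries(labels: Sequence[str]) -> List[Tuple[int, int, str]]:
    """Merge consecutive event segments into (start, end, label) spans.

    The end index is exclusive.
    """
    spans: List[Tuple[int, int, str]] = []
    start = None
    for i, label in enumerate(labels):
        if label != BACKGROUND and start is None:
            start = i  # an event span opens here
        elif label == BACKGROUND and start is not None:
            spans.append((start, i, labels[start]))  # the span closes here
            start = None
    if start is not None:
        spans.append((start, len(labels), labels[start]))
    return spans


# Usage with made-up numbers: three candidate classes, five segments.
labels = fuse_predictions(
    class_probs=[0.1, 0.7, 0.2],
    class_names=["dog bark", "church bell", "violin"],
    segment_scores=[0.2, 0.9, 0.8, 0.1, 0.6],
)
print(labels)
print(event_boundaries(labels))
```

Keeping the category decision at the video level and the event/background decision at the segment level mirrors the split described in the abstract, where one subnetwork handles global semantics and the other handles local, segment-level evidence.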
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Video Analysis and Summarization
Methods: Multimodal Parallel Network (MPN)
