Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization
Ling Xing, Hongyu Qu, Rui Yan, Xiangbo Shu, Jinhui Tang

TL;DR
This paper introduces LoCo, a novel framework for dense audio-visual event localization that leverages local temporal coherence to improve cross-modal alignment and event boundary detection.
Contribution
LoCo employs local correspondence feature modulation and adaptive cross-modal interaction to enhance shared semantics and focus attention on relevant event boundaries.
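This summary does not spell out how LoCo's modules are built. The following is therefore only a hypothetical sketch of the general idea behind locality-aware interaction: each visual snippet attends only to audio snippets within a small temporal window, instead of to the whole video. It assumes T temporally aligned audio and visual snippet features, single-head scaled dot-product attention and no learned projections. `local_cross_attention` and `window` are illustrative names, not the authors' code.

```python
import numpy as np

def local_cross_attention(visual, audio, window=2):
    """Hypothetical locality-restricted cross-modal attention (not LoCo's code).

    visual: (T, D) visual snippet features, used as queries
    audio:  (T, D) audio snippet features, used as keys and values
    window: each visual snippet t may only attend to audio snippets in [t - window, t + window]
    Returns the (T, D) attended audio context and the (T, T) attention weights.
    """
    T, D = visual.shape
    scores = visual @ audio.T / np.sqrt(D)                 # scaled dot-product scores, (T, T)
    idx = np.arange(T)
    outside = np.abs(idx[:, None] - idx[None, :]) > window  # True for temporally distant pairs
    scores = np.where(outside, -np.inf, scores)            # distant pairs get zero weight after softmax
    scores = scores - scores.max(axis=1, keepdims=True)    # stabilise softmax; the diagonal is always finite
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ audio, weights

rng = np.random.default_rng(0)
v = rng.standard_normal((8, 16))
a = rng.standard_normal((8, 16))
context, attn = local_cross_attention(v, a, window=1)
```

Setting `window` to T - 1 or larger recovers ordinary dense cross-modal attention. Smaller windows stop a snippet from drawing on temporally distant, possibly unrelated audio. That is the failure mode of dense attention that the abstract describes.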
Findings
Outperforms existing DAVE methods in localization accuracy
Effectively filters irrelevant cross-modal signals
Improves focus on local event boundaries
Abstract
Dense-localization Audio-Visual Events (DAVE) aims to identify the time boundaries and categories of events that are both audible and visible in a long video, where events may co-occur and vary in duration. However, complex audio-visual scenes often involve asynchrony between modalities, which makes accurate localization challenging. Existing DAVE solutions extract audio and visual features with unimodal encoders and fuse them via dense cross-modal interaction. Yet independent unimodal encoding struggles to emphasize the semantics shared between modalities without cross-modal guidance, while dense cross-modal attention may over-attend to semantically unrelated audio-visual features. To address these problems, we present LoCo, a Locality-aware cross-modal Correspondence learning framework for DAVE. LoCo leverages the local temporal continuity of audio-visual…
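Because DAVE predicts time boundaries and categories for events that may overlap, a predicted event is usually scored by its temporal IoU (tIoU) with a ground-truth event of the same category. Benchmarks for this task commonly report mean average precision at several tIoU thresholds. Below is a minimal sketch of the tIoU part only. It assumes events given as `(category, start, end)` in seconds and uses a simplified greedy one-to-one matching with no confidence ranking or AP computation. `temporal_iou` and `count_hits` are hypothetical helpers, not an official evaluation script.

```python
from typing import List, Tuple

Segment = Tuple[float, float]      # (start, end) in seconds
Event = Tuple[str, float, float]   # (category, start, end)

def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def count_hits(preds: List[Event], gts: List[Event], threshold: float = 0.5) -> int:
    """Simplified greedy matching (hypothetical helper).

    A prediction counts as a hit if it overlaps an unmatched ground-truth event
    of the same category with tIoU >= threshold. Each ground-truth event can be
    matched at most once, so co-occurring events of different categories are
    handled independently.
    """
    matched = set()
    hits = 0
    for label, start, end in preds:
        best_j, best_iou = None, threshold
        for j, (g_label, g_start, g_end) in enumerate(gts):
            if j in matched or g_label != label:
                continue
            iou = temporal_iou((start, end), (g_start, g_end))
            if iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j is not None:
            matched.add(best_j)
            hits += 1
    return hits

gts = [("dog barking", 2.0, 6.0), ("man speaking", 4.0, 10.0)]    # overlapping events
preds = [("dog barking", 3.0, 6.0), ("man speaking", 0.0, 4.0)]
hits = count_hits(preds, gts)
```

Here the "dog barking" prediction overlaps its ground truth with tIoU 3/4 = 0.75 and counts as a hit. The "man speaking" prediction ends exactly where the true event starts (tIoU 0), so `hits` is 1.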
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Subtitles and Audiovisual Media
Methods: Softmax · Attention Is All You Need · Focus · Lipschitz Constant Constraint
