Solution for Temporal Sound Localisation Task of ECCV Second Perception Test Challenge 2024

Haowei Gu; Weihao Zhu; Yang Yang

arXiv:2409.19595·cs.SD·October 1, 2024

Open Access

TL;DR

This paper presents an improved method for temporal sound localization that emphasizes sound features over video features and uses multiple models to extract audio features, ranking first in the ECCV 2024 challenge.

Contribution

The paper introduces a novel approach that prioritizes sound features and employs various models for audio feature extraction, leading to superior results in the TSL task.

Findings

01. Sound features are more effective than video features for TSL.

02. Using multiple audio models improves localization accuracy.

03. Achieved first place in the ECCV 2024 TSL challenge.

Abstract

This report proposes an improved method for the Temporal Sound Localisation (TSL) task, which localizes and classifies the sound events occurring in a video according to a predefined set of sound classes. The champion solution from last year's first competition approached TSL by fusing the audio and video modalities with equal weight. Because the TSL task aims to localize sound events, we conducted experiments that demonstrate the superiority of sound features (Section 3). Based on these findings, to enhance the audio modality features, we employ various models to extract audio features, such as InternVideo, CAV-MAE, and VideoMAE. Our approach ranks first in the final test with a score of 0.4925.
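The contrast between equal-weight fusion and an audio-first weighting can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `fuse_features` and `concat_extractors`, the `audio_weight` parameter, and the scale-then-concatenate rule are all assumptions made for illustration, and the abstract does not say how the modalities are actually combined.

```python
from typing import List, Sequence

Vector = List[float]


def concat_extractors(streams: Sequence[Sequence[Sequence[float]]]) -> List[Vector]:
    """Concatenate per-time-step features from several extractors.

    Each stream is a [T][D_i] sequence; the result is [T][sum(D_i)].
    (Hypothetical helper: stands in for stacking features from several models.)
    """
    if not streams:
        return []
    steps = len(streams[0])
    if any(len(s) != steps for s in streams):
        raise ValueError("all streams must cover the same number of time steps")
    merged = []
    for t in range(steps):
        row: Vector = []
        for stream in streams:
            row.extend(stream[t])
        merged.append(row)
    return merged


def fuse_features(
    audio: Sequence[Sequence[float]],
    video: Sequence[Sequence[float]],
    audio_weight: float = 0.5,
) -> List[Vector]:
    """Scale each modality by its weight, then concatenate per time step.

    audio_weight=0.5 corresponds to equal-weight fusion; values above 0.5
    favour audio (an assumed way of expressing an audio-first design).
    """
    if len(audio) != len(video):
        raise ValueError("audio and video must cover the same number of time steps")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    video_weight = 1.0 - audio_weight
    return [
        [audio_weight * x for x in a_t] + [video_weight * x for x in v_t]
        for a_t, v_t in zip(audio, video)
    ]


# Two hypothetical audio extractors over two time steps, one video extractor.
audio = concat_extractors([[[1.0], [2.0]], [[3.0, 4.0], [5.0, 6.0]]])
video = [[8.0], [4.0]]
equal = fuse_features(audio, video, audio_weight=0.5)
audio_first = fuse_features(audio, video, audio_weight=0.75)
```

Here `equal` mirrors the same-weight baseline described in the abstract, while `audio_first` shows one possible way to shift emphasis toward sound features before a downstream localisation head consumes them.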

Taxonomy

Topics: Speech and Audio Processing

Methods: Sparse Evolutionary Training