Description and Discussion on DCASE 2025 Challenge Task 4: Spatial Semantic Segmentation of Sound Scenes

Masahiro Yasuda; Binh Thien Nguyen; Noboru Harada; Romain Serizel; Mayank Mishra; Marc Delcroix; Shoko Araki; Daiki Takeuchi; Daisuke Niizumi; Yasunori Ohishi; Tomohiro Nakatani; Takao Kawamura; Nobutaka Ono

arXiv:2506.10676·cs.SD·June 13, 2025

TL;DR

This paper introduces the DCASE 2025 Challenge Task 4 focused on spatial semantic segmentation of sound scenes, aiming to improve sound event detection and separation using multi-channel spatial audio data.

Contribution

It defines the S5 task for DCASE 2025, presents a new dataset, and reports initial experimental results, advancing research in spatial sound scene analysis.

Findings

01. Initial experimental results demonstrate the feasibility of the S5 approach.

02. The newly curated dataset supports development of spatial sound scene analysis.

03. The task sets a benchmark for future research in spatial semantic segmentation.

Abstract

Spatial Semantic Segmentation of Sound Scenes (S5) aims to enhance technologies for sound event detection and separation from multi-channel input signals that mix multiple sound events with spatial information. This capability is a fundamental basis for immersive communication. The ultimate goal is to separate sound event signals with 6 Degrees of Freedom (6DoF) information into dry sound object signals and metadata describing the object type (sound event class) and spatial information, including direction. However, because several existing challenge tasks already cover some of these sub-functions, this year's task focuses on detecting and separating sound events from multi-channel spatial input signals. This paper outlines the S5 task setting of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2025 Challenge Task 4 and the DCASE2025 Task 4 Dataset, newly…
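To make the input/output shape of the task described in the abstract concrete, here is a minimal sketch in Python. It is not taken from the paper or the baseline repository: the names `SoundObject`, `check_s5_io`, and `toy_example`, the channel count, the clip length, the class labels, and the choice of a single-channel waveform per separated event are all illustrative assumptions. The toy mixture uses plain per-channel gains and does not model real spatial acoustics.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SoundObject:
    """One separated sound event (hypothetical container, not the baseline's API)."""
    label: str            # sound event class (placeholder names below)
    waveform: np.ndarray  # assumed single-channel dry signal, shape (num_samples,)


def check_s5_io(mixture: np.ndarray, objects: list) -> int:
    """Check that outputs are consistent with a (channels, samples) mixture.

    Returns the number of detected-and-separated sound objects.
    """
    if mixture.ndim != 2:
        raise ValueError("mixture must have shape (num_channels, num_samples)")
    num_samples = mixture.shape[1]
    for obj in objects:
        if obj.waveform.shape != (num_samples,):
            raise ValueError(f"waveform for {obj.label!r} has the wrong shape")
    return len(objects)


def toy_example(num_channels: int = 4, num_samples: int = 16000, seed: int = 0):
    """Build a toy multi-channel mixture and its oracle (ground-truth) output."""
    rng = np.random.default_rng(seed)
    # Two placeholder sources; noise stands in for real sound events.
    sources = {
        "event_a": rng.standard_normal(num_samples),
        "event_b": rng.standard_normal(num_samples),
    }
    # Each channel hears each source with a different gain (assumption:
    # no delays or reverberation, unlike a real spatial recording).
    gains = rng.uniform(0.5, 1.0, size=(num_channels, len(sources)))
    mixture = gains @ np.stack(list(sources.values()))
    objects = [SoundObject(label, wav) for label, wav in sources.items()]
    return mixture, objects


if __name__ == "__main__":
    mixture, objects = toy_example()
    print(mixture.shape, [obj.label for obj in objects])
    print("objects:", check_s5_io(mixture, objects))
```

A system for this task would take only `mixture` as input and would have to estimate both the labels and the waveforms; the oracle objects here serve only to show the expected output structure.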

Code & Models

Repositories

nttcslab/dcase2025_task4_baseline
PyTorch · Official

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies