Joint Analysis of Acoustic Scenes and Sound Events Based on Semi-Supervised Training of Sound Events With Partial Labels

Keisuke Imoto

arXiv:2510.25075·cs.SD·October 30, 2025

TL;DR

This paper introduces a semi-supervised, multitask learning framework that leverages acoustic scene context and partial labels to improve sound event detection while reducing annotation costs.

Contribution

It proposes a method for the joint analysis of acoustic scenes and sound events that trains sound event detection on partial labels with semi-supervised learning and adds label refinement via self-distillation.

Findings

01. Improved sound event detection accuracy with reduced annotation effort.

02. Effective use of acoustic scene context to construct partial labels.

03. Enhanced model performance through label refinement techniques.

Abstract

Annotating time boundaries of sound events is labor-intensive, limiting the scalability of strongly supervised learning in audio detection. To reduce annotation costs, weakly supervised learning with only clip-level labels has been widely adopted. As an alternative, partial label learning offers a cost-effective approach, where a set of possible labels is provided instead of exact weak annotations. However, partial label learning for audio analysis remains largely unexplored. Motivated by the observation that acoustic scenes provide contextual information for constructing a set of possible sound events, we utilize acoustic scene information to construct partial labels of sound events. Based on this idea, we propose a multitask learning framework that jointly performs acoustic scene classification and sound event detection with partial labels of sound events. While…
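The idea of deriving partial labels from the acoustic scene can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the event vocabulary, the `SCENE_TO_EVENTS` table, and the helper names `candidate_mask` and `partial_label_bce` are hypothetical, and the loss shown is one simple way to train on a candidate set, not necessarily the objective used in the paper. The semi-supervised and self-distillation components are not sketched here.

```python
import numpy as np

# Hypothetical event vocabulary and scene-to-event table (illustrative only,
# not taken from the paper).
EVENTS = ["car", "bird", "dishes", "speech"]
SCENE_TO_EVENTS = {
    "street": {"car", "bird", "speech"},
    "kitchen": {"dishes", "speech"},
}


def candidate_mask(scene):
    """Return a 0/1 vector marking events that may occur in the given scene."""
    allowed = SCENE_TO_EVENTS[scene]
    return np.array([1.0 if e in allowed else 0.0 for e in EVENTS])


def partial_label_bce(probs, mask, eps=1e-7):
    """Binary cross-entropy applied only to events ruled out by the scene.

    Events outside the candidate set are treated as negatives (target 0).
    Events inside it are left unsupervised in this sketch, because a partial
    label does not say which candidates are actually present.
    """
    probs = np.clip(probs, eps, 1.0 - eps)
    ruled_out = 1.0 - mask
    loss = -(ruled_out * np.log(1.0 - probs)).sum()
    return loss / max(ruled_out.sum(), 1.0)


# Clip-level event probabilities from some model (made-up numbers).
mask = candidate_mask("kitchen")
probs = np.array([0.9, 0.1, 0.8, 0.5])
loss = partial_label_bce(probs, mask)
```

In this sketch the "kitchen" scene rules out "car" and "bird", so only those two predictions are penalized; the high "car" probability dominates the loss, while "dishes" and "speech" contribute nothing because the partial label leaves them undetermined.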
