Acoustic Scene Clustering Using Joint Optimization of Deep Embedding Learning and Clustering Iteration
Yanxiong Li, Mingle Liu, Wucheng Wang, Yuhan Zhang, Qianhua He

TL;DR
This paper proposes an acoustic scene clustering method that jointly optimizes deep feature learning and clustering iteration, and reports better clustering results than existing unsupervised approaches.
Contribution
It proposes a unified framework that combines deep embedding extraction from a convolutional neural network (CNN) with agglomerative hierarchical clustering (AHC), and optimizes both procedures through a single unified loss function.
Findings
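The proposed method outperforms other unsupervised methods in clustering accuracy
The learned deep embedding outperforms the other state-of-the-art features it is compared against
Optimizing feature learning and clustering under a unified loss improves clustering performance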
Abstract
Considerable effort has recently been devoted to acoustic scene classification in the audio signal processing community. In contrast, few studies have addressed acoustic scene clustering, which is a newly emerging problem. Acoustic scene clustering aims to merge audio recordings of the same class of acoustic scene into a single cluster without using prior information or training classifiers. In this study, we propose a method for acoustic scene clustering that jointly optimizes the procedures of feature learning and clustering iteration. In the proposed method, the learned feature is a deep embedding extracted from a deep convolutional neural network (CNN), and the clustering algorithm is agglomerative hierarchical clustering (AHC). We formulate a unified loss function for integrating and optimizing these two procedures. Various features and methods are compared. The…
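The clustering half of the method described in the abstract can be illustrated with a minimal sketch of agglomerative hierarchical clustering over fixed embedding vectors. This is not the authors' implementation: it omits the CNN and the unified loss entirely, the two-dimensional embeddings are made-up toy values, and average linkage with Euclidean distance is an assumption, since the linkage criterion is not stated in the text above. The function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, points):
    """Mean pairwise distance between two clusters (lists of point indices)."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative_cluster(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Assumes 1 <= n_clusters <= len(points). Average linkage is an
    illustrative choice, not taken from the paper.
    """
    # Start with every recording (embedding) in its own cluster.
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        # Exhaustively search for the closest pair of clusters.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge cluster b into cluster a and drop b.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy stand-ins for deep embeddings of five recordings (hypothetical values).
embeddings = [
    [0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
    [5.0, 5.1], [5.1, 5.0],
]
clusters = agglomerative_cluster(embeddings, n_clusters=2)
print(sorted(sorted(c) for c in clusters))  # [[0, 1, 2], [3, 4]]
```

In the paper's setting the embeddings would come from the CNN and would be updated as clustering proceeds. Here they stay fixed, so the sketch only shows the merge step that the joint optimization builds on.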
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Diverse Musicological Studies
