Audio segmentation based on melodic style with hand-crafted features and with convolutional neural networks
Amruta Vidwans, Nachiket Deo, Preeti Rao

TL;DR
This paper explores automatic segmentation of taan sections in Hindustani Khayal vocal concerts using hand-crafted features and CNNs. It reports high accuracy against musician-annotated sections and offers insights into feature learning from audio spectrograms.
Contribution
The paper introduces high-level features that capture the rapid pitch and energy modulations of the singing voice for taan detection, and compares their effectiveness with CNN-based frame-level classification on polyphonic vocal recordings.
Findings
High accuracy in taan segmentation using hand-crafted features.
CNNs can learn discriminative features but currently underperform compared to specialized features.
Insights into feature learning from spectrograms for melodic style detection.
Abstract
We investigate methods for the automatic labeling of the taan section, a prominent structural component of the Hindustani Khayal vocal concert. The taan contains improvised raga-based melody rendered in the highly distinctive style of rapid pitch and energy modulations of the voice. We propose computational features that capture these specific high-level characteristics of the singing voice in the polyphonic context. The extracted local features are used to achieve classification at the frame level via a trained multilayer perceptron (MLP) network, followed by grouping and segmentation based on novelty detection. We report high accuracies with reference to musician annotated taan sections across artists and concerts. We also compare the performance obtained by the compact specialized features with frame-level classification via a convolutional neural network (CNN) operating directly on…
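The abstract's second stage, grouping frame-level decisions into sections via novelty detection, can be illustrated with a minimal sketch. The abstract does not specify the novelty function, so everything below is an assumption made for illustration: the input is a hypothetical sequence of per-frame taan posteriors (such as an MLP might output), novelty is taken as the absolute difference between the mean posterior just after and just before each frame, and the names `novelty_curve`, `pick_boundaries`, `segment`, `half_width`, and `threshold` are invented here rather than taken from the paper.

```python
import numpy as np

def novelty_curve(posteriors, half_width):
    """Novelty at frame t: |mean of the next half_width frames
    minus mean of the previous half_width frames| (assumed definition)."""
    p = np.asarray(posteriors, dtype=float)
    n = len(p)
    nov = np.zeros(n)
    for t in range(half_width, n - half_width + 1):
        left = p[t - half_width:t].mean()
        right = p[t:t + half_width].mean()
        nov[t] = abs(right - left)
    return nov

def pick_boundaries(nov, threshold):
    """Return frames that are local maxima of the novelty curve
    and whose novelty is at least `threshold`."""
    bounds = []
    for t in range(1, len(nov) - 1):
        if nov[t] >= threshold and nov[t] > nov[t - 1] and nov[t] >= nov[t + 1]:
            bounds.append(t)
    return bounds

def segment(posteriors, half_width=5, threshold=0.5):
    """Split the frame sequence at novelty peaks and label each segment
    as taan (True) when its mean posterior exceeds 0.5."""
    p = np.asarray(posteriors, dtype=float)
    nov = novelty_curve(p, half_width)
    edges = [0] + pick_boundaries(nov, threshold) + [len(p)]
    return [(s, e, bool(p[s:e].mean() > 0.5))
            for s, e in zip(edges[:-1], edges[1:])]

# Hypothetical posteriors: non-taan, taan, non-taan (20 frames each).
frames = [0.1] * 20 + [0.9] * 20 + [0.1] * 20
print(segment(frames))
```

Each returned tuple is (start frame, end frame, is-taan). In practice the window width and threshold would need tuning to the frame rate and to how noisy the frame-level classifier is; the values here are arbitrary.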
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
