Music and Vocal Separation Using Multi-Band Modulation Based Features
Sunil Kumar Kopparapu, Meghna Pandharipande, G Sita

TL;DR
This paper explores the use of non-linear modulation features derived from the Teager-Kaiser energy operator for music and vocal separation, demonstrating their discriminative power in specific frequency bands.
Contribution
It introduces a novel approach using non-linear modulation features and energy separation for music and voice discrimination in audio signals.
Findings
Discriminative features are found in the low- and mid-frequency bands (200-1500 Hz).
Non-linear features outperform traditional features in certain frequency ranges.
The method effectively distinguishes music from voice in Indian classical songs.
Abstract
The potential use of non-linear speech features has not been investigated for music analysis, although other commonly used speech features like Mel Frequency Cepstral Coefficients (MFCC) and pitch have been used extensively. In this paper, we assume an audio signal to be a sum of modulated sinusoids and then use the energy separation algorithm to decompose the audio into amplitude and frequency modulation components using the non-linear Teager-Kaiser energy operator. We first identify the distribution of these non-linear features for music-only and voice-only segments of the audio signal in different Mel-spaced frequency bands and show that they have the ability to discriminate. The proposed method, based on the Kullback-Leibler divergence measure, is evaluated using a set of Indian classical songs from three different artists. Experimental results show that the discrimination ability is…
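
To make the pipeline described in the abstract more concrete, the following is a minimal sketch of the discrete Teager-Kaiser energy operator, an energy separation step, and a histogram-based Kullback-Leibler divergence. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which energy separation variant is used, so the well-known DESA-2 form is assumed here; the Mel-spaced band-pass filtering is omitted (the input is treated as an already isolated band); and the function names, the bin count, and the smoothing constant `eps` are hypothetical choices.

```python
import numpy as np

def teager_kaiser(x):
    """Discrete Teager-Kaiser energy: psi[x](n) = x(n)^2 - x(n-1) * x(n+1)."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def desa2(x, fs, eps=1e-12):
    """DESA-2 energy separation (assumed variant).

    Returns the estimated amplitude envelope and instantaneous
    frequency in Hz for a single AM-FM component sampled at fs.
    """
    x = np.asarray(x, dtype=float)
    # Symmetric difference y(n) = x(n+1) - x(n-1), defined for n = 1..N-2.
    y = x[2:] - x[:-2]
    psi_y = teager_kaiser(y)            # valid for n = 2..N-3
    psi_x = teager_kaiser(x)[1:-1]      # trimmed to the same n = 2..N-3
    psi_x = np.maximum(psi_x, eps)      # guard against division by zero
    psi_y = np.maximum(psi_y, eps)
    # Omega(n) ~= 0.5 * arccos(1 - psi[y] / (2 * psi[x])), in rad/sample.
    omega = 0.5 * np.arccos(np.clip(1.0 - psi_y / (2.0 * psi_x), -1.0, 1.0))
    # |a(n)| ~= 2 * psi[x] / sqrt(psi[y]).
    amp = 2.0 * psi_x / np.sqrt(psi_y)
    return amp, omega * fs / (2.0 * np.pi)

def kl_divergence(a, b, bins=32, eps=1e-10):
    """KL divergence D(P || Q) between histograms of two feature sets.

    Both histograms share the same bin edges; eps smooths empty bins
    (an assumption, the abstract does not specify the estimator).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    p, _ = np.histogram(a, bins=bins, range=(lo, hi))
    q, _ = np.histogram(b, bins=bins, range=(lo, hi))
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

# Usage: a pure 440 Hz tone stands in for one band-limited component.
fs = 8000
n = np.arange(1024)
tone = 0.7 * np.cos(2.0 * np.pi * 440.0 * n / fs)
amp, freq = desa2(tone, fs)
```

In a fuller version, each Mel-spaced band would be filtered out first, `desa2` would be run per band on music-only and voice-only segments, and `kl_divergence` would compare the resulting amplitude or frequency distributions band by band; larger divergence would indicate a more discriminative band.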
