Automatic Transcription of Drum Strokes in Carnatic Music
Kausthubh Chandramouli, William Sethares

TL;DR
This paper introduces a neural network-based method for automatically transcribing mridangam strokes in Carnatic music, achieving over 83% accuracy by combining onset detection, spectral features, and tonic invariance techniques.
Contribution
The paper presents a novel automatic transcription algorithm specifically designed for mridangam strokes, incorporating tonic invariance through data augmentation.
Findings
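The classifier consistently achieves over 83% accuracy on a held-out test dataset.
Onset detection segments the audio into individual strokes, and the DFT magnitude spectrum of each segment serves as its feature vector.
Augmenting the dataset with pitch-shifted copies of the audio makes the classifier invariant to the tonic the mridangam is tuned to.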
Abstract
The mridangam is a double-headed percussion instrument that plays a key role in Carnatic music concerts. This paper presents a novel automatic transcription algorithm to classify the strokes played on the mridangam. Onset detection is first performed to segment the audio signal into individual strokes, and feature vectors consisting of the DFT magnitude spectrum of the segmented signal are generated. A multi-layer feedforward neural network is trained using the feature vectors as inputs and the manual transcriptions as targets. Since the mridangam is a tonal instrument tuned to a given tonic, tonic invariance is an important feature of the classifier. Tonic invariance is achieved by augmenting the dataset with pitch-shifted copies of the audio. This algorithm consistently yields over 83% accuracy on a held-out test dataset.
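The front end described in the abstract (onset detection, DFT magnitude features, pitch-shift augmentation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the energy-ratio onset detector, the resampling-based pitch shift, all function names, and every parameter value (frame sizes, `ratio`, `min_gap`, `n_fft`) are assumptions chosen for illustration, since the abstract does not specify them. The neural network classifier itself is omitted.

```python
import numpy as np

def detect_onsets(signal, sr, frame=512, hop=256, ratio=4.0, min_gap=0.05):
    """Return sample indices where short-time energy jumps sharply.

    Assumption: a simple energy-ratio detector stands in for whatever
    onset detector the paper actually uses.
    """
    n_frames = 1 + max(0, (len(signal) - frame) // hop)
    energy = np.array(
        [np.sum(signal[i * hop:i * hop + frame] ** 2) for i in range(n_frames)]
    )
    # Small floor so silence followed by faint noise is not flagged.
    floor = 1e-3 * energy.max() + 1e-12
    onsets, last = [], -np.inf
    for i in range(1, n_frames):
        rising = energy[i] > ratio * (energy[i - 1] + floor)
        far_enough = (i * hop - last) >= min_gap * sr
        if rising and far_enough:
            onsets.append(i * hop)
            last = i * hop
    return onsets

def dft_magnitude_features(signal, onset, n_fft=2048):
    """DFT magnitude spectrum of a fixed-length segment starting at `onset`."""
    seg = signal[onset:onset + n_fft]
    seg = np.pad(seg, (0, n_fft - len(seg)))  # zero-pad short final segments
    return np.abs(np.fft.rfft(seg * np.hanning(n_fft)))

def pitch_shift_by_resampling(signal, semitones):
    """Shift pitch by reading the signal at a different rate.

    Assumption: plain resampling, which also changes the duration.
    The paper's pitch-shifting method is not stated in the abstract.
    """
    factor = 2.0 ** (semitones / 12.0)
    positions = np.arange(0, len(signal) - 1, factor)
    return np.interp(positions, np.arange(len(signal)), signal)
```

In a pipeline along these lines, each recording would be pitch-shifted by several semitone offsets, every copy would be segmented with `detect_onsets`, and the resulting feature vectors would be paired with the manual stroke labels to train the feedforward network.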
Taxonomy
Topics: Music and Audio Processing · Diverse Musicological Studies · Music Technology and Sound Studies
Methods: Test
