Hierarchical Linear Dynamical System for Representing Notes from Recorded Audio
Leila Kalantari, Jose Principe, Kathryn E. Sieving

TL;DR
This paper introduces a hierarchical linear dynamical system (HLDS) approach for segmenting and classifying notes in audio recordings. The approach handles outliers and applies to bioacoustics and musicology.
Contribution
The paper proposes a novel parameter-setting method for HLDS and demonstrates its application to simultaneous segmentation and classification of audio notes in the presence of outliers.
Findings
HLDS effectively classifies bird and musical notes in recordings.
The method handles outliers and unknown notes successfully.
Experimental results show high accuracy in note segmentation and classification.
Abstract
We seek to develop simultaneous segmentation and classification of notes from audio recordings in the presence of outliers. The architecture selected for modeling the time series is a hierarchical linear dynamical system (HLDS). We propose a novel method for setting its parameters. HLDS can potentially be employed in two ways: 1) simultaneous segmentation and clustering for exploring data, i.e., finding unknown notes; 2) simultaneous segmentation and classification of audio recordings for finding the notes of interest in the presence of outliers. We adapted HLDS for the second purpose, since it is the easier task yet remains a challenging problem, e.g., in the field of bioacoustics. Each test clip contains the same notes as the training clip (but different instances) and also contains outlier notes. At test time, it is automatically decided to which class of interest, if any, each note belongs. Two applications of…
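As a rough illustration of the classification-with-outliers idea in the abstract, here is a minimal Python sketch. It is not the paper's HLDS and not its parameter-setting method. It assumes a flat, fully observed linear model x[t+1] ≈ A x[t] per note class, fitted by least squares. It also assumes segment boundaries are already known, so it does no segmentation. A segment is rejected as an outlier when no class model predicts it well. All function names, the synthetic 2-D "notes", and the threshold value are hypothetical.

```python
import numpy as np

def fit_linear_dynamics(frames):
    """Least-squares fit of A in x[t+1] ~ A @ x[t] for one class (assumed model)."""
    X_prev, X_next = frames[:-1], frames[1:]
    # Solve X_prev @ A.T ~ X_next in the least-squares sense.
    A_T, *_ = np.linalg.lstsq(X_prev, X_next, rcond=None)
    return A_T.T

def prediction_error(A, frames):
    """Mean squared one-step prediction error of a segment under model A."""
    residual = frames[1:] - frames[:-1] @ A.T
    return float(np.mean(residual ** 2))

def classify_segment(models, frames, reject_threshold):
    """Return the best-fitting class label, or None if every model fits poorly."""
    errors = {label: prediction_error(A, frames) for label, A in models.items()}
    best = min(errors, key=errors.get)
    return best if errors[best] <= reject_threshold else None

def rotation_sequence(angle, steps, start=(1.0, 0.0)):
    """Synthetic 2-D 'note': a point rotating by a fixed angle per frame."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    x = np.array(start)
    out = [x]
    for _ in range(steps - 1):
        x = R @ x
        out.append(x)
    return np.array(out)

# Train one model per class of interest on a single synthetic clip each.
models = {
    "note_a": fit_linear_dynamics(rotation_sequence(0.2, 50)),
    "note_b": fit_linear_dynamics(rotation_sequence(0.9, 50)),
}

# A new instance of note_a (different starting point) and an unseen note type.
test_a = rotation_sequence(0.2, 30, start=(0.0, 1.0))
outlier = rotation_sequence(2.5, 30)

label_a = classify_segment(models, test_a, reject_threshold=1e-3)
label_outlier = classify_segment(models, outlier, reject_threshold=1e-3)
print(label_a, label_outlier)
```

With real audio, the frames would be feature vectors (for example, spectral features) rather than synthetic rotations, and the rejection threshold would need to be tuned on held-out data.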
Taxonomy
Topics: Music and Audio Processing · Animal Vocal Communication and Behavior · Speech and Audio Processing
