Helicality: An Isomap-based Measure of Octave Equivalence in Audio Data
Sripathi Sridhar, Vincent Lostanlen

TL;DR
This paper introduces 'helicality', an unsupervised Isomap-based measure that quantifies octave equivalence in audio data by fitting a 3-D embedding to a helix, enabling scalable analysis of musical and speech signals.
Contribution
The paper proposes a novel, scalable measure of octave equivalence called 'helicality' that automates the analysis of Isomap embeddings in audio data.
Findings
Isolated musical notes have higher helicality than speech.
Drum hits have lower helicality than both musical notes and speech.
Helicality therefore ranks the three audio types in a consistent order: isolated musical notes first, speech second, drum hits last.
Abstract
Octave equivalence serves as domain knowledge in MIR systems, including chromagram, spiral convolutional networks, and harmonic CQT. Prior work has applied the Isomap manifold learning algorithm to unlabeled audio data to embed frequency sub-bands in 3-D space where the Euclidean distances are inversely proportional to the strength of their Pearson correlations. However, discovering octave equivalence via Isomap requires visual inspection and is not scalable. To address this problem, we define "helicality" as the goodness of fit of the 3-D Isomap embedding to a Shepard-Risset helix. Our method is unsupervised and uses a custom Frank-Wolfe algorithm to minimize a least-squares objective inside a convex hull. Numerical experiments indicate that isolated musical notes have a higher helicality than speech, followed by drum hits.
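The abstract mentions a custom Frank-Wolfe algorithm that minimizes a least-squares objective inside a convex hull, but gives no further detail. As a rough illustration of the general technique only, and not the authors' algorithm, the sketch below uses textbook Frank-Wolfe with exact line search to find the point in the convex hull of a finite vertex set that is closest to a target point. The function name `closest_point_in_hull`, the toy vertex set, and the stopping rule are all assumptions made for this example.

```python
def closest_point_in_hull(vertices, target, n_iters=200):
    """Textbook Frank-Wolfe for min ||x - target||^2 over conv(vertices).

    Illustrative sketch only (hypothetical helper, not from the paper).
    Returns (point, weights); weights are convex-combination coefficients.
    """
    dim = len(target)
    n = len(vertices)
    # Start at the first vertex, which is a feasible point of the hull.
    weights = [0.0] * n
    weights[0] = 1.0
    x = list(vertices[0])
    for _ in range(n_iters):
        # Gradient of the squared distance, up to a constant factor of 2.
        grad = [x[k] - target[k] for k in range(dim)]
        # Linear minimization oracle: over a convex hull, the minimizer of a
        # linear function is always attained at one of the vertices.
        scores = [sum(g * v[k] for k, g in enumerate(grad)) for v in vertices]
        i = min(range(n), key=scores.__getitem__)
        d = [vertices[i][k] - x[k] for k in range(dim)]
        dd = sum(c * c for c in d)
        if dd == 0.0:
            break  # already sitting on the selected vertex
        # Exact line search for a quadratic objective, clipped to [0, 1]
        # so that the iterate stays inside the hull.
        gamma = max(0.0, min(1.0, -sum(g * c for g, c in zip(grad, d)) / dd))
        if gamma == 0.0:
            break  # no descent direction toward any vertex: x is optimal
        x = [x[k] + gamma * d[k] for k in range(dim)]
        weights = [(1.0 - gamma) * w for w in weights]
        weights[i] += gamma
    return x, weights


# Toy usage: project a point lying outside the unit square onto the square.
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
point, weights = closest_point_in_hull(square, (2.0, 0.5))
```

Each iterate is a convex combination of the vertices, so the constraint is satisfied without any projection step, which is the usual reason for choosing Frank-Wolfe over projected gradient descent in this setting.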
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing
