Cue Point Estimation using Object Detection
Giulia Argüello, Luca A. Lanzendörfer, Roger Wattenhofer

TL;DR
This paper treats automatic cue point estimation in music tracks as a computer vision object detection task, fine-tuning a pre-trained object detection transformer. It reports higher precision than previous methods without requiring low-level musical information analysis.
Contribution
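It presents a method that fine-tunes a pre-trained object detection transformer for cue point estimation, together with a new dataset of 21k manually annotated cue points and metronome information for nearly 5k tracks, 35x larger than the previously available cue point dataset.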
Findings
Higher precision in cue point detection compared to prior methods
No need for low-level musical feature analysis
Strong adherence to musical phrasing
Abstract
Cue points indicate possible temporal boundaries in a transition between two pieces of music in DJ mixing and constitute a crucial element in autonomous DJ systems as well as for live mixing. In this work, we present a novel method for automatic cue point estimation, interpreted as a computer vision object detection task. Our proposed system is based on a pre-trained object detection transformer which we fine-tune on our novel cue point dataset. Our provided dataset contains 21k manually annotated cue points from human experts as well as metronome information for nearly 5k individual tracks, making this dataset 35x larger than the previously available cue point dataset. Unlike previous methods, our approach does not require low-level musical information analysis, while demonstrating increased precision in retrieving cue point positions. Moreover, our proposed method demonstrates high…
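To make the "cue points as object detection" framing concrete, here is a minimal sketch of how annotated cue times could be turned into detection targets and back. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not say how audio is rendered as an image, so the sketch assumes an image whose horizontal axis is time, a fixed-length window, and full-height boxes of a fixed width in seconds. The names `Box`, `cue_points_to_boxes`, `box_to_cue_time`, and the parameter `box_width_s` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Normalized (cx, cy, w, h) box, a format commonly used by detection transformers."""
    cx: float
    cy: float
    w: float
    h: float


def cue_points_to_boxes(
    cue_times_s: List[float],
    window_start_s: float,
    window_len_s: float,
    box_width_s: float = 1.0,  # assumed box width; not taken from the paper
) -> List[Box]:
    """Map cue times (seconds) that fall inside one window to full-height boxes."""
    window_end_s = window_start_s + window_len_s
    boxes = []
    for t in cue_times_s:
        if not (window_start_s <= t < window_end_s):
            continue  # cue point lies outside this window
        cx = (t - window_start_s) / window_len_s  # horizontal centre in [0, 1)
        boxes.append(Box(cx=cx, cy=0.5, w=box_width_s / window_len_s, h=1.0))
    return boxes


def box_to_cue_time(box: Box, window_start_s: float, window_len_s: float) -> float:
    """Invert the mapping: a predicted box centre becomes a cue time in seconds."""
    return window_start_s + box.cx * window_len_s


# Example: three annotated cue points, one 60-second window starting at 60 s.
boxes = cue_points_to_boxes([30.0, 75.0, 130.0], window_start_s=60.0, window_len_s=60.0)
recovered = [box_to_cue_time(b, 60.0, 60.0) for b in boxes]
```

Under these assumptions only the horizontal box centre carries information, so a predicted box maps directly back to a time stamp without any beat or harmony analysis of the audio.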
Taxonomy
Topics: Infrared Target Detection Methodologies
