BEAST: Online Joint Beat and Downbeat Tracking Based on Streaming Transformer
Chih-Cheng Chang, Li Su

TL;DR
BEAST is an online streaming Transformer model that jointly tracks beats and downbeats in real time with a maximum latency under 50 ms, outperforming previous online models.
Contribution
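The paper introduces BEAST, an online joint beat and downbeat tracking system built on a streaming Transformer encoder. It uses contextual block processing to handle the online setting and relative positional encoding in the attention layers to capture relative timing.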
Findings
Achieves 80.04% F1 in beat tracking
Attains 46.78% F1 in downbeat tracking
Outperforms state-of-the-art online models by 5 percentage points
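
For readers unfamiliar with how F1 scores like these are usually computed for beat tracking, the following is a minimal sketch, not the paper's evaluation code. It assumes the common convention that an estimated beat counts as a hit when it falls within a tolerance window of an unmatched reference beat. The 70 ms default and the function name `beat_f_measure` are assumptions for illustration; the summary above does not state the tolerance used.

```python
def beat_f_measure(reference, estimated, tolerance=0.07):
    """F-measure between reference and estimated beat times (in seconds).

    Assumption: a hit is an estimate within +/- `tolerance` seconds of a
    reference beat, and each beat is matched at most once. The 0.07 s
    default is a common convention, not a value taken from the paper.
    """
    ref = sorted(reference)
    est = sorted(estimated)
    i = j = hits = 0
    # Walk both sorted lists, pairing the earliest still-unmatched beats.
    while i < len(ref) and j < len(est):
        diff = est[j] - ref[i]
        if abs(diff) <= tolerance:
            hits += 1
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # estimate is too early to match any remaining reference
        else:
            i += 1  # reference beat was missed
    if hits == 0:
        return 0.0
    precision = hits / len(est)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    ref_beats = [0.5, 1.0, 1.5, 2.0]
    est_beats = [0.52, 1.0, 1.7, 2.05]  # third estimate is 200 ms late
    print(beat_f_measure(ref_beats, est_beats))
```

In this toy example three of the four estimates land inside the window, so precision and recall are both 3/4. The same function applies to downbeats by passing downbeat times instead of beat times.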
Abstract
Many deep learning models have achieved dominant performance on the offline beat tracking task. However, online beat tracking, in which only the past and present input features are available, remains challenging. In this paper, we propose BEAt tracking Streaming Transformer (BEAST), an online joint beat and downbeat tracking system based on the streaming Transformer. To deal with online scenarios, BEAST applies contextual block processing in the Transformer encoder. Moreover, we adopt relative positional encoding in the attention layer of the streaming Transformer encoder to capture relative timing position, which is critically important information in music. In beat and downbeat experiments on benchmark datasets for a low-latency scenario with maximum latency under 50 ms, BEAST achieves an F1-measure of 80.04% in beat and 46.78% in downbeat, which is a substantial…
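
The block-wise idea behind streaming encoders can be illustrated with a small sketch. This is a simplified illustration under stated assumptions, not BEAST's implementation: it only shows how a frame sequence might be cut into blocks made of past context, a center region whose outputs are emitted, and a short look-ahead, and how the look-ahead bounds the algorithmic delay. The context-embedding hand-off between blocks that contextual block processing also involves is omitted. The function names, the parameters `past`, `center`, and `look_ahead`, and the 10 ms frame period are all hypothetical.

```python
def iter_blocks(num_frames, past=4, center=2, look_ahead=1):
    """Yield (block_start, block_end, out_start, out_end) frame ranges.

    All ranges are half-open [start, end). The encoder would see frames
    block_start..block_end and emit outputs only for out_start..out_end.
    Hypothetical parameters: `past` context frames, `center` emitted
    frames, `look_ahead` future frames per block.
    """
    for out_start in range(0, num_frames, center):
        out_end = min(out_start + center, num_frames)
        block_start = max(0, out_start - past)
        block_end = min(num_frames, out_end + look_ahead)
        yield block_start, block_end, out_start, out_end


def max_latency_ms(center, look_ahead, frame_ms):
    """Worst-case algorithmic delay, ignoring compute time.

    The first frame of the center region must wait for the remaining
    center frames and for the look-ahead frames to arrive.
    """
    return (center - 1 + look_ahead) * frame_ms


if __name__ == "__main__":
    for block in iter_blocks(5):
        print(block)
    # Assumed 10 ms frame period, chosen only for illustration.
    print(max_latency_ms(center=2, look_ahead=1, frame_ms=10), "ms")
```

Under these assumptions, shrinking the center region or the look-ahead lowers latency at the cost of less future context per block, which is the trade-off a low-latency target such as 50 ms constrains.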
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
