WaveBeat: End-to-end beat and downbeat tracking in the time domain
Christian J. Steinmetz, Joshua D. Reiss

TL;DR
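WaveBeat is an end-to-end model for joint beat and downbeat tracking that operates directly on raw waveforms instead of hand-crafted spectral features. Its temporal convolutional networks (TCNs) reach a very large receptive field in a memory-efficient way, and the method outperforms the previous state of the art on some datasets while remaining comparable on others.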
Contribution
WaveBeat is the first approach to perform joint beat and downbeat tracking directly from raw waveforms, using temporal convolutional networks.
Findings
Outperforms previous state-of-the-art methods on some datasets.
Achieves comparable results on the remaining datasets.
Demonstrates the potential of time domain approaches for beat tracking.
Abstract
Deep learning approaches for beat and downbeat tracking have brought advancements. However, these approaches continue to rely on hand-crafted, subsampled spectral features as input, restricting the information available to the model. In this work, we propose WaveBeat, an end-to-end approach for joint beat and downbeat tracking operating directly on waveforms. This method forgoes engineered spectral features, and instead, produces beat and downbeat predictions directly from the waveform, the first of its kind for this task. Our model utilizes temporal convolutional networks (TCNs) operating on waveforms that achieve a very large receptive field (≥ 30 s) at audio sample rates in a memory-efficient manner by employing rapidly growing dilation factors with fewer layers. With a straightforward data augmentation strategy, our method outperforms previous state-of-the-art methods on some…
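To make the receptive-field claim in the abstract concrete, the sketch below computes how many input samples a stack of dilated 1-D convolutions can see, and compares the common doubling schedule with a faster-growing one. The kernel size (15), layer count (6), growth factor (8), and sample rate (22050 Hz) are illustrative assumptions chosen for the arithmetic, not the configuration reported in the paper, and `receptive_field` is a hypothetical helper name.

```python
def receptive_field(kernel_size, dilations, strides=None):
    """Receptive field, in input samples, of a stack of 1-D convolutions.

    Each layer widens the field by (kernel_size - 1) * dilation, scaled by
    the product of the strides of all earlier layers.
    """
    if strides is None:
        strides = [1] * len(dilations)
    rf = 1    # with no layers, one output sample sees one input sample
    jump = 1  # spacing, in input samples, between a layer's adjacent inputs
    for dilation, stride in zip(dilations, strides):
        rf += (kernel_size - 1) * dilation * jump
        jump *= stride
    return rf


# Illustrative values only (assumptions, not taken from the paper).
KERNEL_SIZE = 15
NUM_LAYERS = 6
SAMPLE_RATE = 22050  # Hz

# Dilation doubles per layer: 1, 2, 4, ...
rf_doubling = receptive_field(KERNEL_SIZE, [2 ** i for i in range(NUM_LAYERS)])
# Dilation grows by a factor of 8 per layer: 1, 8, 64, ...
rf_growth8 = receptive_field(KERNEL_SIZE, [8 ** i for i in range(NUM_LAYERS)])

for name, rf in [("growth 2", rf_doubling), ("growth 8", rf_growth8)]:
    print(f"{name}: {rf} samples = {rf / SAMPLE_RATE:.2f} s")
```

Under these assumed values, the same six layers cover 883 samples with doubling dilations but 524287 samples (tens of seconds at the assumed sample rate) with a growth factor of 8, which is the sense in which rapidly growing dilation factors buy a large receptive field with fewer layers.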
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
