Audio-based Musical Version Identification: Elements and Challenges
Furkan Yesiler, Guillaume Doras, Rachel M. Bittner, Christopher J. Tralie, Joan Serrà

TL;DR
This paper reviews 20 years of research on musical version identification, highlighting the evolution from accuracy-scalability trade-offs to modern deep learning approaches that enable practical industrial applications.
Contribution
It provides a comprehensive overview of key ideas and developments in musical version identification, connecting past methods to current deep learning-based solutions.
Findings
Deep learning approaches take a step toward bridging the accuracy-scalability gap in version identification (VI) systems.
Historical methods laid the foundation for current deep learning techniques.
Recent systems are more suitable for industrial deployment.
Abstract
In this article, we aim to provide a review of the key ideas and approaches proposed in 20 years of scientific literature around musical version identification (VI) research and connect them to current practice. For more than a decade, VI systems suffered from the accuracy-scalability trade-off, with attempts to increase accuracy that typically resulted in cumbersome, non-scalable systems. Recent years, however, have witnessed the rise of deep learning-based approaches that take a step toward bridging the accuracy-scalability gap, yielding systems that can realistically be deployed in industrial applications. Although this trend positively influences the number of researchers and institutions working on VI, it may also result in obscuring the literature before the deep learning era. To appreciate two decades of novel ideas in VI research and to facilitate building better systems, we now…
