ByteCover: Cover Song Identification via Multi-Loss Training
Xingjian Du, Zhesong Yu, Bilei Zhu, Xiaoou Chen, Zejun Ma

TL;DR
ByteCover is a novel deep learning approach for cover song identification that combines IBN blocks for invariant feature learning with multi-loss training to improve discrimination and robustness across musical variations.
Contribution
The paper introduces a ResNet-IBN architecture trained with multiple losses to improve cover song identification performance.
Findings
ByteCover outperforms existing methods by 20.9% on the Da-TACOS dataset.
The IBN blocks enable learning features invariant to musical attribute changes.
Multi-loss training enhances inter-class discrimination and intra-class compactness.
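To make the multi-loss idea concrete, here is a minimal sketch of a joint objective that adds a classification loss to a triplet loss, the two losses the abstract names. It is not the authors' implementation: the function names, the margin of 0.3, and the unweighted sum are all assumptions chosen for illustration, and the embeddings are plain Python lists rather than network outputs.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one example (the classification loss)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin.

    The margin value is an assumption, not taken from the paper.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)

def joint_loss(logits, label, anchor, positive, negative, margin=0.3):
    """Unweighted sum of both losses (the equal weighting is an assumption)."""
    return cross_entropy(logits, label) + triplet_loss(
        anchor, positive, negative, margin
    )

# Hypothetical example: the anchor and positive are two versions of one song,
# the negative is a different song.
example = joint_loss(
    logits=[2.0, 0.5, 0.1], label=0,
    anchor=[0.0, 0.0], positive=[0.1, 0.0], negative=[1.0, 1.0],
)
```

The classification term pushes embeddings of different songs apart (inter-class discrimination), while the triplet term is zero only when a version sits closer to its own song than to any other by at least the margin, which pulls versions of one song together (intra-class compactness).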
Abstract
We present in this paper ByteCover, which is a new feature learning method for cover song identification (CSI). ByteCover is built based on the classical ResNet model, and two major improvements are designed to further enhance the capability of the model for CSI. In the first improvement, we introduce the integration of instance normalization (IN) and batch normalization (BN) to build IBN blocks, which are major components of our ResNet-IBN model. With the help of the IBN blocks, our CSI model can learn features that are invariant to the changes of musical attributes such as key, tempo, timbre and genre, while preserving the version information. In the second improvement, we employ the BNNeck method to allow a multi-loss training and encourage our method to jointly optimize a classification loss and a triplet loss, and by this means, the inter-class discrimination and intra-class…
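The combination of instance normalization and batch normalization described in the abstract can be illustrated with a small sketch. It assumes that the first half of the channels receives IN and the second half receives BN; that split, the function names, and the toy input are assumptions, and the learnable scale and shift parameters and BN running statistics are left out for brevity.

```python
import math

def _normalize(values, eps=1e-5):
    """Shift to zero mean and scale to unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def ibn(batch, eps=1e-5):
    """Apply IN to the first half of the channels and BN to the rest.

    batch[n][c] is the list of feature values for sample n, channel c.
    All samples are assumed to share the same shape.
    """
    n_samples = len(batch)
    n_channels = len(batch[0])
    half = n_channels // 2  # assumed 50/50 split
    out = [[None] * n_channels for _ in range(n_samples)]

    # Instance norm: statistics computed per sample and per channel.
    for n in range(n_samples):
        for c in range(half):
            out[n][c] = _normalize(batch[n][c], eps)

    # Batch norm (training-mode statistics): per channel, over all samples.
    for c in range(half, n_channels):
        width = len(batch[0][c])
        flat = [v for n in range(n_samples) for v in batch[n][c]]
        normed = _normalize(flat, eps)
        for n in range(n_samples):
            out[n][c] = normed[n * width:(n + 1) * width]
    return out

# Hypothetical input: sample 1 is sample 0 scaled by 10 in both channels.
toy = [
    [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]],
    [[10.0, 20.0, 30.0], [10.0, 20.0, 30.0]],
]
result = ibn(toy)
```

In this toy input the instance-normalized channel comes out almost identical for both samples, because per-sample statistics cancel the scale difference, while the batch-normalized channel still separates them. That contrast is one way to read the abstract's claim that IBN blocks discard some per-recording variation while keeping information that distinguishes versions.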
Taxonomy
Topics: Music and Audio Processing · Diverse Musicological Studies · Music Technology and Sound Studies
Methods: Average Pooling · Residual Connection · Convolution · Kaiming Initialization · Global Average Pooling · 1x1 Convolution · Residual Block · Batch Normalization · Bottleneck Residual Block
