Revisiting Content-Based Music Recommendation: Efficient Feature Aggregation from Large-Scale Music Models

Yizhi Zhou; Jia-Qi Yang; De-Chuan Zhan; Da-Wei Zhou

arXiv:2604.20847·cs.IR·April 24, 2026

TL;DR

This paper introduces TASTE, a multimodal music recommendation dataset and framework, demonstrating the effectiveness of large-scale audio encoders and a new feature aggregation method for improved recommendation performance.

Contribution

The paper presents a new multimodal dataset, a benchmarking framework, and MuQ-token, a method for efficient multi-layer audio feature integration in music recommendation.

Findings

01. Audio representations significantly improve recommendation tasks.

02. MuQ-token outperforms other feature integration methods.

03. Content-based approaches are validated as effective for music recommendation.

Abstract

Music Recommendation Systems (MRSs) are a cornerstone of modern streaming platforms. Existing recommendation models, spanning both recall and ranking stages, predominantly rely on collaborative filtering, which fails to exploit the intrinsic characteristics of audio and consequently leads to suboptimal performance, particularly in cold-start scenarios. However, existing music recommendation datasets often lack rich multimodal information, such as raw audio signals and descriptive textual metadata. Moreover, current recommender system evaluation frameworks remain inadequate, as they neither fully leverage multimodal information nor support a diverse range of algorithms, especially multimodal methods. To address these limitations, we propose TASTE, a comprehensive dataset and benchmarking framework designed to highlight the role of multimodal information in music recommendation. Our…

Code & Models

Repositories

zreach/TASTE (GitHub)
