Audio Prototypical Network For Controllable Music Recommendation
Fırat Öncel, Emiliano Penaloza, Haolun Wu, Shubham Gupta, Mirco Ravanelli, Laurent Charlin, Cem Subakan

TL;DR
This paper introduces an audio prototypical network that enhances music recommendation systems by providing interpretable and controllable user profiles based on meaningful musical features, while maintaining competitive performance.
Contribution
It presents a novel audio prototypical network that offers interpretability and control in music recommendations, addressing limitations of black-box models.
Findings
Achieves recommendation performance competitive with popular baseline models.
Provides interpretable user profiles based on musical features.
Enables user control over music preferences.
Abstract
Traditional recommendation systems represent user preferences in dense representations obtained through black-box encoder models. While these models often provide strong recommendation performance, they lack interpretability for users, leaving users unable to understand or control the system's modeling of their preferences. This limitation is especially challenging in music recommendation, where user preferences are highly personal and often evolve based on nuanced qualities like mood, genre, tempo, or instrumentation. In this paper, we propose an audio prototypical network for controllable music recommendation. This network expresses user preferences in terms of prototypes representative of semantically meaningful features pertaining to musical qualities. We show that the model obtains competitive recommendation performance compared to popular baseline models while also providing…
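The idea of a prototype-based, user-editable profile can be illustrated with a toy sketch. This is not the paper's architecture: the 3-dimensional embeddings, the prototype names (`calm`, `energetic`), the cosine-similarity activation, and the weighted-sum score are all assumptions chosen for illustration. In the paper, prototypes are tied to semantically meaningful musical features and learned over real audio representations.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def prototype_activations(item_embedding, prototypes):
    """How strongly an item resembles each named prototype (hypothetical activation)."""
    return {name: cosine(item_embedding, p) for name, p in prototypes.items()}

def score(item_embedding, prototypes, user_profile):
    """Score = sum over prototypes of (user weight * item activation)."""
    acts = prototype_activations(item_embedding, prototypes)
    return sum(user_profile.get(name, 0.0) * a for name, a in acts.items())

def rank(items, prototypes, user_profile):
    """Item names sorted from highest to lowest score."""
    return sorted(items, key=lambda n: score(items[n], prototypes, user_profile),
                  reverse=True)

# Toy data: every value below is made up for illustration.
prototypes = {"calm": [1.0, 0.0, 0.0], "energetic": [0.0, 1.0, 0.0]}
items = {"song_a": [0.9, 0.1, 0.0], "song_b": [0.1, 0.9, 0.0]}

# An interpretable profile: one weight per named prototype.
profile = {"calm": 0.8, "energetic": 0.2}
before = rank(items, prototypes, profile)

# "Control": the user edits the weights directly and the ranking follows.
edited = {**profile, "calm": 0.1, "energetic": 0.9}
after = rank(items, prototypes, edited)

print(before, after)
```

Because the profile is a set of weights over named prototypes rather than an opaque dense vector, a user can read it and change it, which is the kind of interpretability and control the abstract describes.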
Peer Reviews
Decision·Submitted to ICLR 2025
* The task of controllable music recommendation is valuable in both academia and industry.
* The motivation for using music-clip-level prototypes is reasonable and clear, and directly using music content for recommendation is a promising direction.
* The writing is clear and easy to follow.
1. Limited technical novelty:
* The proposed prototype-based controllable music recommender is a fairly straightforward attention-based neural network with certain losses. Attention-based architectures have been proposed and extensively studied in recommender systems; although practical and helpful, they are not novel to the research or industry community.
* The learning or extraction of the prototypes is based on existing methods (MERT or MusicGen). I am ex
1. This paper attempts to use audio features for music recommendation to address the problem of insufficient keywords/tags.
2. This paper proposes a metric to test the controllability of the model.
1. Although the authors claim that the model is controllable, the reviewer is not clear about the definition of controllability in the paper, about the comparison between the model and other baseline methods (only one), or about where its controllability is superior.
2. There are many classic CF methods and DL-based methods for recommendation based on user behavior. The reviewer noticed that the paper only selected VAE-based methods for comparison without any explanation or motivation of such selection
This paper introduces an interesting perspective on combining prototype networks and the controllability of recommendations. The fact that the prototypes are interpretable (i.e., listenable music clips) is a nice feature, and measuring controllability by the calibration of user tag preferences also provides a direct way to implement user controls. The experiments seem to indicate the effectiveness of the two added objectives in improving recommendation performance. The authors furthe
The method lacks novelty: none of the model's components is new. The key concept of using prototypes for explainable recommendations has been explored in [1]. Unlike in [1], the number of prototypes in this paper is fixed and aligned with pre-defined song tags, which can limit the expressiveness of the model and make it vulnerable to noise in the tag data. The quality of these prototypes is delegated to a generative music model, but the experiments do not address details on how the quality
Taxonomy
Topics: Music and Audio Processing · Recommender Systems and Techniques · Neuroscience and Music Perception
