Prediction of Spotify Chart Success Using Audio and Streaming Features
Ian Jacob Cabansag, Paul Ntegeka

TL;DR
This study develops a machine learning pipeline to predict Spotify chart success using audio features and early engagement data, demonstrating high accuracy and robustness across models, with implications for music marketing and A&R strategies.
Contribution
The paper introduces a comprehensive classification approach that combines audio and streaming data to accurately forecast a song's chart success, highlighting the predictive power of audio features alone.
Findings
Tree-based models achieved a macro F1-score near 0.95 and 97% accuracy.
Audio features alone can predict chart success without streaming data.
Models remain effective even without stream count and rank history.
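The findings above report both macro F1-score and accuracy. As a minimal sketch of why the two can differ, the pure-Python example below computes each on a small, made-up set of labels. The helper names `macro_f1` and `accuracy` and the toy labels are illustrative assumptions; they are not taken from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores.

    Each class counts equally, so a rare class that is always missed
    pulls the score down even when overall accuracy looks high.
    """
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical toy labels: 8 songs in class 0, 2 songs in class 1,
# and a classifier that always predicts class 0.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10

print(accuracy(y_true, y_pred))            # 0.8
print(round(macro_f1(y_true, y_pred), 3))  # 0.444
```

In this toy case the always-majority classifier scores 0.8 accuracy but only about 0.444 macro F1, which is why reporting both metrics gives a fuller picture when classes are imbalanced.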
Abstract
Spotify's streaming charts offer a real-time lens into music popularity, driving discovery, playlists, and even revenue potential. Understanding what influences a song's rise in ranks on these charts, especially early on, can guide marketing efforts, investment decisions, and even artistic direction. In this project, we developed a classification pipeline to predict a song's chart success based on its musical characteristics and early engagement data. Using all 2024 U.S. Top 200 Spotify Daily Charts and the Spotify Web API, we built a dataset containing both metadata and audio features for 14,639 unique songs. The project was structured in two phases. First, we benchmarked four models (Logistic Regression, K Nearest Neighbors, Random Forest, and XGBoost) using a standard train-test split. In the second phase, we incorporated cross-validation, hyperparameter tuning, and detailed…
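The first phase described in the abstract, benchmarking several classifiers on a single train-test split, might look roughly like the sketch below. Everything specific here is an assumption made for illustration: the data is synthetic (generated with scikit-learn's `make_classification`, not the Spotify dataset), the labels are binary, the split ratio and hyperparameters are arbitrary, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost to avoid an extra dependency. The `benchmark` helper is hypothetical and is not the authors' code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the chart dataset (assumption: binary, imbalanced labels).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)

# Stratify so both splits keep the same class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0,
)

# Scale features for the distance- and gradient-based models;
# tree ensembles do not need scaling.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "k_nearest_neighbors": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Stand-in for XGBoost (assumption, see the note above).
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}


def benchmark(models, X_train, X_test, y_train, y_test):
    """Fit each model on the training split and score it on the test split."""
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "macro_f1": f1_score(y_test, pred, average="macro"),
        }
    return results


results = benchmark(models, X_train, X_test, y_train, y_test)
for name, scores in results.items():
    print(f"{name}: accuracy={scores['accuracy']:.3f}, macro_f1={scores['macro_f1']:.3f}")
```

A single split like this gives only one estimate per model, which is presumably why the second phase adds cross-validation and hyperparameter tuning. The scores printed on synthetic data say nothing about the paper's reported results.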
