Hunting Tomorrow's Leaders: Using Machine Learning to Forecast S&P 500 Additions & Removal
Vidhi Agrawal, Eesha Khalid, Tianyu Tan, Doris Xu

TL;DR
This paper demonstrates how machine learning, specifically Random Forests, can accurately forecast S&P 500 index membership changes, providing valuable insights for investment strategies and market analysis.
Contribution
It introduces a novel application of machine learning to predict S&P 500 additions and removals using diverse financial features and emphasizes model transparency.
Findings
Achieved test F1 score of 0.85 with Random Forest
Predicted specific additions and removals for Q3 2023
Enhanced investment decision-making through predictive modeling
Abstract
This study applies machine learning to predict S&P 500 membership changes: key events that profoundly impact investor behavior and market dynamics. We used quarterly data from WRDS datasets (2013 onwards), incorporating features such as industry classification, financial data, market data, and corporate governance indicators. Using a Random Forest model, we achieved a test F1 score of 0.85, outperforming logistic regression and SVC models. This research not only showcases the power of machine learning for financial forecasting but also emphasizes model transparency through SHAP analysis and feature engineering. The model's real-world applicability is demonstrated with predicted changes for Q3 2023, such as the addition of Uber (UBER) and the removal of SolarEdge Technologies (SEDG). By incorporating these predictions into a trading strategy, i.e., buying stocks announced for addition and…
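The headline metric above is the F1 score, the harmonic mean of precision and recall, which is generally more informative than accuracy when the positive class (here, a membership change) is rare. The following is a minimal sketch of how that score is computed, not the authors' evaluation code. The helper name `precision_recall_f1` and the toy labels are hypothetical, and it assumes the score is taken on the positive class; the abstract does not say which averaging scheme was used.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy labels (not data from the paper):
# 1 = firm changes index membership this quarter, 0 = no change.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} accuracy={accuracy:.3f}")
```

In this toy case the accuracy is higher than the F1 score because most firms are correctly labeled as unchanged, which is why F1 is the more demanding measure for rare events such as index additions and removals.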
Taxonomy
Topics: Reservoir Engineering and Simulation Methods
Methods: Logistic Regression · Shapley Additive Explanations
