Predicting Day-Ahead Stock Returns using Search Engine Query Volumes: An Application of Gradient Boosted Decision Trees to the S&P 100
Christopher Bockel-Rickermann

TL;DR
This paper demonstrates that combining internet search query data with financial data using gradient boosted decision trees can predict day-ahead stock returns, outperforming random guessing and challenging market efficiency assumptions.
Contribution
It introduces a novel approach integrating search engine query volumes with financial data for stock return prediction using gradient boosting, showing promising empirical results.
Findings
Models achieve average areas under the ROC curve (AUC) between 54.2% and 56.7%, where 50% corresponds to random guessing.
Trading strategies based on predictions yield over 57% annual returns before costs.
Results challenge the weak and semi-strong forms of the efficient market hypothesis.
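To make the AUC figures above easier to interpret, here is a minimal sketch of how an area under the ROC curve can be computed from scratch. It is not the paper's evaluation code: the function name `roc_auc` and the toy labels and scores are illustrative assumptions. The sketch uses the rank interpretation of AUC, namely the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one.

```python
def roc_auc(labels, scores):
    """Return the AUC as the share of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    `labels` holds 0/1 class labels, `scores` the model's predicted scores.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the paper): 1 = return above the median.
toy_labels = [1, 0, 1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 6 of 9 pairs ranked correctly
```

A model that assigns the same score to every example gets an AUC of exactly 0.5 under this definition, which is the random-guessing baseline the reported 54.2% to 56.7% should be compared against.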
Abstract
The internet has changed the way we live, work and take decisions. As it is the major modern resource for research, detailed data on internet usage exhibits vast amounts of behavioral information. This paper aims to answer the question whether this information can be facilitated to predict future returns of stocks on financial capital markets. In an empirical analysis it implements gradient boosted decision trees to learn relationships between abnormal returns of stocks within the S&P 100 index and lagged predictors derived from historical financial data, as well as search term query volumes on the internet search engine Google. Models predict the occurrence of day-ahead stock returns in excess of the index median. On a time frame from 2005 to 2017, all disparate datasets exhibit valuable information. Evaluated models have average areas under the receiver operating characteristic…
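The prediction target described in the abstract, a day-ahead return in excess of the index median, can be illustrated with a short sketch. This is an assumption-laden illustration rather than the author's pipeline: it assumes the "index median" is the median return across constituents on the same day, and the tickers, returns, and the helper `label_above_median` are all hypothetical.

```python
from statistics import median


def label_above_median(returns_by_ticker):
    """Map each ticker to 1 if its return exceeds the cross-sectional median, else 0."""
    med = median(returns_by_ticker.values())
    return {ticker: int(ret > med) for ticker, ret in returns_by_ticker.items()}


# Hypothetical next-day returns for four made-up tickers (not real data).
next_day_returns = {"AAA": 0.012, "BBB": -0.004, "CCC": 0.003, "DDD": 0.000}

# The median of these four returns is 0.0015, so AAA and CCC are labeled 1.
labels = label_above_median(next_day_returns)
```

Labels built this way are roughly balanced by construction, since about half of the stocks fall above the median on any given day. A classifier such as a gradient boosted tree ensemble would then be trained on lagged predictors to estimate the probability of label 1.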
Taxonomy
Topics: Stock Market Forecasting Methods · Financial Markets and Investment Strategies · Complex Systems and Time Series Analysis
