Long-term stock index forecasting based on text mining of regulatory disclosures
Stefan Feuerriegel, Julius Gordon

TL;DR
This study examines whether the language of regulatory disclosures can improve long-term stock index forecasts, showing that text-based models outperform baseline predictions at a 24-month horizon.
Contribution
The paper compares data-driven and knowledge-driven dimensionality reduction techniques for long-term stock index forecasting with text mining, with the aim of avoiding overfitting on high-dimensional text data.
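To make the contrast concrete, here is a minimal Python sketch of the two families applied to a bag-of-words. It is not the authors' pipeline. The word lists POSITIVE and NEGATIVE and the helpers knowledge_driven and data_driven are hypothetical illustrations. The knowledge-driven variant collapses each announcement to a single net-sentiment score from fixed word lists. The data-driven variant keeps the k terms whose relative frequency varies most across documents, a simple variance filter standing in for methods such as principal components.

```python
from collections import Counter
from statistics import pvariance

# Hypothetical toy word lists standing in for a finance sentiment dictionary.
POSITIVE = {"growth", "profit", "increase"}
NEGATIVE = {"loss", "decline", "impairment"}


def tokenize(text):
    return text.lower().split()


def knowledge_driven(docs):
    """Reduce each document to one net-sentiment score using fixed word lists."""
    scores = []
    for doc in docs:
        tokens = tokenize(doc)
        pos = sum(t in POSITIVE for t in tokens)
        neg = sum(t in NEGATIVE for t in tokens)
        scores.append((pos - neg) / len(tokens) if tokens else 0.0)
    return scores


def data_driven(docs, k):
    """Keep the k terms whose relative frequency varies most across documents."""
    counts = [Counter(tokenize(d)) for d in docs]
    lengths = [sum(c.values()) or 1 for c in counts]  # avoid division by zero
    vocab = sorted(set().union(*counts))

    def freqs(term):
        # Relative frequency of the term in every document (0 if absent).
        return [c[term] / n for c, n in zip(counts, lengths)]

    # Highest variance first; ties broken alphabetically for determinism.
    selected = sorted(vocab, key=lambda t: (-pvariance(freqs(t)), t))[:k]
    features = [[c[t] / n for t in selected] for c, n in zip(counts, lengths)]
    return selected, features
```

For example, `knowledge_driven(["profit growth increase", "loss decline loss"])` returns `[1.0, -1.0]`. Either reduced feature set would then feed a regression model of index returns in place of the full vocabulary.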
Findings
In the long run, text-based models reduce forecast errors below baseline predictions at a statistically significant level.
The baseline consists of predictions from historic lags of the index.
The results have implications for decision support in financial markets, especially given the growing prevalence of index ETFs.
Abstract
Share valuations are known to adjust to new information entering the market, such as regulatory disclosures. We study whether the language of such news items can improve short-term and especially long-term (24 months) forecasts of stock indices. For this purpose, this work uses predictive models suited to high-dimensional data and specifically compares techniques for data-driven and knowledge-driven dimensionality reduction in order to avoid overfitting. Our experiments, based on 75,927 ad hoc announcements from 1996-2016, reveal the following: in the long run, text-based models succeed in reducing forecast errors below baseline predictions from historic lags at a statistically significant level. Our research has implications for business applications of decision support in financial markets, especially given the growing prevalence of index ETFs (exchange traded funds).
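The comparison against historic lags can be sketched as follows. The abstract does not state which baseline specification or significance test was used, so this sketch assumes two things: a naive last-value (random walk) baseline and a Diebold-Mariano test under squared-error loss. The function names naive_lag_forecast, dm_statistic and two_sided_p_value are hypothetical.

```python
from math import erf, sqrt


def naive_lag_forecast(series, h):
    """Hypothetical baseline: predict series[t + h] with the last observed value series[t].

    The returned list aligns with the targets series[h:].
    """
    return [series[t] for t in range(len(series) - h)]


def dm_statistic(errors_model, errors_baseline, h):
    """Diebold-Mariano statistic for equal squared-error loss at horizon h.

    Uses a rectangular long-run variance with h - 1 autocovariance lags.
    """
    d = [em ** 2 - eb ** 2 for em, eb in zip(errors_model, errors_baseline)]
    n = len(d)
    dbar = sum(d) / n

    def gamma(k):
        # Sample autocovariance of the loss differential at lag k.
        return sum((d[t] - dbar) * (d[t - k] - dbar) for t in range(k, n)) / n

    long_run_var = gamma(0) + 2 * sum(gamma(k) for k in range(1, h))
    if long_run_var <= 0:
        raise ValueError("non-positive long-run variance; sample too short for this horizon")
    return dbar / sqrt(long_run_var / n)


def two_sided_p_value(stat):
    """Two-sided p-value from the standard normal approximation."""
    phi = 0.5 * (1 + erf(abs(stat) / sqrt(2)))
    return 2 * (1 - phi)
```

A negative statistic means the text-based model has lower squared error than the baseline. For multi-step horizons such as 24 months, the rectangular long-run variance estimate can turn negative on short samples. The sketch therefore raises an error in that case rather than returning a misleading value.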
