Foresight Learning for SEC Risk Prediction
Benjamin Turtel, Paul Wilczewski, Danny Franklin, Kris Skotheim

TL;DR
This paper introduces a fully automated pipeline that converts SEC risk disclosures into supervised data for training a small language model to predict the likelihood of risks materializing, improving probabilistic accuracy and calibration.
Contribution
The work presents an automated data generation method and a compact model for probabilistic risk prediction from SEC filings; the compact model outperforms larger models, including GPT-5, on probabilistic accuracy and calibration.
Findings
The model significantly outperforms pretrained and heuristic baselines.
It surpasses GPT-5 in probabilistic accuracy and calibration.
The approach enables scalable, domain-specific model training without proprietary data.
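The findings above refer to probabilistic accuracy and calibration without naming the metrics in this summary. As a minimal sketch, assuming the common choices of Brier score (accuracy of probabilities) and expected calibration error with equal-width bins, the two quantities could be computed as follows. The function names and the bin count are illustrative assumptions, not the paper's evaluation code.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted average gap between mean predicted probability and observed
    frequency, over equal-width probability bins (n_bins is an assumption)."""
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # p == 1.0 would index past the last bin, so clamp it.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_prob - observed)
    return ece
```

Lower is better for both. A model that always predicts 0.5 on a balanced set scores a Brier of 0.25 and a calibration error of 0, which is why the two metrics are usually reported together.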
Abstract
Risk disclosures in SEC filings describe potential adverse events but rarely quantify their likelihood, limiting their usefulness for probabilistic analysis. A central obstacle is the absence of large-scale, risk-level supervision linking disclosed risks to realized outcomes. We introduce a fully automated data generation pipeline that converts qualitative SEC risk disclosures into temporally grounded supervision using only public data. For each filing, the pipeline generates firm-specific, time-bounded risk queries from the Risk Factors section and labels them by automatically resolving outcomes against subsequent disclosures. Using this dataset of risk queries and outcomes grounded in SEC filings, we train a compact large language model to estimate the probability that a disclosed risk will materialize within a specified horizon. Despite its modest size, the resulting model…
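The abstract describes firm-specific, time-bounded risk queries that are labeled against subsequent disclosures. A minimal sketch of how one such record might be represented is given here. Every name (`RiskQuery`, `horizon_days`, `eligible_evidence`) and the choice to measure the horizon in days are hypothetical; the summary does not specify the paper's schema or how outcomes are resolved.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class RiskQuery:
    """Hypothetical record for one time-bounded risk question."""
    ticker: str                     # firm the query is about
    filing_date: date               # date of the filing that disclosed the risk
    horizon_days: int               # assumed unit: days after the filing date
    question: str                   # e.g. "Will the firm report X within the horizon?"
    outcome: Optional[bool] = None  # None until resolved against later filings

    @property
    def resolve_by(self) -> date:
        """Last date on which evidence can count toward the outcome."""
        return self.filing_date + timedelta(days=self.horizon_days)


def eligible_evidence(query: RiskQuery, disclosure_dates: List[date]) -> List[date]:
    """Keep only disclosures strictly after the source filing and within the
    horizon, so labels never rely on information from before the query."""
    return [d for d in disclosure_dates if query.filing_date < d <= query.resolve_by]
```

Restricting evidence to the window after the filing date is what keeps the supervision temporally grounded: the model is trained on what was knowable at filing time and scored on what happened afterward.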
Taxonomy
Topics: Auditing, Earnings Management, Governance · Financial Reporting and XBRL · Financial Distress and Bankruptcy Prediction
