Trillion Dollar Words: A New Financial Dataset, Task & Market Analysis
Agam Shah, Suvan Paturi, Sudheer Chava

TL;DR
This paper introduces a large annotated dataset of FOMC communications, develops a hawkish-dovish classification task, benchmarks language models, and analyzes the impact of monetary policy stance on financial markets.
Contribution
The paper contributes the largest tokenized and annotated dataset of FOMC texts, a novel hawkish-dovish classification task, and a market impact analysis, advancing research in financial NLP and policy analysis.
Findings
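RoBERTa-large achieves the best classification performance among the benchmarked pre-trained language models.
The best-performing model is used to construct a monetary policy stance measure for FOMC document release days.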
Policy stance impacts treasury, stock markets, and macroeconomic indicators.
Abstract
Monetary policy pronouncements by the Federal Open Market Committee (FOMC) are a major driver of financial market returns. We construct the largest tokenized and annotated dataset of FOMC speeches, meeting minutes, and press conference transcripts in order to understand how monetary policy influences financial markets. In this study, we develop a novel task of hawkish-dovish classification and benchmark various pre-trained language models on the proposed dataset. Using the best-performing model (RoBERTa-large), we construct a measure of monetary policy stance for the FOMC document release days. To evaluate the constructed measure, we study its impact on the treasury market, stock market, and macroeconomic indicators. Our dataset, models, and code are publicly available on Huggingface and GitHub under the CC BY-NC 4.0 license.
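To make the idea of a document-level stance measure concrete, here is a minimal sketch of how sentence-level hawkish/dovish labels could be aggregated into a single score. The abstract does not spell out the aggregation, so the formula used here (net share of hawkish sentences), the label names, and the function name `stance_score` are all assumptions for illustration. The labels themselves would come from a classifier such as the fine-tuned RoBERTa-large model, which is not loaded here.

```python
from collections import Counter

# Assumed label set for the hawkish-dovish task (illustrative names).
LABELS = ("hawkish", "dovish", "neutral")


def stance_score(sentence_labels):
    """Aggregate per-sentence labels into one document-level score.

    Assumed formula: (hawkish - dovish) / total sentences, which lies
    in [-1, 1]. Positive values lean hawkish, negative values lean dovish.
    """
    counts = Counter(sentence_labels)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no sentences: treat the document as neutral
    return (counts["hawkish"] - counts["dovish"]) / total


# Hypothetical labels for the sentences of one document.
example = ["hawkish", "hawkish", "dovish", "neutral"]
print(stance_score(example))
```

A score like this can be computed once per document release day and then compared against market or macroeconomic series, which is the kind of evaluation the abstract describes.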
Taxonomy
Topics: Stock Market Forecasting Methods
