Cross-Stock Predictability via LLM-Augmented Semantic Networks
Yikuan Huang, Zheqi Fan, Kaiqi Hu, Yifan Ye

TL;DR
This paper introduces a two-stage framework using large language models to filter and refine text-based financial networks, improving cross-stock return predictability and trading performance.
Contribution
It presents a novel LLM-augmented method for filtering economic relations in financial networks, enhancing their predictive power and economic relevance.
Findings
LLM-based edge filtering increased the long-short Sharpe ratio from 0.742 to 0.820.
Refined networks reduced maximum drawdown from 10.47% to 7.85%.
Edge filtering improved cross-stock return predictability in a backtest on S&P 500 constituents from 2011 to 2019.
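For readers unfamiliar with the two headline metrics, the sketch below shows one common way to compute a Sharpe ratio and a maximum drawdown from a series of periodic returns. The annualization factor, the zero risk-free rate, and the function names are assumptions made for illustration; this summary does not state which conventions the paper uses.

```python
import numpy as np

def sharpe_ratio(returns, periods_per_year=252):
    """Annualized mean/volatility ratio (assumes a zero risk-free rate)."""
    r = np.asarray(returns, dtype=float)
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)

def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded wealth curve."""
    r = np.asarray(returns, dtype=float)
    # Prepend the starting wealth of 1.0 so an initial loss counts as a drawdown.
    wealth = np.concatenate([[1.0], np.cumprod(1.0 + r)])
    peak = np.maximum.accumulate(wealth)
    return float(np.max(1.0 - wealth / peak))
```

A lower maximum drawdown at a higher Sharpe ratio, as reported above, means the filtered strategy earned more per unit of volatility while suffering a shallower worst-case loss.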
Abstract
Text-based financial networks are increasingly used to study cross-stock return predictability. A common approach constructs links from similarities in firms' disclosure embeddings, but such networks often contain spurious edges because textual proximity does not necessarily imply economic connection. We propose a two-stage framework that first builds a sparse candidate graph from 10-K embeddings and then uses a large language model to classify and filter candidate edges according to their economic relations. The refined graph is used to aggregate pair-level mean-reversion signals into stock-level trading signals with relation-aware and distance-based weights. In a backtest on S&P 500 constituents from 2011 to 2019, LLM-based edge filtering improves the long-short Sharpe ratio from 0.742 to 0.820 and reduces maximum drawdown from 10.47% to 7.85%. These results suggest that…
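The two-stage idea in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the top-k cosine rule, the `classify` callable standing in for the LLM, the relation labels, the `relation_weight` table, and the return-gap form of the mean-reversion signal are all hypothetical choices made here.

```python
import numpy as np

def candidate_edges(embeddings, k=2):
    """Stage 1: sparse candidate graph from each firm's k most similar peers."""
    x = np.asarray(embeddings, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = x @ x.T                      # cosine similarity matrix
    np.fill_diagonal(sim, -np.inf)     # exclude self-links
    edges = {}
    for i in range(len(x)):
        for j in np.argsort(-sim[i])[:k]:
            a, b = sorted((i, int(j)))  # store each undirected edge once
            edges[(a, b)] = float(sim[a, b])
    return edges

def filter_edges(edges, classify, keep=("supplier", "competitor")):
    """Stage 2: keep only edges the classifier labels as an economic relation.

    `classify(i, j)` is a hypothetical stand-in for the LLM call.
    """
    kept = {}
    for (i, j), sim in edges.items():
        label = classify(i, j)
        if label in keep:
            kept[(i, j)] = (sim, label)
    return kept

def stock_signals(n, kept, recent_returns, relation_weight):
    """Aggregate pair-level mean-reversion signals into stock-level signals."""
    r = np.asarray(recent_returns, dtype=float)
    signal = np.zeros(n)
    total = np.zeros(n)
    for (i, j), (sim, label) in kept.items():
        w = relation_weight[label] * sim   # relation-aware, similarity-based weight
        gap = r[i] - r[j]
        signal[i] -= w * gap               # i outperformed its peer: bet on reversion
        signal[j] += w * gap
        total[i] += w
        total[j] += w
    # Weighted average per stock; stocks with no kept edges get a zero signal.
    return np.divide(signal, total, out=np.zeros(n), where=total > 0)
```

In this sketch a spurious edge, one that is textually close but rejected by the classifier, contributes nothing to the signal, which is the mechanism the abstract credits for the improved backtest results.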
