Cross-Stock Predictability via LLM-Augmented Semantic Networks

Yikuan Huang; Zheqi Fan; Kaiqi Hu; Yifan Ye

arXiv:2604.19476 · q-fin.PM · April 28, 2026

TL;DR

This paper introduces a two-stage framework using large language models to filter and refine text-based financial networks, improving cross-stock return predictability and trading performance.

Contribution

The paper presents an LLM-augmented method that filters candidate edges in text-based financial networks by economic relation, improving the networks' predictive power and economic relevance.

Findings

01

LLM-based edge filtering increased the long-short Sharpe ratio from 0.742 to 0.820.

02

Refined networks reduced maximum drawdown from 10.47% to 7.85%.

03

Cross-stock predictability improved in a backtest on S&P 500 constituents from 2011 to 2019.
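For readers unfamiliar with the two metrics quoted in the findings, the sketch below shows one common way to compute an annualized Sharpe ratio and a maximum drawdown from a series of periodic strategy returns. The paper's exact conventions are not stated on this page, so the daily frequency, the 252-period annualization, the zero risk-free rate, and the function names `annualized_sharpe` and `max_drawdown` are all assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def annualized_sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio of periodic returns.

    Assumes the returns are already in excess of the risk-free rate
    (equivalently, a zero risk-free rate) and uses the sample standard deviation.
    """
    sd = stdev(returns)
    if sd == 0:
        raise ValueError("returns have zero variance")
    return mean(returns) / sd * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve.

    Returned as a non-positive fraction, e.g. -0.0785 for a 7.85% drawdown.
    """
    equity = 1.0
    peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst


# Hypothetical return series, not data from the paper.
toy_returns = [0.10, -0.20, 0.05]
toy_drawdown = max_drawdown(toy_returns)  # approximately -0.20
toy_sharpe = annualized_sharpe(toy_returns)
```

The drawdown is reported here as a negative number, matching the sign convention in the abstract.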

Abstract

Text-based financial networks are increasingly used to study cross-stock return predictability. A common approach constructs links from similarities in firms' disclosure embeddings, but such networks often contain spurious edges because textual proximity does not necessarily imply economic connection. We propose a two-stage framework that first builds a sparse candidate graph from 10-K embeddings and then uses a large language model to classify and filter candidate edges according to their economic relations. The refined graph is used to aggregate pair-level mean-reversion signals into stock-level trading signals with relation-aware and distance-based weights. In a backtest on S&P 500 constituents from 2011 to 2019, LLM-based edge filtering improves the long-short Sharpe ratio from 0.742 to 0.820 and reduces maximum drawdown from -10.47% to -7.85%. These results suggest that…
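The two-stage pipeline described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the top-k cosine-similarity rule for the candidate graph, the `classify_relation` callable standing in for the LLM, the relation labels and weights, the use of similarity as the distance-based weight, and the return-spread form of the pair signal are all hypothetical choices made only to show how the pieces could fit together.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def candidate_edges(embeddings, k=1):
    """Stage 1 (assumed rule): link each firm to its k most similar peers."""
    edges = {}
    for a, emb_a in embeddings.items():
        sims = sorted(
            ((cosine(emb_a, emb_b), b) for b, emb_b in embeddings.items() if b != a),
            reverse=True,
        )
        for sim, b in sims[:k]:
            edges[tuple(sorted((a, b)))] = sim  # undirected, deduplicated
    return edges


def filter_edges(edges, classify_relation):
    """Stage 2: keep only edges the classifier labels with an economic relation.

    classify_relation is a hypothetical stand-in for an LLM call; it returns
    a relation label, or None when it finds no economic link.
    """
    kept = {}
    for (a, b), sim in edges.items():
        label = classify_relation(a, b)
        if label is not None:
            kept[(a, b)] = (label, sim)
    return kept


def stock_signals(kept, recent_returns, relation_weight):
    """Aggregate pair-level mean-reversion signals into stock-level signals."""
    num, den = {}, {}
    for (a, b), (label, sim) in kept.items():
        # Relation-aware weight times a similarity-based (distance-based) weight.
        w = relation_weight.get(label, 1.0) * max(sim, 0.0)
        spread = recent_returns[a] - recent_returns[b]
        # The recent outperformer is expected to revert down, the laggard up.
        for stock, pair_signal in ((a, -spread), (b, spread)):
            num[stock] = num.get(stock, 0.0) + w * pair_signal
            den[stock] = den.get(stock, 0.0) + w
    return {s: num[s] / den[s] for s in num if den[s] > 0}


# Hypothetical toy inputs, not data from the paper.
embeddings = {"AAA": [1.0, 0.0], "BBB": [0.9, 0.1], "CCC": [0.0, 1.0]}
known_relations = {("AAA", "BBB"): "competitor"}  # stub for LLM output
recent_returns = {"AAA": 0.04, "BBB": -0.01, "CCC": 0.00}

edges = candidate_edges(embeddings, k=1)
kept = filter_edges(edges, lambda a, b: known_relations.get((a, b)))
signals = stock_signals(kept, recent_returns, {"competitor": 1.0})
```

In this toy run the BBB-CCC edge survives stage 1 on textual similarity alone but is dropped in stage 2 because the stub classifier finds no economic relation, which is the kind of spurious edge the abstract describes.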
