When Valid Signals Fail: Regime Boundaries Between LLM Features and RL Trading Policies
Zhengzhe Yang

TL;DR
This paper investigates large language models as feature extractors for reinforcement learning trading agents and finds that features which are genuinely predictive on held-out data do not always improve policy robustness under distribution shift.
Contribution
It introduces an automated prompt-optimization method for extracting predictive features from LLMs and analyzes their impact on RL trading performance during macroeconomic shocks.
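As a rough illustration of treating a prompt as a discrete hyperparameter, the sketch below scores each candidate prompt with a validation metric and keeps the best one. The search procedure is not described here, so exhaustive evaluation of a fixed candidate list is an assumption, and `select_prompt`, `extract`, and `score` are hypothetical names: `extract` stands in for a call to the frozen LLM, and `score` for whatever validation metric is being optimized.

```python
from typing import Callable, Sequence, Tuple


def select_prompt(
    candidates: Sequence[str],
    extract: Callable[[str], Sequence[float]],
    score: Callable[[Sequence[float]], float],
) -> Tuple[str, float]:
    """Return the candidate prompt with the highest validation score.

    Assumption: candidates are evaluated exhaustively. `extract` is a
    hypothetical stand-in for running the frozen LLM with a given prompt
    over a validation set; `score` maps the resulting features to a
    scalar where higher is better.
    """
    if not candidates:
        raise ValueError("need at least one candidate prompt")

    best_prompt = candidates[0]
    best_score = float("-inf")
    for prompt in candidates:
        features = extract(prompt)       # LLM weights stay frozen
        current = score(features)        # metric on validation data only
        if current > best_score:
            best_prompt, best_score = prompt, current
    return best_prompt, best_score
```

Because only the prompt text changes between iterations, the extractor itself needs no gradient access. The chosen prompt should still be checked on data that was not used for selection.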
Findings
Optimized prompts produce features with an Information Coefficient (IC) above 0.15 on held-out data.
Features improve performance in stable conditions but add noise during shocks.
Macroeconomic variables remain the most robust drivers of policy improvement.
Abstract
Can large language models (LLMs) generate continuous numerical features that improve reinforcement learning (RL) trading agents? We build a modular pipeline where a frozen LLM serves as a stateless feature extractor, transforming unstructured daily news and filings into a fixed-dimensional vector consumed by a downstream PPO agent. We introduce an automated prompt-optimization loop that treats the extraction prompt as a discrete hyperparameter and tunes it directly against the Information Coefficient (IC), the Spearman rank correlation between predicted and realized returns, rather than NLP losses. The optimized prompt discovers genuinely predictive features (IC above 0.15 on held-out data). However, these valid intermediate representations do not automatically translate into downstream task performance: during a distribution shift caused by a macroeconomic shock, LLM-derived features add…
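The Information Coefficient defined in the abstract can be computed with a few lines of standard-library Python. This is a minimal sketch, not the paper's implementation: the function names are hypothetical, ties receive average ranks, and returning 0.0 for a constant input is an assumption made here to avoid a division by zero.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def information_coefficient(
    predicted: Sequence[float], realized: Sequence[float]
) -> float:
    """Spearman rank correlation between predicted and realized returns."""
    n = len(predicted)
    if n != len(realized) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")

    rp, rr = average_ranks(predicted), average_ranks(realized)
    mean_p, mean_r = sum(rp) / n, sum(rr) / n
    cov = sum((a - mean_p) * (b - mean_r) for a, b in zip(rp, rr))
    var_p = sum((a - mean_p) ** 2 for a in rp)
    var_r = sum((b - mean_r) ** 2 for b in rr)
    if var_p == 0 or var_r == 0:
        return 0.0  # assumption: a constant series carries no rank signal
    return cov / sqrt(var_p * var_r)
```

For example, predictions `[0.3, 0.1, 0.2, 0.4]` against realized returns `[0.02, -0.01, 0.03, 0.01]` have ranks `[3, 1, 2, 4]` and `[3, 1, 4, 2]`, which gives an IC of 0.2. Because the metric depends only on ranks, it is insensitive to the scale of the extracted feature.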
