From Feedback Loops to Policy Updates: Reinforcement Fine-Tuning for LLM-Based Alpha Factor Discovery

Lingzhe Zhang, Tong Jia, Yunpeng Zhai, Zixuan Xie, Chiming Duan, Minghua He, Philip S. Yu, and Ying Li

arXiv:2605.15412 · cs.CE · May 18, 2026

TL;DR

QuantEvolver applies reinforcement fine-tuning to LLM-based alpha factor discovery. This enables efficient, diverse, and high-quality factor generation without prompt-level feedback loops, and it improves on existing methods.

Contribution

The paper proposes a novel reinforcement fine-tuning framework that internalizes feedback into the model's parameters. This mitigates the context explosion and search stagnation that affect prompt-level LLM-based alpha factor discovery.
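To make "internalizing feedback into model parameters" concrete, here is a minimal sketch that replaces the LLM with a toy softmax policy over a few factor templates and applies a REINFORCE-style update with a group-mean reward baseline. The template strings, the fixed reward table, and the choice of REINFORCE are all hypothetical illustrations. The source does not specify QuantEvolver's RL objective or reward.

```python
import math
import random

# Hypothetical factor templates the toy "policy" can emit (not from the paper).
TEMPLATES = ["rank(close/delay(close,5))", "ts_corr(volume,close,10)", "close-open"]
# Hypothetical fixed rewards standing in for a real factor evaluator/backtest.
REWARD = {TEMPLATES[0]: 0.8, TEMPLATES[1]: 0.5, TEMPLATES[2]: 0.1}


def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=200, group_size=8, lr=0.5, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(TEMPLATES)  # the "parameters" that absorb feedback
    for _ in range(steps):
        probs = softmax(logits)
        # Sample a group of candidates, as one generation round would.
        picks = rng.choices(range(len(TEMPLATES)), weights=probs, k=group_size)
        rewards = [REWARD[TEMPLATES[i]] for i in picks]
        baseline = sum(rewards) / group_size  # group-mean reward baseline
        grad = [0.0] * len(TEMPLATES)
        for i, r in zip(picks, rewards):
            adv = r - baseline
            # d log softmax(i) / d logit_j = 1[j == i] - probs[j]
            for j in range(len(TEMPLATES)):
                grad[j] += adv * ((1.0 if j == i else 0.0) - probs[j])
        # Gradient ascent on expected reward: feedback is stored in the logits,
        # not appended to any prompt.
        logits = [w + lr * g / group_size for w, g in zip(logits, grad)]
    return softmax(logits)


if __name__ == "__main__":
    for template, p in zip(TEMPLATES, train()):
        print(f"{p:.3f}  {template}")
```

After training, probability mass shifts toward the highest-reward template, and the policy's input never grows: past evaluations survive only as parameter changes.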

Findings

1. Consistently improves primary evaluation metrics across benchmarks.
2. Produces higher-quality, more diverse, and complementary factor pools.
3. Effectively internalizes historical optimization experience in the LLM.
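One common way to quantify whether a factor pool is diverse and complementary is the average absolute pairwise correlation between factor values, where lower means less redundant. The following is a minimal sketch under stated assumptions: each factor has already been evaluated to a list of values over the same (asset, date) observations, and the factor names and values are hypothetical. This metric is an illustrative choice, not necessarily the one used in the paper.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant factor")
    return cov / (sx * sy)


def mean_abs_pairwise_corr(pool):
    """Average |Pearson correlation| over all factor pairs; lower = less redundant."""
    pairs = list(combinations(pool.values(), 2))
    return sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical pool: f2 is a rescaled copy of f1, so it adds no new signal.
    pool = {"f1": [1, 2, 3, 4], "f2": [2, 4, 6, 8], "f3": [1, -1, 1, -1]}
    print(round(mean_abs_pairwise_corr(pool), 3))  # 0.631
```

The duplicated pair (f1, f2) contributes a correlation of 1.0 and pushes the pool's score up. Replacing f2 with a genuinely different signal would lower it.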

Abstract

Modern quantitative trading increasingly relies on systematic models to extract predictive signals from large-scale financial data, where alpha factor discovery plays a central role in transforming market observations into tradable signals. Recent LLM-based methods have shown promise in automating factor generation, but most of them still rely on prompt-level generation-evaluation-feedback loops for iterative optimization. As the loop becomes longer, repeatedly appended historical candidates and feedback can cause context explosion, increase inference cost, dilute useful information, and introduce feedback drift. Moreover, these methods often depend on very large LLMs whose stable generation preferences may lead to structurally similar expressions, redundant candidates, and search stagnation. To address these limitations, we propose QuantEvolver, a self-evolving alpha factor…
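The context-explosion problem can be illustrated with a toy prompt-level loop. Every round appends the new candidate and its feedback to the prompt, so the prompt grows linearly with the number of rounds. In this minimal sketch, `generate` and `evaluate` are hypothetical placeholders for an LLM call and a factor backtest, and prompt size is measured in characters rather than tokens.

```python
def generate(prompt: str, step: int) -> str:
    # Hypothetical stand-in for an LLM call proposing a factor expression.
    return f"rank(ts_mean(close, {step + 2}))"


def evaluate(expr: str) -> str:
    # Hypothetical stand-in for a backtest that returns textual feedback.
    return f"{expr} scored below target; revise."


def prompt_level_loop(task: str, rounds: int) -> list[int]:
    """Run a generate-evaluate-feedback loop and record prompt length per round."""
    prompt = task
    lengths = []
    for step in range(rounds):
        expr = generate(prompt, step)
        feedback = evaluate(expr)
        # History is appended and never pruned: the source of context growth.
        prompt += f"\nCandidate: {expr}\nFeedback: {feedback}"
        lengths.append(len(prompt))
    return lengths


if __name__ == "__main__":
    lengths = prompt_level_loop("Find a predictive alpha factor.", rounds=20)
    print(lengths[0], lengths[-1])
```

Each extra round adds roughly a constant amount of text, so inference cost per call keeps rising with loop length. A parameter-update approach instead keeps the prompt fixed and turns each evaluation into a training signal.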
