Learning from Expert Factors: Trajectory-level Reward Shaping for Formulaic Alpha Mining
Junjie Zhao, Chengxi Zhang, Chenkai Wang, Peng Yang

TL;DR
This paper introduces Trajectory-level Reward Shaping (TLRS), a novel method that enhances reinforcement learning for mining formulaic alpha factors by providing dense rewards and improving computational efficiency. In stock index experiments, the mined factors show better predictive power.
Contribution
The paper proposes TLRS, a new reward shaping technique that offers dense, subsequence-level rewards and reduces training variance, significantly improving RL-based alpha factor mining.
Findings
TLRS boosts the Rank IC by 9.29% over existing methods.
It reduces computational complexity from linear to constant.
Experiments on six stock indices validate its effectiveness.
Abstract
Reinforcement learning (RL) has successfully automated the complex process of mining formulaic alpha factors, which are used to build interpretable and profitable investment strategies. However, existing methods are hampered by the sparse rewards of the underlying Markov Decision Process. This inefficiency limits exploration of the vast symbolic search space and destabilizes training. To address this, Trajectory-level Reward Shaping (TLRS), a novel reward shaping method, is proposed. TLRS provides dense, intermediate rewards by measuring the subsequence-level similarity between partially generated expressions and a set of expert-designed formulas. Furthermore, a reward centering mechanism is introduced to reduce training variance. Extensive experiments on six major Chinese and U.S. stock indices show that TLRS significantly improves the predictive power of mined factors,…
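The abstract names two mechanisms but does not give their exact formulas. The Python sketch below is a hypothetical illustration of the general ideas, not the authors' TLRS implementation. It assumes that expressions are token sequences (for example, in reverse Polish notation). It measures "subsequence-level similarity" as the longest common contiguous run of tokens, normalized by the expert formula's length. For reward centering, it subtracts an exponentially updated running mean. The names shaped_reward, RewardCenterer and step_size, and the token vocabulary, are all made up for this example.

```python
from typing import Sequence


def longest_common_substring(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest contiguous token run shared by a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at (i-1, j-1).
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def shaped_reward(partial: Sequence[str], experts: Sequence[Sequence[str]]) -> float:
    """Hypothetical dense reward: best normalized overlap with any expert formula."""
    scores = [
        longest_common_substring(partial, expert) / len(expert)
        for expert in experts
        if len(expert) > 0
    ]
    return max(scores, default=0.0)


class RewardCenterer:
    """Assumed reward centering: subtract a running estimate of the mean reward."""

    def __init__(self, step_size: float = 0.01) -> None:
        self.step_size = step_size
        self.avg = 0.0

    def __call__(self, reward: float) -> float:
        centered = reward - self.avg
        # Move the running mean toward the observed reward.
        self.avg += self.step_size * centered
        return centered


if __name__ == "__main__":
    # Hypothetical expert formulas in reverse Polish notation.
    experts = [
        ["close", "open", "sub", "volume", "div"],
        ["high", "low", "sub", "close", "div"],
    ]
    partial = ["close", "open", "sub"]
    print(shaped_reward(partial, experts))  # 3 of 5 tokens match the first expert
```

Taking the maximum over the expert set rewards a partial expression as soon as it reproduces any contiguous piece of some expert formula. This gives a signal at every generation step instead of only at the end of the trajectory. Centering then removes the running mean from that signal, which is one common way to lower the variance of policy-gradient updates.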
