HRFT: Mining High-Frequency Risk Factor Collections End-to-End via Transformer
Wenyan Xu, Rundong Wang, Chen Li, Yonghong Hu, Zhonghua Lu

TL;DR
This paper presents IRFT, an end-to-end Transformer-based method for mining interpretable, formulaic risk factors from high-frequency trading data, outperforming existing symbolic regression approaches in return and speed.
Contribution
Introduces a novel Transformer-based framework that directly generates complete, formulaic risk factors from high-frequency data without predefined operator skeletons.
Findings
IRFT achieves 30% higher investment return than benchmarks.
IRFT's inference is orders of magnitude faster than existing symbolic regression approaches.
IRFT effectively determines the form and constants of risk factors.
Abstract
In quantitative trading, transforming historical stock data into interpretable, formulaic risk factors enhances the identification of market volatility and risk. Despite recent advancements in neural networks for extracting latent risk factors, these models remain limited to feature extraction and lack explicit, formulaic risk factor designs. By viewing symbolic mathematics as a language where valid mathematical expressions serve as meaningful "sentences," we propose framing the task of mining formulaic risk factors as a language modeling problem. In this paper, we introduce an end-to-end methodology, the Intraday Risk Factor Transformer (IRFT), to directly generate complete formulaic risk factors, including constants. We use a hybrid symbolic-numeric vocabulary where symbolic tokens represent operators and stock features, and numeric tokens represent constants. We train a Transformer model…
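To make the hybrid symbolic-numeric vocabulary concrete, the following is a minimal sketch of how a formulaic factor could be flattened into a token "sentence" and parsed back. The abstract does not specify the encoding, so everything here is an assumption: the operator and feature names are hypothetical, the prefix (Polish) ordering is one common choice, and the sign/mantissa/exponent scheme for constants is borrowed from the general symbolic regression literature rather than taken from this paper.

```python
import math

# Hypothetical symbolic tokens: operator name -> arity, plus stock features.
OPERATORS = {"add": 2, "sub": 2, "mul": 2, "div": 2, "log": 1, "abs": 1}
FEATURES = {"open", "high", "low", "close", "volume"}


def encode_constant(value, digits=3):
    """Encode a float as three numeric tokens: sign, mantissa, exponent."""
    if value == 0:
        return ["+", "N" + "0" * digits, "E0"]
    sign = "+" if value > 0 else "-"
    exponent = math.floor(math.log10(abs(value))) - (digits - 1)
    mantissa = round(abs(value) / 10 ** exponent)
    if mantissa == 10 ** digits:  # rounding pushed the mantissa one digit too far
        mantissa //= 10
        exponent += 1
    return [sign, f"N{mantissa:0{digits}d}", f"E{exponent}"]


def decode_constant(tokens):
    """Invert encode_constant for a [sign, mantissa, exponent] triple."""
    sign, mantissa, exponent = tokens
    value = int(mantissa[1:]) * 10.0 ** int(exponent[1:])
    return -value if sign == "-" else value


def to_tokens(expr):
    """Flatten a nested-tuple expression into a prefix-order token list."""
    if isinstance(expr, tuple):
        op, *args = expr
        if OPERATORS.get(op) != len(args):
            raise ValueError(f"bad operator or arity: {op!r}")
        tokens = [op]
        for arg in args:
            tokens.extend(to_tokens(arg))
        return tokens
    if isinstance(expr, str):
        if expr not in FEATURES:
            raise ValueError(f"unknown feature: {expr!r}")
        return [expr]
    return encode_constant(float(expr))


def from_tokens(tokens):
    """Parse a prefix-order token list back into a nested-tuple expression."""
    def parse(pos):
        tok = tokens[pos]
        if tok in OPERATORS:
            pos += 1
            args = []
            for _ in range(OPERATORS[tok]):
                arg, pos = parse(pos)
                args.append(arg)
            return (tok, *args), pos
        if tok in FEATURES:
            return tok, pos + 1
        # Anything else is read as a three-token numeric constant.
        return decode_constant(tokens[pos:pos + 3]), pos + 3

    expr, end = parse(0)
    if end != len(tokens):
        raise ValueError("trailing tokens: not a valid expression")
    return expr


# Example factor: 0.5 * (high - low)
tokens = to_tokens(("mul", 0.5, ("sub", "high", "low")))
recovered = from_tokens(tokens)
```

In this sketch a sequence model would only ever see and emit the flat token list. The parser is what decides whether a generated sequence is a valid "sentence", that is, a well-formed formula with constants that can be evaluated on stock data.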
Taxonomy
Topics: Data Mining Algorithms and Applications · Software Testing and Debugging Techniques · Time Series Analysis and Forecasting
