Hubble: An LLM-Driven Agentic Framework for Safe, Diverse, and Reproducible Alpha Factor Discovery
Runze Shi, Shengyu Yan, Yuecheng Cai, Chengxi Lv

TL;DR
Hubble is an LLM-driven agentic framework for safe, diverse, and reproducible discovery of alpha factors in equity markets. It constrains generation to interpretable operator trees and combines family-level diagnostics with deterministic cross-sectional validation.
Contribution
The paper introduces Hubble, an agentic framework that restricts LLM-generated formulas to interpretable operator trees executed in an AST sandbox, improving the safety, diversity, and reproducibility of alpha factor discovery.
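One way to restrict generation to interpretable operator trees is to parse each LLM-proposed formula into an abstract syntax tree and reject any node outside a fixed grammar before anything executes. The minimal sketch below uses Python's standard `ast` module. The operator names (`rank`, `ts_mean`, `ts_std`, `delta`, `corr`), their arities, the field names, and the depth cap are hypothetical placeholders, not Hubble's actual operator language.

```python
import ast

# Hypothetical operator vocabulary: name -> required number of arguments.
# These names are illustrative, not Hubble's documented operator set.
ALLOWED_OPS = {
    "rank": 1,     # cross-sectional rank
    "ts_mean": 2,  # rolling mean over a window
    "ts_std": 2,   # rolling standard deviation over a window
    "delta": 2,    # x[t] - x[t - d]
    "corr": 3,     # rolling correlation of two series
}
ALLOWED_FIELDS = {"open", "high", "low", "close", "volume"}  # hypothetical inputs
ALLOWED_BINOPS = (ast.Add, ast.Sub, ast.Mult, ast.Div)
MAX_DEPTH = 6  # hypothetical cap that keeps trees small and interpretable


def validate_formula(source: str) -> ast.Expression:
    """Parse `source` and raise ValueError on any node outside the whitelist."""
    tree = ast.parse(source, mode="eval")  # malformed text raises SyntaxError

    def check(node: ast.AST, depth: int) -> None:
        if depth > MAX_DEPTH:
            raise ValueError("formula exceeds maximum tree depth")
        if isinstance(node, ast.Call):
            # Only bare operator names may be called: no attributes, no builtins.
            if not isinstance(node.func, ast.Name) or node.func.id not in ALLOWED_OPS:
                raise ValueError(f"operator not allowed: {ast.dump(node.func)}")
            if node.keywords:
                raise ValueError("keyword arguments are not allowed")
            if len(node.args) != ALLOWED_OPS[node.func.id]:
                raise ValueError(f"wrong arity for {node.func.id}")
            for arg in node.args:
                check(arg, depth + 1)
        elif isinstance(node, ast.BinOp):
            if not isinstance(node.op, ALLOWED_BINOPS):
                raise ValueError("binary operator not allowed")
            check(node.left, depth + 1)
            check(node.right, depth + 1)
        elif isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            check(node.operand, depth + 1)
        elif isinstance(node, ast.Name):
            if node.id not in ALLOWED_FIELDS:
                raise ValueError(f"unknown field: {node.id}")
        elif isinstance(node, ast.Constant):
            # Reject strings, bytes, complex numbers, and booleans.
            if isinstance(node.value, bool) or not isinstance(node.value, (int, float)):
                raise ValueError("only numeric constants are allowed")
        else:
            raise ValueError(f"node type not allowed: {type(node).__name__}")

    check(tree.body, 0)
    return tree
```

Because the validator walks the tree instead of calling `eval`, a proposal such as `__import__('os').system('ls')` is rejected at its `Call` node before any code runs; a validated tree would then be handed to a separate, equally restricted interpreter.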
Findings
The top discovered factors are dominated by the range, volatility, and trend families.
Several factors achieved positive out-of-sample performance.
The system ran with zero runtime crashes across extensive evaluations.
Abstract
Automated alpha discovery is difficult because the search space of formulaic factors is combinatorial, the signal-to-noise ratio in daily equity data is low, and unconstrained program generation is operationally unsafe. We present Hubble, an agentic factor mining framework that combines large language models (LLMs) with a domain-specific operator language, an abstract syntax tree (AST) execution sandbox, a dual-channel retrieval-augmented generation (RAG) module, and a family-aware selection mechanism. Instead of treating the LLM as an unconstrained code generator, Hubble restricts generation to interpretable operator trees, evaluates every candidate through a deterministic cross-sectional pipeline, and feeds back both top formulas and structured family-level diagnostics to subsequent rounds. The current system additionally introduces positive/negative RAG, formula-similarity penalties,…
