Hubble: An LLM-Driven Agentic Framework for Safe, Diverse, and Reproducible Alpha Factor Discovery

Runze Shi; Shengyu Yan; Yuecheng Cai; Chengxi Lv

arXiv:2604.09601·cs.AI·April 15, 2026

TL;DR

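Hubble is an LLM-driven agentic framework for discovering alpha factors in equity markets. It restricts generation to interpretable operator trees, evaluates every candidate through a deterministic cross-sectional pipeline, and feeds family-level diagnostics back into later rounds to keep the search safe, diverse, and reproducible.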

Contribution

The paper introduces Hubble, an agentic framework that restricts LLM-generated formulas to interpretable operator trees rather than unconstrained code, with the aim of making alpha factor discovery safer, more diverse, and more reproducible.
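To make the idea of restricting generation to operator trees concrete, here is a minimal sketch of a whitelist check built on Python's standard `ast` module. It is not Hubble's actual sandbox: the operator names in `ALLOWED_OPS`, the field names in `ALLOWED_FIELDS`, and the function `validate_formula` are hypothetical and chosen only for illustration.

```python
import ast

# Hypothetical whitelists; the paper's actual operator language is not listed here.
ALLOWED_OPS = {"rank", "ts_mean", "ts_std", "delta", "div", "sub"}
ALLOWED_FIELDS = {"open", "high", "low", "close", "volume"}


def validate_formula(formula: str) -> bool:
    """Return True only if `formula` is a tree of whitelisted calls,
    whitelisted data fields, and numeric constants."""
    try:
        tree = ast.parse(formula, mode="eval")
    except (SyntaxError, ValueError):
        return False

    def ok(node: ast.AST) -> bool:
        if isinstance(node, ast.Call):
            # Only plain-name calls to whitelisted operators, positional args only.
            return (
                isinstance(node.func, ast.Name)
                and node.func.id in ALLOWED_OPS
                and not node.keywords
                and all(ok(arg) for arg in node.args)
            )
        if isinstance(node, ast.Name):
            return node.id in ALLOWED_FIELDS
        if isinstance(node, ast.Constant):
            # Accept ints and floats, but not booleans or strings.
            return isinstance(node.value, (int, float)) and not isinstance(node.value, bool)
        # Attribute access, arithmetic operators, lambdas, etc. are all rejected.
        return False

    return ok(tree.body)
```

Under these assumptions, `validate_formula("rank(div(sub(high, low), close))")` is accepted, while anything that reaches outside the whitelist, such as `__import__('os').system('ls')`, is rejected before it is ever executed.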

Findings

01. The top discovered factors are dominated by the range, volatility, and trend families.

02. Several discovered factors achieved positive out-of-sample performance.

03. The system ran with zero runtime crashes over extensive evaluations.

Abstract

Automated alpha discovery is difficult because the search space of formulaic factors is combinatorial, the signal-to-noise ratio in daily equity data is low, and unconstrained program generation is operationally unsafe. We present Hubble, an agentic factor mining framework that combines large language models (LLMs) with a domain-specific operator language, an abstract syntax tree (AST) execution sandbox, a dual-channel retrieval-augmented generation (RAG) module, and a family-aware selection mechanism. Instead of treating the LLM as an unconstrained code generator, Hubble restricts generation to interpretable operator trees, evaluates every candidate through a deterministic cross-sectional pipeline, and feeds back both top formulas and structured family-level diagnostics to subsequent rounds. The current system additionally introduces positive/negative RAG, formula-similarity penalties,…
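The abstract mentions formula-similarity penalties without giving their form on this page. As a rough sketch of what such a penalty could look like, the code below scores similarity as the Jaccard overlap between the sets of sub-expressions of two formulas and subtracts a weighted maximum similarity from a candidate's raw score. The Jaccard measure, the `weight` parameter, and the function names are all assumptions for illustration, not the paper's definition. It requires Python 3.9 or later for `ast.unparse`.

```python
import ast


def subexpressions(formula: str) -> set[str]:
    """Collect the source text of every call and name node in the formula tree."""
    tree = ast.parse(formula, mode="eval")
    return {
        ast.unparse(node)
        for node in ast.walk(tree.body)
        if isinstance(node, (ast.Call, ast.Name))
    }


def jaccard(formula_a: str, formula_b: str) -> float:
    """Jaccard similarity between the sub-expression sets of two formulas."""
    set_a, set_b = subexpressions(formula_a), subexpressions(formula_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def penalized_score(raw_score: float, formula: str,
                    accepted: list[str], weight: float = 0.5) -> float:
    """Subtract a penalty proportional to the closest match among accepted formulas.

    `weight` is a hypothetical tuning knob, not a value from the paper.
    """
    max_similarity = max((jaccard(formula, f) for f in accepted), default=0.0)
    return raw_score - weight * max_similarity
```

With this definition, a candidate identical to an already accepted formula loses the full `weight`, while a candidate sharing no operators or fields with the accepted pool keeps its raw score. Operator names count as sub-expressions here, so two formulas that both use `rank` are treated as slightly similar even if their inputs differ.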
