Random Rule Forest (RRF): Interpretable Ensembles of LLM-Generated Questions for Predicting Startup Success

Ben Griffin; Diego Vidaurre; Ugur Koyluoglu; Joseph Ternasky; Fuat Alican; Yigit Ihlamur

arXiv:2505.24622 · cs.AI · September 17, 2025

TL;DR

The paper introduces Random Rule Forest (RRF), an interpretable ensemble method that combines LLM-generated YES/NO questions through threshold-based voting to predict startup success, achieving a 6.9x improvement over a random baseline on held-out data.

Contribution

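RRF is a lightweight ensemble method in which an LLM generates simple YES/NO questions in natural language. Each question acts as a weak learner, and the answers are combined with a threshold-based voting rule. Expert-crafted questions can be added alongside the LLM-generated ones, combining human and LLM insight for startup outcome prediction.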

Findings

01

RRF achieves a 6.9x improvement over a random baseline on held-out data, using a dataset of 9,892 founders.

02

Adding expert-crafted questions raises the improvement over the random baseline to 8x.

03

RRF attains an F0.5 of 0.121, versus 0.086 for the best zero- or few-shot baseline across three LLM architectures (+0.035 absolute, +41% relative).
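
The absolute and relative gains in the last finding follow directly from the two F0.5 scores reported in the abstract (0.121 for RRF, 0.086 for the best baseline). A quick arithmetic check, with illustrative variable names that are not taken from the paper:

```python
# F0.5 scores as reported in the abstract.
rrf_f05 = 0.121
best_baseline_f05 = 0.086

# Absolute gain: difference between the two scores.
abs_gain = rrf_f05 - best_baseline_f05

# Relative gain: absolute gain as a fraction of the baseline score.
rel_gain = abs_gain / best_baseline_f05

print(f"absolute: +{abs_gain:.3f}, relative: +{rel_gain:.0%}")
```

The relative gain is measured against the best baseline, not against random guessing, so it is a different quantity from the 6.9x and 8x figures in the first two findings.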

Abstract

Predicting rare outcomes such as startup success is central to venture capital, demanding models that are both accurate and interpretable. We introduce Random Rule Forest (RRF), a lightweight ensemble method that uses a large language model (LLM) to generate simple YES/NO questions in natural language. Each question functions as a weak learner, and their responses are combined using a threshold-based voting rule to form a strong, interpretable predictor. Applied to a dataset of 9,892 founders, RRF achieves a 6.9x improvement over a random baseline on held-out data; adding expert-crafted questions lifts this to 8x and highlights the value of human-LLM collaboration. Compared with zero- and few-shot baselines across three LLM architectures, RRF attains an F0.5 of 0.121, versus 0.086 for the best baseline (+0.035 absolute, +41% relative). By combining the creativity of LLMs with the…
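
The threshold-based voting described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy answers, and the threshold value are all assumptions, and the abstract does not say how questions are selected or how the threshold is chosen. The `f_beta` function uses the standard F-beta definition, which with beta = 0.5 weights precision more heavily than recall.

```python
from typing import Sequence


def threshold_vote(answers: Sequence[bool], threshold: int) -> bool:
    """Predict success when at least `threshold` questions are answered YES."""
    return sum(answers) >= threshold


def f_beta(y_true: Sequence[bool], y_pred: Sequence[bool], beta: float = 0.5) -> float:
    """Standard F-beta score; returns 0.0 when there are no true positives."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy data (hypothetical): YES/NO answers to three questions for four founders.
answers_per_founder = [
    [True, True, False],
    [True, False, False],
    [True, True, True],
    [False, False, False],
]
labels = [True, False, False, False]  # hypothetical success labels

# Assumed threshold: at least 2 of 3 questions must be answered YES.
preds = [threshold_vote(a, threshold=2) for a in answers_per_founder]
score = f_beta(labels, preds, beta=0.5)
print(preds, round(score, 4))
```

Because each prediction reduces to a count of YES answers against a threshold, the questions that drove a given decision can be read off directly, which is the sense in which such an ensemble stays interpretable.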

Taxonomy

Topics: Stock Market Forecasting Methods · Firm Innovation and Growth · Statistical and Computational Modeling

Methods: Sparse Evolutionary Training