Random Rule Forest (RRF): Interpretable Ensembles of LLM-Generated Questions for Predicting Startup Success
Ben Griffin, Diego Vidaurre, Ugur Koyluoglu, Joseph Ternasky, Fuat Alican, Yigit Ihlamur

TL;DR
The paper introduces Random Rule Forest (RRF), an interpretable ensemble method built from LLM-generated YES/NO questions that outperforms zero- and few-shot LLM baselines at predicting startup success while keeping every prediction transparent.
Contribution
RRF is a novel, lightweight ensemble approach in which an LLM generates simple YES/NO questions that act as weak learners; adding expert-crafted questions combines human and AI insight to further improve startup outcome prediction.
Findings
RRF achieves a 6.9x improvement over a random baseline on held-out data.
Adding expert-crafted questions raises this to 8x.
RRF attains an F0.5 score of 0.121, versus 0.086 for the best zero-/few-shot baseline (+0.035 absolute, +41% relative).
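The findings above rely on two calculations: the F0.5 score and the relative gain over the best baseline. The sketch below shows both. F0.5 is the standard F-beta score with beta = 0.5, which weights precision more heavily than recall. This suits venture capital, where a false positive is costly. The helper `lift_over_random` is hypothetical. It assumes that "Nx over a random baseline" means model precision divided by the positive base rate. The summary does not state this definition.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta = 0.5 weights precision more heavily than recall."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


def lift_over_random(precision: float, base_rate: float) -> float:
    """Hypothetical: assumes "Nx over random" = model precision / positive base rate."""
    return precision / base_rate


# F0.5 figures reported in the abstract.
rrf_f05, best_baseline_f05 = 0.121, 0.086
absolute_gain = rrf_f05 - best_baseline_f05        # about 0.035
relative_gain = absolute_gain / best_baseline_f05  # about 0.407
print(f"absolute: +{absolute_gain:.3f}, relative: +{relative_gain:.0%}")
```

Running this prints `absolute: +0.035, relative: +41%`, which matches the reported +0.035 absolute and +41% relative improvement.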
Abstract
Predicting rare outcomes such as startup success is central to venture capital, demanding models that are both accurate and interpretable. We introduce Random Rule Forest (RRF), a lightweight ensemble method that uses a large language model (LLM) to generate simple YES/NO questions in natural language. Each question functions as a weak learner, and their responses are combined using a threshold-based voting rule to form a strong, interpretable predictor. Applied to a dataset of 9,892 founders, RRF achieves a 6.9x improvement over a random baseline on held-out data; adding expert-crafted questions lifts this to 8x and highlights the value of human-LLM collaboration. Compared with zero- and few-shot baselines across three LLM architectures, RRF attains an F0.5 of 0.121, versus 0.086 for the best baseline (+0.035 absolute, +41% relative). By combining the creativity of LLMs with the…
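The abstract says each YES/NO question acts as a weak learner and that answers are combined by a threshold-based voting rule. The sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. A profile is predicted successful when at least `threshold` questions answer YES. The abstract does not say how the threshold is chosen, so `choose_threshold` simply assumes it is the value that maximizes F0.5 on training labels. The function names, the `stub_answer` keyword matcher (a stand-in for the LLM call), the example questions, and the toy profiles are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for an LLM call: answers one YES/NO question about one profile.
Answerer = Callable[[str, str], bool]


def vote_counts(profiles: Sequence[str], questions: Sequence[str], answer: Answerer) -> List[int]:
    """Number of YES answers (weak-learner votes) each profile receives."""
    return [sum(answer(q, p) for q in questions) for p in profiles]


def predict(counts: Sequence[int], threshold: int) -> List[bool]:
    """Threshold voting: predict success when at least `threshold` questions say YES."""
    return [c >= threshold for c in counts]


def f_half(labels: Sequence[bool], preds: Sequence[bool]) -> float:
    """F0.5 from confusion counts: 1.25*TP / (1.25*TP + 0.25*FN + FP)."""
    tp = sum(1 for y, p in zip(labels, preds) if y and p)
    fp = sum(1 for y, p in zip(labels, preds) if p and not y)
    fn = sum(1 for y, p in zip(labels, preds) if y and not p)
    denom = 1.25 * tp + 0.25 * fn + fp
    return 1.25 * tp / denom if denom else 0.0


def choose_threshold(counts: Sequence[int], labels: Sequence[bool], n_questions: int) -> int:
    """Assumption: pick the vote threshold that maximizes F0.5 on training data."""
    best_t, best_score = 1, -1.0
    for t in range(1, n_questions + 1):
        score = f_half(labels, predict(counts, t))
        if score > best_score:
            best_t, best_score = t, score
    return best_t


# Toy usage: a keyword stub replaces the LLM (illustrative only, not real data).
KEYWORDS: Dict[str, str] = {
    "Has the founder exited a previous company?": "exit",
    "Does the founder hold a PhD?": "phd",
    "Has the founder worked at a top-tier tech firm?": "google",
}


def stub_answer(question: str, profile: str) -> bool:
    return KEYWORDS[question] in profile.lower()


questions = list(KEYWORDS)
profiles = [
    "Serial founder, prior exit, PhD in CS, ex-Google",
    "First-time founder, ex-Google engineer",
    "MBA, consulting background",
    "PhD chemist, prior exit in biotech",
]
labels = [True, False, False, True]

counts = vote_counts(profiles, questions, stub_answer)
t = choose_threshold(counts, labels, len(questions))
print(counts, t, predict(counts, t))
```

In this toy setup the vote counts are `[3, 1, 0, 2]`. A threshold of 2 separates the two labeled successes from the other profiles, so the script prints `[3, 1, 0, 2] 2 [True, False, False, True]`. Interpretability comes from the fact that each prediction reduces to a list of human-readable questions and their answers.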
