When Career Data Runs Out: Structured Feature Engineering and Signal Limits for Founder Success Prediction

Yagiz Ihlamur

arXiv:2604.00339·cs.LG·April 2, 2026

TL;DR

This paper engineers structured features from raw JSON career data to predict startup success, and shows that the available signals cap achievable performance, pointing to the need for richer datasets.

Contribution

It introduces a structured feature engineering approach and benchmarks the signal limits in founder success prediction using JSON data and LLM features.
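
As a rough illustration of what parsing structured features directly from raw JSON fields can look like, here is a minimal sketch. The record layout, the field names inside `jobs`, `education`, and `exits`, the keyword list, and the five features are all hypothetical assumptions made for this example. They are not the paper's actual schema or its 28 features.

```python
import json

# Hypothetical record: the keys inside each entry are illustrative
# assumptions, not the dataset's actual schema.
raw = json.loads("""
{
  "jobs": [
    {"title": "Software Engineer", "years": 3.0},
    {"title": "VP Engineering", "years": 2.5}
  ],
  "education": [{"degree": "MSc"}],
  "exits": []
}
""")

# Assumed keyword list for flagging senior titles (illustration only).
SENIOR_KEYWORDS = ("vp", "chief", "head", "director", "founder")


def extract_features(record: dict) -> dict:
    """Turn one founder record into a flat dict of numeric features."""
    jobs = record.get("jobs", [])
    education = record.get("education", [])
    exits = record.get("exits", [])
    return {
        "n_jobs": len(jobs),
        "total_years": sum(j.get("years", 0.0) for j in jobs),
        # 1 if any job title contains one of the senior keywords.
        "has_senior_role": int(any(
            kw in j.get("title", "").lower()
            for j in jobs
            for kw in SENIOR_KEYWORDS
        )),
        "n_degrees": len(education),
        "has_prior_exit": int(len(exits) > 0),
    }


features = extract_features(raw)
print(features)
```

A flat dictionary of numeric values like this can be fed to a tree-based model without further encoding, which is one reason to parse the JSON fields directly rather than rely on a prose rendering of them.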

Findings

01

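Engineered 28 structured features from raw JSON fields (jobs, education, exits); combined with a deterministic rule layer and XGBoost boosted stumps, the model reaches Val F0.5 = 0.3030, a +17.7pp improvement over the zero-shot LLM baseline.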

02

LLM-derived prose features capture 26.4% of model importance but add zero CV signal (delta = -0.05pp), because the prose field is a lossy re-encoding of the same JSON fields.

03

The dataset's information content limits the prediction ceiling (CV ~= 0.25, Val ~= 0.30), indicating the need for richer data.

Abstract

Predicting startup success from founder career data is hard. The signal is weak, the labels are rare (9%), and most founders who succeed look almost identical to those who fail. We engineer 28 structured features directly from raw JSON fields -- jobs, education, exits -- and combine them with a deterministic rule layer and XGBoost boosted stumps. Our model achieves Val F0.5 = 0.3030, Precision = 0.3333, Recall = 0.2222 -- a +17.7pp improvement over the zero-shot LLM baseline. We then run a controlled experiment: extract 9 features from the prose field using Claude Haiku, at 67% and 100% dataset coverage. LLM features capture 26.4% of model importance but add zero CV signal (delta = -0.05pp). The reason is structural: anonymised_prose is generated from the same JSON fields we parse directly -- it is a lossy re-encoding, not a richer source. The ceiling (CV ~= 0.25, Val ~= 0.30) reflects…
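
The reported validation score can be checked against the reported precision and recall using the standard F-beta definition, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), with beta = 0.5 weighting precision more heavily than recall. The sketch below is a plain-Python sanity check; the function name `f_beta` is our own and is not taken from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score: weighted harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # convention: avoid division by zero
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract.
val_f05 = f_beta(precision=0.3333, recall=0.2222, beta=0.5)
print(f"Val F0.5 = {val_f05:.4f}")
```

The result agrees with the reported Val F0.5 = 0.3030 to within rounding, so the three reported validation numbers are mutually consistent.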
