The Ideation Bottleneck: Decomposing the Quality Gap Between AI-Generated and Human Economics Research
Ning Li

TL;DR
This study decomposes the quality gap between AI-generated and human economics research into idea and execution quality, revealing ideation as the primary bottleneck for AI performance.
Contribution
It introduces a dual-evaluation framework for idea and execution quality, quantifies their contributions, and highlights ideation as the main challenge for AI research in economics.
Findings
Human papers outperform AI papers in idea quality with a large effect size (Cohen's d = 2.23, p < 0.001).
The execution quality gap is smaller but still significant, with human papers scoring higher.
Only 0.8% of AI papers surpass the median human paper on both idea and execution quality.
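The last finding is a joint-threshold share: the fraction of AI papers whose idea score and execution score both exceed the corresponding human medians. A minimal sketch of that calculation follows. The function name, the (idea, execution) tuple layout, the strict inequality, and the toy scores are all assumptions made for illustration, not the paper's code or data.

```python
from statistics import median


def share_above_both_medians(ai_scores, human_scores):
    """Fraction of AI papers beating the human median on both dimensions.

    Each input is a list of (idea_score, execution_score) tuples.
    This tuple layout is a hypothetical choice for the example.
    """
    idea_median = median(idea for idea, _ in human_scores)
    exec_median = median(execution for _, execution in human_scores)
    hits = sum(
        1
        for idea, execution in ai_scores
        if idea > idea_median and execution > exec_median
    )
    return hits / len(ai_scores)


# Toy scores, invented for illustration only.
human = [(0.6, 0.7), (0.5, 0.8), (0.4, 0.6)]
ai = [(0.55, 0.75), (0.3, 0.9), (0.6, 0.5), (0.2, 0.2)]
share = share_above_both_medians(ai, human)
print(share)
```

Requiring both conditions at once is what makes the share so small: a paper that clears the human median on execution alone does not count.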
Abstract
Autonomous AI systems can now generate complete economics research papers, but they substantially underperform human-authored publications in head-to-head comparisons. This paper decomposes the quality gap into two independent components: research idea quality and execution quality. We analyze 953 economics papers: 912 AI-generated papers from the APE project and 41 human papers published in the American Economic Review and AEJ: Economic Policy. Idea quality is evaluated with a two-model ensemble of fine-tuned language models trained on publication decisions (Gong, Li, and Zhou, 2026). Execution quality is evaluated with a comprehensive six-dimension rubric assessed by Gemini 3.1 Flash Lite, the same model family used as the APE tournament judge, which keeps the methodology consistent. The idea quality gap is large (Cohen's d = 2.23, p < 0.001), with human papers achieving 47.1% mean…
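For readers unfamiliar with the effect size reported in the abstract, Cohen's d is a standardized difference between two group means. The sketch below uses the common pooled-standard-deviation form; the abstract does not say which variant the paper uses, so treat that choice, the function name, and the toy numbers as assumptions.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation.

    Assumed variant: d = (mean_a - mean_b) / s_pooled, where s_pooled
    combines the two sample variances weighted by degrees of freedom.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Toy scores, invented for illustration only.
d = cohens_d([2, 4, 6], [1, 3, 5])
print(d)
```

By the usual rule of thumb, values around 0.8 already count as large, so d = 2.23 means the two idea-quality distributions barely overlap.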
