An Auditable AI Agent Loop for Empirical Economics: A Case Study in Forecast Combination
Minchul Shin

TL;DR
This paper adapts an AI agent-loop framework for empirical economics, incorporating holdout evaluation to improve transparency and distinguish robust results from sample-specific findings in forecast combination tasks.
Contribution
It introduces a transparent, auditable AI agent-loop architecture with holdout evaluation for empirical economics, enhancing the reliability of AI-driven specification searches.
Findings
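Multiple independent agent runs outperform standard benchmarks in the original rolling evaluation.
Not all of those runs continue to outperform the benchmarks on the post-search holdout.
Logged search and holdout evaluation together help distinguish robust improvements from sample-specific discoveries.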
Abstract
AI coding agents make empirical specification search fast and cheap, but they also widen hidden researcher degrees of freedom. Building on an open-source agent-loop architecture, this paper adapts that framework to an empirical economics workflow and adds a post-search holdout evaluation. In a forecast-combination illustration, multiple independent agent runs outperform standard benchmarks in the original rolling evaluation, but not all continue to do so on a post-search holdout. Logged search and holdout evaluation together make adaptive specification search more transparent and help distinguish robust improvements from sample-specific discoveries.
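The workflow described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the data are synthetic, and the function names (`rolling_rmse`, `audit`), the inverse-MSE candidate, the equal-weight benchmark, the window length, and the split point are all assumptions chosen for illustration. The sketch scores a candidate combination rule against a benchmark in a rolling evaluation on the search sample, repeats the comparison on a holdout sample, and logs both results side by side.

```python
import numpy as np

def equal_weights(y_hist, f_hist):
    """Benchmark (assumed): simple average of the K individual forecasts."""
    k = f_hist.shape[1]
    return np.full(k, 1.0 / k)

def inverse_mse_weights(y_hist, f_hist):
    """Candidate (assumed): weight each forecaster by its inverse MSE in the window."""
    mse = np.mean((y_hist[:, None] - f_hist) ** 2, axis=0)
    inv = 1.0 / mse
    return inv / inv.sum()

def rolling_rmse(y, forecasts, weight_fn, window):
    """Rolling RMSE: weights at time t use only the previous `window` periods."""
    errors = []
    for t in range(window, len(y)):
        w = weight_fn(y[t - window:t], forecasts[t - window:t])
        errors.append(y[t] - forecasts[t] @ w)
    return float(np.sqrt(np.mean(np.square(errors))))

def audit(y, forecasts, candidates, benchmark, window, split):
    """Score each candidate on the search sample and on the holdout, and log both.

    Periods before `split` form the search sample. Holdout targets start at
    `split`; the preceding `window` periods are used only to estimate weights.
    In a real workflow the holdout would be scored once, after search ends.
    """
    ys, fs = y[:split], forecasts[:split]
    yh, fh = y[split - window:], forecasts[split - window:]
    bench_search = rolling_rmse(ys, fs, benchmark, window)
    bench_holdout = rolling_rmse(yh, fh, benchmark, window)
    log = []
    for name, fn in candidates.items():
        search = rolling_rmse(ys, fs, fn, window)
        holdout = rolling_rmse(yh, fh, fn, window)
        log.append({
            "candidate": name,
            "search_rmse": search,
            "holdout_rmse": holdout,
            "beats_benchmark_search": search < bench_search,
            "beats_benchmark_holdout": holdout < bench_holdout,
        })
    return log

# Synthetic illustration: four forecasters with different noise levels.
rng = np.random.default_rng(0)
T, K = 300, 4
y = rng.normal(size=T)
noise_sd = np.array([0.5, 1.0, 1.5, 2.0])
forecasts = y[:, None] + rng.normal(size=(T, K)) * noise_sd

log = audit(y, forecasts, {"inverse_mse": inverse_mse_weights},
            equal_weights, window=40, split=200)
for record in log:
    print(record)
```

A record in which `beats_benchmark_search` is true but `beats_benchmark_holdout` is false corresponds to what the abstract calls a sample-specific discovery. Keeping every candidate in the log, including those that fail, is what makes the search auditable.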
Taxonomy
Topics: Complex Systems and Time Series Analysis · Sports Analytics and Performance · Auction Theory and Applications
