Fake Date Tests: Can We Trust In-sample Accuracy of LLMs in Macroeconomic Forecasting?
Alexander Eliseev, Sergei Seleznev

TL;DR
This paper introduces fake date tests to evaluate whether in-sample accuracy of LLMs in macroeconomic forecasting is trustworthy, revealing that modern LLMs exhibit lookahead bias in their predictions.
Contribution
The paper develops prompt sensitivity tests, called fake date tests, to detect lookahead and context biases in LLMs' in-sample macroeconomic forecasts.
Findings
None of the tested LLMs passed the lookahead bias test.
These failures signal lookahead bias in the LLMs' in-sample forecasts.
The results call into question whether in-sample accuracy is a reliable measure of out-of-sample forecasting performance.
Abstract
Large language models (LLMs) are a type of machine learning tool that economists have started to apply in their empirical research. One such application is macroeconomic forecasting, where LLMs are backtested even though they were trained on the same data that is used to estimate their forecasting performance. Can these in-sample accuracy results be extrapolated to the model's out-of-sample performance? To answer this question, we developed a family of prompt sensitivity tests, two members of which we call the fake date tests. These tests aim to detect two types of biases in LLMs' in-sample forecasts: lookahead bias and context bias. According to the empirical results, none of the modern LLMs tested in this study passed our first test, signaling the presence of lookahead bias in their in-sample forecasts.
