Revealing economic facts: LLMs know more than they say
Marcus Buckmann, Quynh Anh Nguyen, Edward Hill

TL;DR
This paper demonstrates that large language models' hidden states contain richer economic information than their outputs, enabling accurate estimation and imputation of economic data with minimal labeled examples.
Contribution
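It introduces a method for extracting economic facts from LLM hidden states with a simple linear model, which outperforms the models' own text outputs, and proposes a transfer learning method that improves accuracy without labeled data for the target variable.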
Findings
A simple linear model trained on hidden states outperforms the models' text outputs in economic estimation.
A few dozen labeled examples suffice for training.
Transfer learning improves accuracy without labeled data for the target variable.
Abstract
We investigate whether the hidden states of large language models (LLMs) can be used to estimate and impute economic and financial statistics. Focusing on county-level (e.g. unemployment) and firm-level (e.g. total assets) variables, we show that a simple linear model trained on the hidden states of open-source LLMs outperforms the models' text outputs. This suggests that hidden states capture richer economic information than the responses of the LLMs reveal directly. A learning curve analysis indicates that only a few dozen labelled examples are sufficient for training. We also propose a transfer learning method that improves estimation accuracy without requiring any labelled data for the target variable. Finally, we demonstrate the practical utility of hidden-state representations in super-resolution and data imputation tasks.
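To make the central idea concrete, the following is a minimal sketch of a linear probe: a linear model fitted to per-entity feature vectors to predict a numeric target. It is an illustration under stated assumptions, not the authors' implementation. The feature matrix here is synthetic random data standing in for LLM hidden states, the choice of ridge regression and the value of `alpha` are assumptions (the abstract says only "a simple linear model"), and the names `fit_ridge_probe` and `predict` are hypothetical. The training set size of 48 is chosen only to echo the "few dozen labelled examples" setting.

```python
import numpy as np


def fit_ridge_probe(H, y, alpha=1.0):
    """Fit y ~ H @ w + b by ridge regression in closed form.

    H: (n_samples, n_features) matrix, one row per entity (e.g. a county).
    y: (n_samples,) target values (e.g. an unemployment rate).
    alpha: L2 penalty strength (an assumed value, not from the paper).
    """
    H_mean = H.mean(axis=0)
    y_mean = y.mean()
    Hc = H - H_mean  # centre features so the intercept is not penalised
    yc = y - y_mean
    d = H.shape[1]
    w = np.linalg.solve(Hc.T @ Hc + alpha * np.eye(d), Hc.T @ yc)
    b = y_mean - H_mean @ w
    return w, b


def predict(H, w, b):
    """Apply the fitted linear probe to new feature rows."""
    return H @ w + b


# Synthetic stand-in for hidden states: in practice each row would be a
# hidden-state vector extracted from an LLM for a prompt about one entity.
rng = np.random.default_rng(0)
n_train, n_test, d = 48, 200, 16
true_w = rng.normal(size=d)
H_train = rng.normal(size=(n_train, d))
H_test = rng.normal(size=(n_test, d))
y_train = H_train @ true_w + 0.1 * rng.normal(size=n_train)
y_test = H_test @ true_w + 0.1 * rng.normal(size=n_test)

w, b = fit_ridge_probe(H_train, y_train, alpha=1.0)
pred = predict(H_test, w, b)

# Out-of-sample R^2 on the held-out synthetic rows.
ss_res = np.sum((y_test - pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
print(f"held-out R^2: {r2:.3f}")
```

With real hidden states the feature dimension is typically far larger than the number of labelled examples, which is one reason a regularised linear model is a natural choice for this sketch.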
