Evaluating Large Language Model Capabilities in Assessing Spatial Econometrics Research
Giuseppe Arbia, Luca Morandini, Vincenzo Nardelli

TL;DR
This study evaluates the ability of Large Language Models to assess the economic validity and consistency of spatial econometrics research, highlighting their strengths and limitations in peer review support.
Contribution
It introduces a novel evaluation framework for LLMs in assessing economic research quality, focusing on spatial econometrics papers and their counterfactual summaries.
Findings
LLMs excel at assessing variable choice coherence (top models such as GPT-4o reach an overall F1 score of 0.87).
Performance varies significantly on deeper assessments such as coefficient plausibility and overall publication suitability.
Model choice and paper characteristics significantly affect evaluation accuracy.
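As an illustration of the metric behind the findings above, here is a minimal sketch of how an F1 score can be computed from binary classifications. The labels are hypothetical and are not data from the paper. The encoding (1 = variable choice judged coherent, 0 = not coherent) and the function name `f1_score` are assumptions, and the paper's exact aggregation into an "overall" F1 is not specified here.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1) of a binary classification."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0  # no correct positive predictions, so F1 is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ground truth: 1 = original summary, 0 = counterfactual summary.
truth = [1, 1, 1, 0, 0, 0, 1, 0]
# Hypothetical LLM verdicts on variable-choice coherence for the same summaries.
verdicts = [1, 1, 0, 0, 0, 1, 1, 0]

print(f1_score(truth, verdicts))
```

In this toy case the model makes one false positive and one false negative, so precision and recall are equal and F1 matches them. A score such as 0.87 would indicate that both error types are comparatively rare.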
Abstract
This paper investigates the ability of Large Language Models (LLMs) to assess the economic soundness and theoretical consistency of empirical findings in spatial econometrics. We created original and deliberately altered "counterfactual" summaries from 28 published papers (2005-2024), which were evaluated by a diverse set of LLMs. The LLMs provided qualitative assessments and structured binary classifications on variable choice, coefficient plausibility, and publication suitability. The results indicate that while LLMs can expertly assess the coherence of variable choices (with top models like GPT-4o achieving an overall F1 score of 0.87), their performance varies significantly when evaluating deeper aspects such as coefficient plausibility and overall publication suitability. The results further revealed that the choice of LLM, the specific characteristics of the paper and the interaction…
Taxonomy
Topics: Spatial and Panel Data Analysis · Regional Economics and Spatial Analysis · Housing Market and Economics
Methods: Sparse Evolutionary Training
