LLM Predictive Scoring and Validation: Inferring Experience Ratings from Unstructured Text
Jason Potteiger, Andrew Hong, Ito Zapata

TL;DR
This study demonstrates that GPT-4.1 can predict baseball fans' experience ratings from their open-ended responses with close agreement (two-thirds of predictions within one point of the self-reported rating), capturing overall sentiment and, to a lesser degree, specific experience aspects.
Contribution
It shows that a straightforward prompt can reliably predict fan ratings from unstructured text, and that comparing these predictions with survey answers sheds light on how the evaluative judgment expressed in text differs from explicit ratings.
Findings
67% of predictions within +/-1 point of actual ratings
Predicted ratings align strongly with overall experience (r=0.82)
Predictions systematically lower than self-reports by about one point
Abstract
We tasked GPT-4.1 with reading what baseball fans wrote about their game-day experience and predicting the overall experience rating each fan gave on a 0-10 survey scale. The model received only the text of a single open-ended response. These AI predictions were compared with the actual experience ratings captured by the survey instrument across approximately 10,000 fan responses from five Major League Baseball teams. In total, two-thirds of predicted ratings fell within one point of self-reported fan ratings (67% within +/-1, 36% exact match), and the predictions were near-deterministic across three independent scoring runs (87% exact agreement, 99.9% within +/-1). Predicted ratings aligned most strongly with the overall experience rating (r = 0.82) rather than with any specific aspect of the game-day experience, such as parking, concessions, or staff. However, predictions were…
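The abstract reports run-to-run consistency across three independent scoring runs but does not spell out how agreement is defined here. The sketch below assumes one plausible definition: a response counts as an exact agreement when every run returns the same score, and as within +/-1 when the highest and lowest scores differ by at most one point. The function name `run_consistency` and this definition are assumptions, so the paper's 87% and 99.9% figures may have been computed differently (for instance, pairwise between runs).

```python
def run_consistency(runs):
    """Hypothetical helper: agreement of repeated scoring runs.

    `runs` is a list of score lists, one list per run, aligned by response.
    Assumed definitions: exact = all runs give the same score;
    within_1 = max score minus min score is at most 1.
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs")
    n = len(runs[0])
    if n == 0 or any(len(r) != n for r in runs):
        raise ValueError("runs must be non-empty and the same length")
    # Regroup so each tuple holds every run's score for one response.
    per_response = list(zip(*runs))
    spreads = [max(scores) - min(scores) for scores in per_response]
    return {
        "exact": sum(s == 0 for s in spreads) / n,
        "within_1": sum(s <= 1 for s in spreads) / n,
    }
```

With three made-up runs `[[7, 8, 5, 9, 3], [7, 8, 6, 9, 5], [7, 9, 5, 9, 3]]`, two of five responses are identical across runs (exact = 0.4) and four of five stay within one point (within_1 = 0.8).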
