When Reasoning Meets Information Aggregation: A Case Study with Sports Narratives
Yebowen Hu, Kaiqiang Song, Sangwoo Cho, Xiaoyang Wang, Wenlin Yao, Hassan Foroosh, Dong Yu, Fei Liu

TL;DR
This paper investigates how large language models reason about sports narratives, emphasizing the importance of accurate information aggregation, and introduces SportsGen for synthesizing game data to evaluate reasoning capabilities.
Contribution
It presents a new method, SportsGen, for generating sports narratives, and provides a comprehensive analysis of LLMs' reasoning performance on complex sports data.
Findings
Most models struggle with accurate score aggregation.
Open-source models like Llama-3 often hallucinate scores.
Narrative complexity affects reasoning effectiveness.
Abstract
Reasoning is most powerful when an LLM accurately aggregates relevant information. We examine the critical role of information aggregation in reasoning by requiring the LLM to analyze sports narratives. To succeed at this task, an LLM must infer points from actions, identify related entities, attribute points accurately to players and teams, and compile key statistics to draw conclusions. We conduct comprehensive experiments with real NBA basketball data and present SportsGen, a new method to synthesize game narratives. By synthesizing data, we can rigorously evaluate LLMs' reasoning capabilities under complex scenarios with varying narrative lengths and density of information. Our findings show that most models, including GPT-4o, often fail to accurately aggregate basketball scores due to frequent scoring patterns. Open-source models like Llama-3 further suffer from significant score…
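The aggregation task the abstract describes (infer points from actions, attribute them to players and teams, compile totals) can be illustrated with a minimal sketch. This is not the paper's SportsGen method or its evaluation code. The play-by-play format, the `POINTS` table, the `aggregate_scores` function, and the team and player names are all hypothetical and chosen only for illustration. The point values follow standard basketball scoring.

```python
from collections import defaultdict

# Hypothetical action-to-points table (standard basketball scoring values).
# The action strings are illustrative, not the paper's narrative format.
POINTS = {
    "makes three-pointer": 3,
    "makes two-point shot": 2,
    "makes free throw": 1,
    "misses three-pointer": 0,
    "defensive rebound": 0,
}


def aggregate_scores(plays):
    """Sum points per team and per player from (team, player, action) tuples."""
    team_totals = defaultdict(int)
    player_totals = defaultdict(int)
    for team, player, action in plays:
        pts = POINTS.get(action, 0)  # unknown actions are treated as non-scoring
        team_totals[team] += pts
        player_totals[player] += pts
    return dict(team_totals), dict(player_totals)


# Made-up plays for illustration; not real NBA data.
plays = [
    ("Team A", "Player 1", "makes three-pointer"),
    ("Team B", "Player 2", "makes two-point shot"),
    ("Team A", "Player 3", "defensive rebound"),
    ("Team A", "Player 1", "makes free throw"),
    ("Team B", "Player 4", "misses three-pointer"),
]

team_totals, player_totals = aggregate_scores(plays)
print(team_totals)
print(player_totals)
```

With structured input the bookkeeping is trivial. The difficulty the abstract points to arises when a model must perform the same attribution and summation over free-form narrative text, where scoring and non-scoring actions are interleaved.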
Taxonomy
Topics: Sports Analytics and Performance · Digital Games and Media · Sports, Gender, and Society
