Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses
Hung-Ting Su, Ya-Ching Hsu, Xudong Lin, Xiang-Qian Shi, Yulei Niu, Han-Yuan Hsu, Hung-yi Lee, Winston H. Hsu

TL;DR
This paper evaluates the narrative reasoning abilities of large language models using movie synopses and tropes, revealing limitations and proposing methods to improve reasoning accuracy and robustness.
Contribution
It introduces a trope-wise querying approach and an Adversarial Injection method to assess and enhance LLMs' narrative reasoning capabilities.
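Below is a minimal sketch of what trope-wise querying could look like. It assumes the approach means asking the model one yes/no question per candidate trope, rather than asking for every trope in a single multi-label query. The prompt wording, the `llm` callable, and the keyword-matching `toy_llm` stand-in are hypothetical and are not the authors' prompts or code. The set-based F1 is one common scoring choice; the paper's exact averaging is not stated here.

```python
from typing import Callable, List, Set

# Hypothetical model interface: takes a prompt string, returns the model's text reply.
LLM = Callable[[str], str]


def trope_wise_query(synopsis: str, tropes: List[str], llm: LLM) -> Set[str]:
    """Query the model once per candidate trope (illustrative prompt, not the paper's)."""
    predicted: Set[str] = set()
    for trope in tropes:
        prompt = (
            f"Synopsis:\n{synopsis}\n\n"
            f"Trope: {trope}\n"
            "Does the synopsis contain this trope? Answer yes or no."
        )
        # Treat any reply starting with "yes" as a positive prediction.
        if llm(prompt).strip().lower().startswith("yes"):
            predicted.add(trope)
    return predicted


def f1(predicted: Set[str], gold: Set[str]) -> float:
    """Set-based F1 between predicted and gold trope labels for one synopsis."""
    if not predicted and not gold:
        return 1.0
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def toy_llm(prompt: str) -> str:
    """Hypothetical LLM stand-in: answers "yes" if the trope's first word appears in the synopsis."""
    synopsis = prompt.split("Synopsis:\n", 1)[1].split("\n\nTrope: ", 1)[0]
    trope = prompt.split("Trope: ", 1)[1].split("\n", 1)[0]
    return "yes" if trope.split()[0].lower() in synopsis.lower() else "no"


synopsis = ("Two friends fall in love with the same woman, "
            "and one makes a heroic sacrifice in the final battle.")
tropes = ["Love Triangle", "Heroic Sacrifice", "Evil Twin"]
gold = {"Love Triangle", "Evil Twin"}
pred = trope_wise_query(synopsis, tropes, toy_llm)
print(sorted(pred))    # ['Heroic Sacrifice', 'Love Triangle']
print(f1(pred, gold))  # 0.5
```

Querying one trope at a time costs one model call per candidate trope. In exchange, each decision becomes a focused binary question. Whether this matches the paper's exact setup is an assumption.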
Findings
LLMs perform poorly on narrative reasoning with movie tropes.
Chain-of-thought prompting can cause hallucinations in narrative content.
Adversarial Injection reveals increased sensitivity of CoT to trope-related text.
Abstract
Large language models (LLMs) equipped with chain-of-thought (CoT) prompting have shown strong multi-step reasoning capabilities on factual content such as mathematics, commonsense, and logic. However, their performance in narrative reasoning, which demands greater abstraction, remains unexplored. This study uses tropes in movie synopses to assess the abstract reasoning abilities of state-of-the-art LLMs and finds that their performance is low. We introduce a trope-wise querying approach to address these challenges, boosting the F1 score by 11.8 points. Moreover, while prior studies suggest that CoT enhances multi-step reasoning, this study shows that CoT can cause hallucinations in narrative content, reducing GPT-4's performance. We also introduce an Adversarial Injection method that embeds trope-related text tokens into movie synopses without explicit tropes, revealing CoT's…
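Below is a rough sketch of the Adversarial Injection idea as described above. Trope-related text is inserted into a synopsis that does not contain the trope, and the sketch checks whether the model's answer flips from "absent" to "present". The sentence-level insertion point, the flip-rate metric, and the `predict` and `toy_predict` functions are illustrative assumptions, not the authors' implementation.

```python
import random
from typing import Callable, List

# Hypothetical predictor interface: (synopsis, trope) -> does the model say the trope is present?
Predictor = Callable[[str, str], bool]


def inject(synopsis: str, trope_text: str, rng: random.Random) -> str:
    """Insert trope-related text at a random sentence boundary of the synopsis."""
    # Naive sentence split on periods; abbreviations like "Dr." would be mis-split.
    sentences = [s.strip() for s in synopsis.split(".") if s.strip()]
    pos = rng.randint(0, len(sentences))  # inclusive bounds: text may go first or last
    sentences.insert(pos, trope_text.strip().rstrip("."))
    return ". ".join(sentences) + "."


def flip_rate(negatives: List[str], trope: str, trope_text: str,
              predict: Predictor, seed: int = 0) -> float:
    """Fraction of trope-free synopses whose prediction flips from absent to present after injection."""
    rng = random.Random(seed)
    flips = 0
    for synopsis in negatives:
        before = predict(synopsis, trope)
        after = predict(inject(synopsis, trope_text, rng), trope)
        if not before and after:
            flips += 1
    return flips / len(negatives)


def toy_predict(synopsis: str, trope: str) -> bool:
    """Hypothetical surface-matching predictor: fires if the trope's last word appears."""
    return trope.split()[-1].lower() in synopsis.lower()


negatives = [
    "A detective hunts a thief. The thief escapes to Paris.",
    "A chef opens a restaurant. It fails, then succeeds.",
]
rate = flip_rate(negatives, "Evil Twin", "Nobody knew the mayor had a twin.", toy_predict)
print(rate)  # 1.0
```

A predictor that keys on surface tokens, like the toy one here, flips on every injected example. A model that reasons about the narrative itself should flip far less often. The paper reports increased sensitivity of CoT to trope-related text. The flip rate shown is only one plausible way to quantify that sensitivity.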
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods
