NarrativeTrack: Evaluating Entity-Centric Reasoning for Narrative Understanding

Hyeonjeong Ha; Jinjin Ge; Bo Feng; Kaixin Ma; Gargi Chakraborty

arXiv:2601.01095 · cs.CV · March 31, 2026

TL;DR

NarrativeTrack introduces a new benchmark and framework for evaluating how well multimodal large language models understand and reason about entities in dynamic video narratives, highlighting current limitations.

Contribution

NarrativeTrack presents the first benchmark for entity-centric narrative understanding in videos, together with a structured evaluation framework that measures reasoning as narrative complexity progressively increases.

Findings

01. State-of-the-art models struggle with entity tracking across visual transitions.

02. Models show a trade-off between perceptual grounding and temporal reasoning.

03. Current models often hallucinate entity identities under context shifts.

Abstract

Multimodal large language models (MLLMs) have achieved impressive progress in vision-language reasoning, yet their ability to understand temporally unfolding narratives in videos remains underexplored. True narrative understanding requires grounding who is doing what, when, and where, and maintaining coherent entity representations across dynamic visual and temporal contexts. We introduce NarrativeTrack, the first benchmark to evaluate narrative understanding in MLLMs through fine-grained entity-centric reasoning. Unlike existing benchmarks limited to short clips or coarse scene-level semantics, we decompose videos into constituent entities and examine their continuity via a Compositional Reasoning Progression (CRP), a structured evaluation framework that progressively increases narrative complexity across three dimensions: entity existence, entity changes, and entity ambiguity. CRP…
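To make the three CRP dimensions concrete, here is a minimal sketch of how per-dimension results on an entity-centric benchmark could be tallied. It is not the authors' evaluation code. The `Item` record, the `accuracy_by_dimension` helper, the example entities, and the exact-match scoring rule are all hypothetical. The ordering existence < changes < ambiguity is assumed from the abstract's phrase "progressively increases narrative complexity".

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import IntEnum


class Dimension(IntEnum):
    # The three CRP dimensions named in the abstract. The numeric order
    # (increasing narrative complexity) is an assumption, not from the paper.
    EXISTENCE = 1
    CHANGES = 2
    AMBIGUITY = 3


@dataclass
class Item:
    # Hypothetical record for one benchmark question; field names are illustrative.
    entity_id: str
    dimension: Dimension
    answer: str
    prediction: str


def accuracy_by_dimension(items):
    """Return {Dimension: accuracy}, ordered by dimension.

    Uses case-insensitive exact-match scoring, which is an assumption
    made for this sketch, not the benchmark's actual protocol.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        total[item.dimension] += 1
        if item.prediction.strip().lower() == item.answer.strip().lower():
            correct[item.dimension] += 1
    return {d: correct[d] / total[d] for d in sorted(total)}


# Toy, made-up predictions for two hypothetical entities.
items = [
    Item("woman_red_coat", Dimension.EXISTENCE, "yes", "Yes"),
    Item("woman_red_coat", Dimension.CHANGES, "she removes the coat", "she sits down"),
    Item("man_1", Dimension.AMBIGUITY, "B", "B"),
    Item("man_1", Dimension.AMBIGUITY, "A", "B"),
]
report = accuracy_by_dimension(items)
for dim, acc in report.items():
    print(f"{dim.name}: {acc:.2f}")
# EXISTENCE: 1.00
# CHANGES: 0.00
# AMBIGUITY: 0.50
```

Reporting accuracy separately per dimension, rather than as one pooled score, is what would let a drop at the harder tiers (for example, identity confusions under ambiguity) show up instead of being averaged away.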
