StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation

Daniel A. P. Oliveira; David Martins de Matos

arXiv:2505.10292 · cs.CV · September 3, 2025

TL;DR

This paper introduces the StoryReasoning dataset and a novel approach for grounded visual storytelling that maintains character consistency and reduces hallucinations through chain-of-thought reasoning and cross-frame visual re-identification.

Contribution

It presents a new dataset with structured scene analysis and grounded stories, and a fine-tuned model that improves scene understanding and story coherence.

Findings

01 Reduced hallucinations by 12.3% on average.

02 Improved creativity scores by 31%.

03 Demonstrated effective cross-frame object re-identification.

Abstract

Visual storytelling systems struggle to maintain character identity across frames and link actions to appropriate subjects, frequently leading to referential hallucinations. These issues can be addressed through grounding of characters, objects, and other entities on the visual elements. We propose StoryReasoning, a dataset containing 4,178 stories derived from 52,016 movie images, with both structured scene analyses and grounded stories. Each story maintains character and object consistency across frames while explicitly modeling multi-frame relationships through structured tabular representations. Our approach features cross-frame object re-identification using visual similarity and face recognition, chain-of-thought reasoning for explicit narrative modeling, and a grounding scheme that links textual elements to visual entities across multiple frames. We establish baseline performance…
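
To make the cross-frame re-identification idea concrete, here is a minimal sketch that matches detections across frames by cosine similarity of embedding vectors. It is an illustration under stated assumptions, not the authors' pipeline: the embeddings, the 0.8 threshold, the greedy matching rule, and the `char1`-style labels are all hypothetical, and the face-recognition component mentioned in the abstract is not modeled.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def reidentify(frames: List[List[Sequence[float]]],
               threshold: float = 0.8) -> List[List[str]]:
    """Assign a persistent label to every detection in every frame.

    `frames` holds, per frame, one embedding per detected entity.
    The threshold is an assumed value chosen for illustration only.
    """
    prototypes: List[List[float]] = []  # first-seen embedding per identity
    labels: List[List[str]] = []
    for detections in frames:
        used = set()  # identities already taken within this frame
        frame_labels = []
        for emb in detections:
            best_id, best_sim = None, threshold
            for idx, proto in enumerate(prototypes):
                if idx in used:
                    continue  # one identity appears at most once per frame
                sim = cosine_similarity(emb, proto)
                if sim >= best_sim:
                    best_id, best_sim = idx, sim
            if best_id is None:
                # No sufficiently similar identity: register a new one.
                prototypes.append(list(emb))
                best_id = len(prototypes) - 1
            used.add(best_id)
            frame_labels.append(f"char{best_id + 1}")
        labels.append(frame_labels)
    return labels


if __name__ == "__main__":
    # Toy 2-D embeddings: two entities swap order in frame 2, a new one appears in frame 3.
    toy_frames = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[0.1, 0.99], [0.98, 0.05]],
        [[-1.0, 0.0]],
    ]
    print(reidentify(toy_frames))
```

Because each label persists across frames, story text can refer to the same entity by a stable identifier even when its position or detection order changes, which is the property the grounding scheme relies on.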

Code & Models

Repositories

daniel3303/storyreasoning
PyTorch · Official

Datasets

daniel3303/StoryReasoning
dataset · 213 downloads

Taxonomy

Topics: Computational and Text Analysis Methods