StoryMovie: A Dataset for Semantic Alignment of Visual Stories with Movie Scripts and Subtitles
Daniel Oliveira, David Martins de Matos

TL;DR
This paper introduces StoryMovie, a dataset aligning visual stories with movie scripts and subtitles, enabling improved semantic grounding and dialogue attribution in visual storytelling models.
Contribution
Creation of the StoryMovie dataset, which aligns visual stories with movie scripts and subtitles, and development of Storyteller3, a model fine-tuned on this dataset for improved semantic grounding.
Findings
Storyteller3 outperforms base models in subtitle alignment accuracy.
Semantic alignment improves dialogue attribution beyond visual grounding.
Dataset enables more authentic character and relationship modeling.
Abstract
Visual storytelling models that correctly ground entities in images may still hallucinate semantic relationships, generating incorrect dialogue attribution, character interactions, or emotional states. We introduce StoryMovie, a dataset of 1,757 stories aligned with movie scripts and subtitles through longest common subsequence (LCS) matching. Our alignment pipeline synchronizes screenplay dialogue with subtitle timestamps, enabling dialogue attribution by linking character names from scripts to temporal positions from subtitles. Using this aligned content, we generate stories that maintain visual grounding tags while incorporating authentic character names, dialogue, and relationship dynamics. We fine-tune Qwen Storyteller3 on this dataset, building on prior work in visual grounding and entity re-identification. Evaluation using DeepSeek V3 as a judge shows that Storyteller3 achieves an 89.9% win rate against base…
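The abstract describes the alignment step only at a high level, so the following is a minimal sketch of how LCS matching could link script speakers to subtitle timestamps, not the authors' actual pipeline. The `ScriptLine` and `Subtitle` types, the word-level tokenization, and the majority-vote rule in `attribute_speakers` are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class ScriptLine:          # hypothetical type: one line of screenplay dialogue
    speaker: str
    text: str


@dataclass
class Subtitle:            # hypothetical type: one timed subtitle cue
    start: float           # seconds
    end: float             # seconds
    text: str


def tokens(text):
    """Lowercase word tokens; punctuation is dropped (an assumed normalization)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def lcs_pairs(a, b):
    """Return index pairs (i, j) of one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of the suffixes a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk forward through the table to recover the matched positions.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def attribute_speakers(script, subtitles):
    """Assign each subtitle the speaker whose script words it matches most often.

    Returns one speaker name (or None if nothing matched) per subtitle.
    """
    # Flatten both sources into word streams, remembering each word's origin.
    s_words = [(w, k) for k, line in enumerate(script) for w in tokens(line.text)]
    t_words = [(w, k) for k, sub in enumerate(subtitles) for w in tokens(sub.text)]
    pairs = lcs_pairs([w for w, _ in s_words], [w for w, _ in t_words])

    votes = [{} for _ in subtitles]
    for i, j in pairs:
        speaker = script[s_words[i][1]].speaker
        sub_idx = t_words[j][1]
        votes[sub_idx][speaker] = votes[sub_idx].get(speaker, 0) + 1
    return [max(v, key=v.get) if v else None for v in votes]


# Invented example data, not taken from the dataset.
script = [
    ScriptLine("ANNA", "Where did you put the keys?"),
    ScriptLine("BEN", "I left them on the kitchen table."),
]
subtitles = [
    Subtitle(1.0, 2.5, "Where did you put the keys?"),
    Subtitle(3.0, 5.0, "I left them on the table."),
    Subtitle(6.0, 7.0, "[door slams]"),
]
for sub, speaker in zip(subtitles, attribute_speakers(script, subtitles)):
    print(f"{sub.start:5.1f}-{sub.end:5.1f}  {speaker}: {sub.text}")
```

Because the match is a subsequence rather than an exact string comparison, the second subtitle is still attributed even though it omits a word from the script. The dynamic-programming table is quadratic in the number of words, so a full-length film would likely need windowing or a more memory-efficient variant.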
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Topic Modeling
