MovieTeller: Tool-augmented Movie Synopsis with ID Consistent Progressive Abstraction

Yizhi Li; Xiaohan Chen; Miao Jiang; Wentao Tang; Gaoang Wang

arXiv:2602.23228 · cs.CV · March 16, 2026

Open Access

TL;DR

MovieTeller introduces a tool-augmented, multi-stage framework for generating factually accurate, character-consistent movie synopses by leveraging external face recognition tools and progressive abstraction, without requiring model fine-tuning.

Contribution

The paper presents a training-free, plug-and-play approach to long-form video summarization that integrates external tools with multi-stage processing to improve coherence and factual grounding.
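To make the idea of tool-grounded, multi-stage summarization concrete, here is a minimal Python sketch. It is not the authors' implementation: `Scene`, `ground_scene`, `join_summarizer`, and `progressive_synopsis` are hypothetical names, the face IDs stand in for the output of an external face recognition tool, and the string-joining summarizer is a placeholder for a VLM or LLM call. The sketch only illustrates two points: character names are fixed once from tool output, and text is condensed in stages.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    index: int
    face_ids: list[str]  # opaque IDs, assumed to come from a face recognition tool
    action: str          # stand-in for a model-generated description of the scene


def ground_scene(scene: Scene, cast: dict[str, str]) -> str:
    """Replace face IDs with cast names so later stages see one name per character."""
    names = [cast.get(fid, "an unidentified person") for fid in scene.face_ids]
    return f"{' and '.join(names)} {scene.action}"


def join_summarizer(texts: list[str]) -> str:
    """Placeholder summarizer: a real system would call a language model here."""
    return " ".join(texts)


def progressive_synopsis(scenes, cast, group_size=2, summarize=join_summarizer):
    """Summarize groups of texts repeatedly until a single synopsis remains."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2 so each stage shrinks the input")
    texts = [ground_scene(s, cast) for s in scenes]
    while len(texts) > 1:
        texts = [
            summarize(texts[i:i + group_size])
            for i in range(0, len(texts), group_size)
        ]
    return texts[0] if texts else ""


# Hypothetical cast mapping and scenes, for illustration only.
cast = {"f1": "Alice", "f2": "Bob"}
scenes = [
    Scene(0, ["f1"], "enters the room."),
    Scene(1, ["f1", "f2"], "argue."),
    Scene(2, ["f3"], "watches."),
]
result = progressive_synopsis(scenes, cast)
```

Because names are substituted before any summarization happens, every later stage refers to a character by the same name, and a face the tool cannot match is labeled as unidentified rather than guessed.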

Findings

01. Significant improvements in factual accuracy over baselines

02. Enhanced character consistency in generated summaries

03. Better narrative coherence in long-form video summaries

Abstract

With the explosive growth of digital entertainment, automated video summarization has become indispensable for applications such as content indexing, personalized recommendation, and efficient media archiving. Automatic synopsis generation for long-form videos, such as movies and TV series, presents a significant challenge for existing Vision-Language Models (VLMs). While proficient at single-image captioning, these general-purpose models often fail in long-duration contexts, chiefly through a lack of ID-consistent character identification and fractured narrative coherence. To overcome these limitations, we propose MovieTeller, a novel framework for generating movie synopses via tool-augmented progressive abstraction. Our core contribution is a training-free, tool-augmented, fact-grounded generation process. Instead of requiring costly model fine-tuning, our framework…

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques