MovieTeller: Tool-augmented Movie Synopsis with ID Consistent Progressive Abstraction
Yizhi Li, Xiaohan Chen, Miao Jiang, Wentao Tang, Gaoang Wang

TL;DR
MovieTeller introduces a tool-augmented, multi-stage framework for generating factually accurate, character-consistent movie synopses by leveraging external face recognition tools and progressive abstraction, without requiring model fine-tuning.
Contribution
It presents a training-free, plug-and-play approach that enhances long-form video summarization by integrating external tools and multi-stage processing for improved coherence and factual grounding.
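As a rough illustration of what tool-augmented, multi-stage processing can look like, the sketch below tags each scene with character names from an external recognizer before captioning, then merges the scene descriptions in rounds until one synopsis remains. This is not the authors' implementation: the summary above does not specify the stages, so `Keyframe`, `recognize`, `describe_scene`, `abstract_progressively`, the `gallery` lookup, and the stub `vlm` and `summarize` callables are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Keyframe:
    scene_id: int
    face_ids: List[str]  # hypothetical opaque IDs produced by a face detector


def recognize(face_ids: List[str], gallery: Dict[str, str]) -> List[str]:
    """Stand-in for an external face recognition tool: map face IDs to names."""
    return sorted({gallery[f] for f in face_ids if f in gallery})


def describe_scene(frame: Keyframe, gallery: Dict[str, str],
                   vlm: Callable[[str], str]) -> str:
    """Ground the captioning prompt in recognized names before calling the model."""
    names = recognize(frame.face_ids, gallery)
    who = ", ".join(names) if names else "unknown"
    return vlm(f"Scene {frame.scene_id}. Characters present: {who}.")


def abstract_progressively(texts: List[str],
                           summarize: Callable[[List[str]], str],
                           group_size: int = 2) -> str:
    """Merge descriptions group by group until a single text remains."""
    if not texts:
        raise ValueError("need at least one description")
    if group_size < 2:
        raise ValueError("group_size must be at least 2 to make progress")
    while len(texts) > 1:
        texts = [summarize(texts[i:i + group_size])
                 for i in range(0, len(texts), group_size)]
    return texts[0]


# Stub models (assumptions): a real system would call a VLM and an LLM here.
echo_vlm = lambda prompt: prompt
join_summarize = lambda chunk: " ".join(chunk)

gallery = {"f1": "Alice", "f2": "Bob"}
frames = [Keyframe(1, ["f1"]), Keyframe(2, ["f1", "f2"]), Keyframe(3, ["f9"])]
descriptions = [describe_scene(f, gallery, echo_vlm) for f in frames]
synopsis = abstract_progressively(descriptions, join_summarize)
print(synopsis)
```

Because the names are resolved once by the recognizer and injected into every prompt, the same character keeps the same label across scenes, which is the property the summary calls ID consistency. No model weights are updated at any step, matching the training-free claim.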
Findings
Significant improvements in factual accuracy over baselines
Enhanced character consistency in generated summaries
Better narrative coherence in long-form video summaries
Abstract
With the explosive growth of digital entertainment, automated video summarization has become indispensable for applications such as content indexing, personalized recommendation, and efficient media archiving. Automatic synopsis generation for long-form videos, such as movies and TV series, presents a significant challenge for existing Vision-Language Models (VLMs). While proficient at single-image captioning, these general-purpose models often exhibit critical failures in long-duration contexts, primarily a lack of ID-consistent character identification and a fractured narrative coherence. To overcome these limitations, we propose MovieTeller, a novel framework for generating movie synopses via tool-augmented progressive abstraction. Our core contribution is a training-free, tool-augmented, fact-grounded generation process. Instead of requiring costly model fine-tuning, our framework…
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques
