Movie2Story: A framework for understanding videos and telling stories in the form of novel text

Kangning Li; Zheyang Jia; Anyu Ying

arXiv:2412.14965 · cs.CV · January 14, 2025

Open Access

TL;DR

This paper introduces MSBench, a new benchmark for evaluating multi-modal story generation from videos with rich auxiliary data, revealing current models' limitations and proposing improvements.

Contribution

It presents a novel benchmark and dataset generation method for assessing multi-modal story generation, along with a new model architecture to enhance performance.

Findings

01. Current models perform poorly on the benchmark.

02. Automated dataset creation reduces manual effort.

03. The proposed model shows improved results on MSBench.

Abstract

In recent years, large-scale models have achieved significant advancements, accompanied by the emergence of numerous high-quality benchmarks for evaluating various aspects of their comprehension abilities. However, most existing benchmarks primarily focus on spatial understanding in static image tasks. While some benchmarks extend evaluations to temporal tasks, they fall short in assessing text generation under complex contexts involving long videos and rich auxiliary information. To address this limitation, we propose a novel benchmark: the Multi-modal Story Generation Benchmark (MSBench), designed to evaluate text generation capabilities in scenarios enriched with auxiliary information. Our work introduces an innovative automatic dataset generation method to ensure the availability of accurate auxiliary information. On one hand, we leverage existing datasets and apply automated…

Taxonomy

Topics: Video Analysis and Summarization
