Zero-Shot Long-Form Video Understanding through Screenplay
Yongliang Wu, Bozheng Li, Jiawang Cao, Wenbo Zhu, Yi Lu, Weiheng Chi, Chuyun Xie, Haolin Zheng, Ziyue Su, Jay Wu, Xu Yang

TL;DR
This paper introduces MM-Screenplayer, a multi-modal system that converts videos into screenplay format for long-form video understanding and question answering, achieving the highest score in the CVPR 2024 LOVEU Track 1 Challenge.
Contribution
It presents a screenplay-based representation that organizes video content into scenes rather than visually continuous shots, and a "Look Back" strategy that reassesses and validates uncertain information in long-form video QA.
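To make the idea of a scene-level screenplay concrete, here is a minimal Python sketch. It is not the authors' implementation: it assumes each shot already carries a location label, a visual caption, and transcribed dialogue (all hypothetical fields produced by unspecified perception models), and it uses a simple rule, merging consecutive shots that share a location, as a stand-in for whatever scene segmentation MM-Screenplayer actually performs. `Shot`, `Scene`, `group_into_scenes`, and `to_screenplay` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class Shot:
    """One visually continuous shot with hypothetical per-shot annotations."""
    start: float   # start time in seconds
    end: float     # end time in seconds
    location: str  # hypothetical location label from a captioning model
    caption: str   # hypothetical visual description of the shot
    dialogue: list[str] = field(default_factory=list)  # hypothetical "SPEAKER: text" lines


@dataclass
class Scene:
    """A scene groups consecutive shots; here, shots that share a location."""
    shots: list[Shot]

    @property
    def location(self) -> str:
        return self.shots[0].location


def group_into_scenes(shots: list[Shot]) -> list[Scene]:
    """Merge consecutive shots with the same location label into one scene.

    This rule is an illustrative assumption, not the paper's segmentation method.
    """
    scenes: list[Scene] = []
    for shot in shots:
        if scenes and scenes[-1].location == shot.location:
            scenes[-1].shots.append(shot)
        else:
            scenes.append(Scene(shots=[shot]))
    return scenes


def to_screenplay(scenes: list[Scene]) -> str:
    """Render scenes as screenplay-style text: a heading, then action and dialogue lines."""
    lines: list[str] = []
    for i, scene in enumerate(scenes, start=1):
        start, end = scene.shots[0].start, scene.shots[-1].end
        lines.append(f"SCENE {i}. {scene.location.upper()} ({start:.0f}s-{end:.0f}s)")
        for shot in scene.shots:
            lines.append(shot.caption)
            lines.extend(shot.dialogue)
        lines.append("")  # blank line between scenes
    return "\n".join(lines).rstrip()
```

For example, two kitchen shots followed by a street shot yield one KITCHEN scene and one STREET scene, so the text a language model reads is organized by scene rather than by shot.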
Findings
Achieved 87.5% global accuracy in the CVPR 2024 LOVEU Track 1 Challenge.
Attained 68.8% accuracy in breakpoint mode.
Ranked first in the challenge for long-form video question answering.
Abstract
The Long-form Video Question-Answering task requires comprehending and analyzing extended video content in order to answer questions accurately using both temporal and contextual information. In this paper, we present MM-Screenplayer, an advanced video understanding system with multi-modal perception capabilities that can convert any video into a textual screenplay representation. Unlike previous storytelling methods, we organize video content into scenes as the basic unit, rather than just visually continuous shots. Additionally, we developed a "Look Back" strategy to reassess and validate uncertain information, particularly targeting breakpoint mode. MM-Screenplayer achieved the highest score in the CVPR 2024 LOng-form VidEo Understanding (LOVEU) Track 1 Challenge, with a global accuracy of 87.5% and a breakpoint accuracy of 68.8%.
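The "Look Back" strategy can be pictured as a confidence-gated retry loop. The sketch below is an illustration under stated assumptions, not the paper's procedure: `answer_fn` and `perceive_fn` are hypothetical stand-ins for a text-only question answerer and a perception model, the confidence score, threshold, and window sizes are invented for illustration, and breakpoint questions are assumed to come with the timestamp they refer to. When the first answer is uncertain, the loop re-inspects the video around that moment and answers again.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Answer:
    text: str
    confidence: float  # hypothetical confidence score in [0, 1]


# Hypothetical components (assumptions, not a real API):
#   answer_fn(context, question) -> Answer, e.g. a language model reading text
#   perceive_fn(timestamp, window) -> str, e.g. a detailed description of the
#   frames within `window` seconds of `timestamp`
AnswerFn = Callable[[str, str], Answer]
PerceiveFn = Callable[[float, float], str]


def answer_with_look_back(
    screenplay: str,
    question: str,
    timestamp: float,
    answer_fn: AnswerFn,
    perceive_fn: PerceiveFn,
    threshold: float = 0.7,
    window: float = 5.0,
    max_rounds: int = 2,
) -> Answer:
    """Answer from the screenplay; if unsure, look back at the video and retry."""
    context = screenplay
    answer = answer_fn(context, question)
    for _ in range(max_rounds):
        if answer.confidence >= threshold:
            break  # confident enough, no need to look back
        # Look back: gather fresh visual evidence around the queried moment
        # and append it to the text context before answering again.
        evidence = perceive_fn(timestamp, window)
        context = f"{context}\n\nLOOK BACK @ {timestamp:.0f}s (+/-{window:.0f}s): {evidence}"
        answer = answer_fn(context, question)
        window *= 2  # widen the window if another round is needed
    return answer
```

Doubling the window on each round is only one possible design choice: it trades extra perception calls for more surrounding context when the frames nearest the breakpoint are not enough.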
Taxonomy
Topics: Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis · Video Coding and Compression Technologies
