Causal Video Summarizer for Video Exploration
Jia-Hong Huang, Chao-Han Huck Yang, Pin-Yu Chen, Andrew Brown, Marcel Worring

TL;DR
This paper introduces Causal Video Summarizer (CVS), a causality-based approach to multi-modal video summarization that models the interaction between a video and a text-based query, with reported gains of +5.4% in accuracy and +4.92% in F1 score.
Contribution
The paper presents a causality-based method for multi-modal video summarization, consisting of a probabilistic encoder and a probabilistic decoder, that captures the interaction between the video input and the text-based query and is reported to outperform existing methods.
Findings
+5.4% accuracy improvement
+4.92% F1 score increase
Effective modeling of video-query interactions
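The accuracy and F1 figures above are frame-level classification metrics. As a rough illustration only, the sketch below shows how such metrics are commonly computed from a predicted summary and a reference summary, each given as one 0/1 label per frame. The function name `frame_metrics` and the binary per-frame labeling are assumptions made for this example; the paper's exact evaluation protocol is not described on this page.

```python
from typing import Dict, Sequence


def frame_metrics(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary per-frame labels.

    Hypothetical helper: 1 means the frame is in the summary, 0 means it is not.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same number of frames")

    pairs = list(zip(pred, truth))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)  # selected, and should be
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)  # selected, but should not be
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)  # missed
    tn = sum(1 for p, t in pairs if p == 0 and t == 0)  # correctly left out

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy example with six frames (made-up labels, not data from the paper).
metrics = frame_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
```

Accuracy counts every correctly labeled frame, while F1 only rewards correctly selected frames, so the two can diverge when summaries are short relative to the video.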
Abstract
Recently, video summarization has been proposed as a method to help video exploration. However, traditional video summarization models only generate a fixed video summary which is usually independent of user-specific needs and hence limits the effectiveness of video exploration. Multi-modal video summarization is one of the approaches utilized to address this issue. Multi-modal video summarization has a video input and a text-based query input. Hence, effective modeling of the interaction between a video input and text-based query is essential to multi-modal video summarization. In this work, a new causality-based method named Causal Video Summarizer (CVS) is proposed to effectively capture the interactive information between the video and query to tackle the task of multi-modal video summarization. The proposed method consists of a probabilistic encoder and a probabilistic decoder.…
Taxonomy
Topics: Video Analysis and Summarization · Music and Audio Processing · Multimedia Communication and Technology
