Causal Video Summarizer for Video Exploration

Jia-Hong Huang; Chao-Han Huck Yang; Pin-Yu Chen; Andrew Brown; Marcel Worring

arXiv:2307.01947·cs.CV·July 6, 2023

Open Access

TL;DR

This paper introduces Causal Video Summarizer (CVS), a causality-based approach to multi-modal video summarization that models the interaction between a video and a text query, with reported gains of +5.4% in accuracy and +4.92% in F1 score.

Contribution

The paper presents a novel causality-based method for multi-modal video summarization that captures the interactions between video and text inputs and outperforms existing methods.

Findings

01 +5.4% accuracy improvement

02 +4.92% F1 score increase

03 Effective modeling of video-query interactions

Abstract

Recently, video summarization has been proposed as a method to help video exploration. However, traditional video summarization models only generate a fixed video summary which is usually independent of user-specific needs and hence limits the effectiveness of video exploration. Multi-modal video summarization is one of the approaches utilized to address this issue. Multi-modal video summarization has a video input and a text-based query input. Hence, effective modeling of the interaction between a video input and text-based query is essential to multi-modal video summarization. In this work, a new causality-based method named Causal Video Summarizer (CVS) is proposed to effectively capture the interactive information between the video and query to tackle the task of multi-modal video summarization. The proposed method consists of a probabilistic encoder and a probabilistic decoder.…

Taxonomy

Topics: Video Analysis and Summarization · Music and Audio Processing · Multimedia Communication and Technology