Visual Subtitle Feature Enhanced Video Outline Generation

Qi Lv; Ziqiang Cao; Wenrui Xie; Derui Wang; Jingwen Wang; Zhiwei Hu; Tangkun Zhang; Ba Yuan; Yuanhang Li; Min Cao; Wenjie Li; Sujian Li; Guohong Fu

arXiv:2208.11307 · cs.CV · September 2, 2022

TL;DR

This paper introduces video outline generation (VOG), a task that segments a video according to its content structure and generates a heading for each segment. The task is supported by a new annotated dataset, DuVOG, and a model, VSENet, that uses visual subtitle features.

Contribution

The paper proposes VSENet, a new model that incorporates visual subtitle features for improved video outline generation, and provides the DuVOG dataset for training and evaluation.

Findings

01 VSENet achieves 77.1 F1-score in video segmentation

02 VSENet attains 85.0 ROUGE-L_F0.5 in headline generation

03 The model outperforms baseline methods significantly
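
The two metrics named in the findings can be illustrated with a minimal sketch. It assumes ROUGE-L_F0.5 means the standard F-beta combination of LCS-based precision and recall with beta = 0.5 (which weights precision more heavily than recall), and that segmentation F1 is computed over exactly matching boundary positions. The paper's actual tokenization and matching rules are not given here, so the function names, the whitespace tokenization, and the exact-match criterion below are all assumptions made for illustration.

```python
from typing import Sequence, Set


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_fbeta(candidate: Sequence[str], reference: Sequence[str],
                  beta: float = 0.5) -> float:
    """LCS-based F-beta score (hypothetical helper; beta=0.5 is assumed)."""
    if not candidate or not reference:
        return 0.0
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def boundary_f1(predicted: Set[int], gold: Set[int]) -> float:
    """F1 over segment boundary positions, counting only exact matches
    (an assumed matching rule, not necessarily the paper's)."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# A generated heading that is a shortened version of the reference heading.
candidate = "how to cook rice".split()
reference = "how to cook rice at home".split()
heading_score = rouge_l_fbeta(candidate, reference)

# Predicted vs. annotated chapter boundaries, as subtitle indices.
segmentation_score = boundary_f1({3, 7, 12}, {3, 8, 12})
```

In this example the generated heading has perfect precision but misses two reference tokens, so with beta = 0.5 it is penalized less than it would be under a balanced F1.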

Abstract

With the tremendously increasing number of videos, there is a great demand for techniques that help people quickly navigate to the video segments they are interested in. However, current works on video understanding mainly focus on video content summarization, while little effort has been made to explore the structure of a video. Inspired by textual outline generation, we introduce a novel video understanding task, namely video outline generation (VOG). This task is defined to contain two sub-tasks: (1) first segmenting the video according to the content structure and then (2) generating a heading for each segment. To learn and evaluate VOG, we annotate a 10k+ dataset, called DuVOG. Specifically, we use OCR tools to recognize subtitles of videos. Then annotators are asked to divide subtitles into chapters and title each chapter. In videos, highlighted text tends to be the headline since…

Taxonomy

Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Natural Language Processing Techniques