Contextually Customized Video Summaries via Natural Language

Jinsoo Choi; Tae-Hyun Oh; In So Kweon

arXiv:1702.01528 · cs.CV · March 5, 2018 · 1 citation

TL;DR

This paper presents a method for generating personalized video summaries from simple text descriptions. It learns semantic embeddings of video frames and selects the relevant segments, performing comparably to or better than baseline methods that use ground-truth information.

Contribution

We introduce a novel approach to create customized video summaries from text, leveraging semantic embeddings learned through a deep architecture trained on image-caption data.
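This page does not say which training objective the embedding network uses, so the following is only a hedged illustration. A common way to learn joint image-caption embeddings is a bidirectional hinge-based ranking loss: each matching image-caption pair should score higher than every mismatched pair in the batch by at least a margin. The sketch assumes pre-computed image and caption vectors and a hypothetical `margin` of 0.2. It does not model the paper's progressive and residual architecture.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def ranking_loss(images: List[Vector], captions: List[Vector], margin: float = 0.2) -> float:
    """Bidirectional hinge ranking loss for a batch where images[i] matches captions[i].

    Assumption: this is a generic joint-embedding objective, not the paper's stated loss.
    Every non-matching caption (and image) in the batch acts as a negative example
    that should score at least `margin` below the true match.
    """
    n = len(images)
    assert n == len(captions), "images and captions must be paired"
    loss = 0.0
    for i in range(n):
        pos = cosine(images[i], captions[i])
        for j in range(n):
            if j == i:
                continue
            # image i should be closer to caption i than to caption j
            loss += max(0.0, margin - pos + cosine(images[i], captions[j]))
            # caption i should be closer to image i than to image j
            loss += max(0.0, margin - pos + cosine(images[j], captions[i]))
    return loss
```

With perfectly aligned, mutually orthogonal pairs the loss is zero; swapping the captions makes every pair violate the margin, so the loss becomes large.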

Findings

1. Our method produces semantically relevant video summaries from user-provided text descriptions.

2. It achieves performance comparable to or better than baseline methods.

3. The approach generates diverse summaries using learned visual embeddings.

Abstract

The best summary of a long video differs from person to person because of its highly subjective nature. Even for the same person, the best summary may change with time or mood. In this paper, we introduce the task of generating customized video summaries through simple text. First, we train a deep architecture to effectively learn semantic embeddings of video frames by leveraging the abundance of image-caption data in a progressive and residual manner. Given a user-specific text description, our algorithm selects semantically relevant video segments and produces a temporally aligned video summary. To evaluate our textually customized video summaries, we conduct experimental comparisons with baseline methods that utilize ground-truth information. Despite the challenging baselines, our method still shows comparable or even superior performance. We also show that…
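The selection step described in the abstract can be sketched as follows, under loudly stated assumptions. The frame embeddings and the text embedding are assumed to already live in one shared space produced by the trained model. The video is cut into hypothetical fixed-length segments (`seg_len` frames each), each segment is scored by the mean cosine similarity of its frames to the text, and the top `budget` segments are kept and replayed in their original temporal order. This is a simplification for illustration, not the authors' selection algorithm.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def summarize(frame_embs: List[Vector], text_emb: Vector,
              seg_len: int, budget: int) -> List[Tuple[int, int]]:
    """Return up to `budget` (start, end) frame ranges, end exclusive, in temporal order.

    Hypothetical scheme: fixed-length segments ranked by mean text-frame similarity.
    """
    if seg_len <= 0:
        raise ValueError("seg_len must be positive")
    segments = []
    for start in range(0, len(frame_embs), seg_len):
        end = min(start + seg_len, len(frame_embs))
        # score a segment by the average similarity of its frames to the text query
        score = sum(cosine(f, text_emb) for f in frame_embs[start:end]) / (end - start)
        segments.append((score, start, end))
    # keep the highest-scoring segments; ties go to the earlier segment
    top = sorted(segments, key=lambda s: (-s[0], s[1]))[:budget]
    # restore original temporal order so the summary plays chronologically
    return [(start, end) for _, start, end in sorted(top, key=lambda s: s[1])]


frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [0.9, 0.1]]
print(summarize(frames, [1.0, 0.0], seg_len=2, budget=2))  # [(0, 2), (4, 6)]
```

Sorting the chosen segments by start index is one simple reading of "temporally aligned"; the paper may define alignment differently.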

Taxonomy

Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques