Video Summarization: Towards Entity-Aware Captions

Hammad A. Ayyubi; Tianqi Liu; Arsha Nagrani; Xudong Lin; Mingda Zhang; Anurag Arnab; Feng Han; Yukun Zhu; Jialu Liu; Shih-Fu Chang

arXiv:2312.02188·cs.CV·November 12, 2024·2 cites

Open Access · 1 Repo

TL;DR

This paper introduces the task of generating entity-aware captions for news videos, presents a large-scale dataset, and proposes a method that combines visual data with external knowledge to improve captioning accuracy.

Contribution

It defines a new task of entity-aware news video captioning, releases the VIEWS dataset, and proposes a knowledge-augmented captioning method that enhances existing models.

Findings

01

The proposed method improves caption quality on news videos.

02

The approach generalizes to an existing news image captioning dataset.

03

Experiments on three video captioning models validate the effectiveness of the method.

Abstract

Existing popular video captioning benchmarks and models deal with generic captions devoid of specific person, place, or organization named entities. In contrast, news videos present a challenging setting where the caption requires such named entities for meaningful summarization. As such, we propose the task of summarizing news videos directly into entity-aware captions. We also release a large-scale dataset, VIEWS (VIdeo NEWS), to support research on this task. Further, we propose a method that augments visual information from videos with context retrieved from external world knowledge to generate entity-aware captions. We demonstrate the effectiveness of our approach on three video captioning models. We also show that our approach generalizes to an existing news image captioning dataset. With all the extensive experiments and insights, we believe we establish a solid basis for future research…
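
The abstract describes augmenting visual information with context retrieved from external knowledge. The sketch below is only a toy illustration of that general retrieve-then-caption idea, not the authors' method: the function names (`retrieve_context`, `build_captioner_input`), the Jaccard word-overlap scoring, the prompt layout, and the example texts are all assumptions made here for illustration. A real system would presumably use a learned retriever and a captioning model.

```python
from __future__ import annotations

import re


def tokens(text: str) -> set[str]:
    """Lowercase a string and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_context(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents with the highest Jaccard word overlap with the query.

    Hypothetical stand-in for retrieval from external world knowledge.
    """
    query_tokens = tokens(query)

    def score(doc: str) -> float:
        doc_tokens = tokens(doc)
        union = query_tokens | doc_tokens
        return len(query_tokens & doc_tokens) / len(union) if union else 0.0

    return sorted(documents, key=score, reverse=True)[:k]


def build_captioner_input(visual_description: str, context: list[str]) -> str:
    """Combine a generic visual description with retrieved context into one text input.

    The layout is an assumption; a captioning model would consume this input.
    """
    return (
        f"Visual: {visual_description}\n"
        f"Context: {' '.join(context)}\n"
        "Entity-aware caption:"
    )


if __name__ == "__main__":
    # Made-up example data; "Exampleville" is a placeholder entity.
    knowledge = [
        "Heavy rain left the city street flooded in Exampleville on Monday.",
        "A new phone model was announced at a tech conference.",
    ]
    generic = "a reporter stands in a flooded city street"
    context = retrieve_context(generic, knowledge, k=1)
    print(build_captioner_input(generic, context))
```

In this toy setup the generic description carries no named entity, and the retrieved sentence is what supplies one to the downstream captioner input.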

Code & Models

Repositories

hayyubi/views (Official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Natural Language Processing Techniques