Enriching Video Captions With Contextual Text

Philipp Rimle; Pelin Dogan; Markus Gross

arXiv:2007.14682 · cs.CV · July 30, 2020

TL;DR

This paper introduces an end-to-end model that enhances video captioning by integrating relevant contextual text, enabling more specific and informative descriptions through a pointer-generator mechanism.

Contribution

The main contribution is an end-to-end architecture that learns to attend directly over raw contextual text, improving caption specificity without any separate text-preprocessing step.
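The contribution rests on attending over raw contextual-text tokens. The following is a minimal sketch of how such attention weights could be computed. It assumes scaled dot-product scoring, since this summary does not state the paper's scoring function. `attend_over_context`, `context_vector` and the toy vectors are hypothetical names for illustration, not the authors' code.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_context(query: Sequence[float],
                        context_keys: Sequence[Sequence[float]]) -> List[float]:
    """Scaled dot-product attention weights over contextual-text tokens.

    query: hypothetical decoder state (assumed to be conditioned on the video).
    context_keys: one vector per raw context token; no entity extraction or
    other preprocessing is applied to the text beforehand.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in context_keys]
    return softmax(scores)


def context_vector(weights: Sequence[float],
                   context_values: Sequence[Sequence[float]]) -> List[float]:
    """Weighted sum of the token vectors under the attention weights."""
    dim = len(context_values[0])
    return [sum(w * v[i] for w, v in zip(weights, context_values))
            for i in range(dim)]


if __name__ == "__main__":
    # Toy example: the query aligns with the first context token.
    keys = [[4.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    weights = attend_over_context([1.0, 0.0], keys)
    print(weights, context_vector(weights, keys))
```

Because the weights are computed over whatever tokens the text contains, the model itself decides which words (for example, names or places) matter for the current video frame. No hand-built extraction pipeline makes that choice.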

Findings

1. Achieves competitive results on the News Video Dataset.
2. Validates the effectiveness of contextual information in video captioning.
3. Demonstrates the benefit of pointer-generator networks for copying relevant words.

Abstract

Understanding video content and generating captions with context is an important and challenging task. Unlike prior methods that typically attempt to generate generic video captions without context, our architecture contextualizes captioning by infusing extracted information from relevant text data. We propose an end-to-end sequence-to-sequence model which generates video captions based on visual input, and mines relevant knowledge such as names and locations from contextual text. In contrast to previous approaches, we do not preprocess the text further, and let the model directly learn to attend over it. Guided by the visual input, the model is able to copy words from the contextual text via a pointer-generator network, allowing it to produce more specific video captions. We show competitive performance on the News Video Dataset and, through ablation studies, validate the efficacy of…
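The abstract says the model copies words from the contextual text through a pointer-generator network. The sketch below illustrates the copy step, assuming the widely used pointer-generator mixture of See et al. (2017); the abstract does not spell out the exact formulation. A gate p_gen interpolates between generating a word from a fixed vocabulary and copying context token x_i in proportion to its attention weight a_i: P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum over {i : x_i = w} of a_i. The function name, vocabulary and the name "Smith" are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence


def pointer_generator_distribution(p_gen: float,
                                   vocab_probs: Dict[str, float],
                                   context_tokens: Sequence[str],
                                   attention: Sequence[float]) -> Dict[str, float]:
    """Mix generation from a fixed vocabulary with copying from context text.

    Assumed formulation (See et al., 2017 style), not the authors' code:
        P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum_{i: x_i = w} a_i
    A word that appears only in the context (e.g. a person's name) receives
    probability mass purely through the copy term.
    """
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must lie in [0, 1]")
    final: Dict[str, float] = defaultdict(float)
    for word, p in vocab_probs.items():
        final[word] += p_gen * p            # generate from the vocabulary
    for token, a in zip(context_tokens, attention):
        final[token] += (1.0 - p_gen) * a   # copy from the contextual text
    return dict(final)


if __name__ == "__main__":
    dist = pointer_generator_distribution(
        p_gen=0.4,
        vocab_probs={"a": 0.5, "man": 0.3, "speaks": 0.2},
        context_tokens=["Smith", "speaks", "Smith"],
        attention=[0.6, 0.1, 0.3],
    )
    print(round(dist["Smith"], 2))  # 0.54
```

In this toy case "Smith" is absent from the generation vocabulary but still becomes the most probable word via the copy term. This shows how such a model can emit specific names and locations that a generic captioner could not produce.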
