Translating Videos to Natural Language Using Deep Recurrent Neural Networks

Subhashini Venugopalan; Huijuan Xu; Jeff Donahue; Marcus Rohrbach; Raymond Mooney; Kate Saenko

arXiv:1412.4729 · cs.CV · May 1, 2015

TL;DR

This paper introduces a deep neural network model that translates videos into natural language sentences, leveraging large-scale image datasets to improve open-domain video captioning.

Contribution

It presents a unified convolutional and recurrent neural network that transfers knowledge from image datasets to generate video descriptions with large vocabularies.
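One simple way to turn a variable-length video into input for a network built on image features is to sample frames, compute one feature vector per frame with an image-trained CNN, and average those vectors into a single fixed-length vector. The sketch below covers only the sampling and averaging steps, in plain Python. The function names `sample_frames` and `mean_pool` are hypothetical, the stride of 10 is an illustrative default, and the CNN feature extractor is assumed to exist elsewhere. This is not the authors' code.

```python
from typing import List, Sequence


def sample_frames(frames: Sequence, stride: int = 10) -> list:
    """Keep every `stride`-th frame (hypothetical helper; stride is illustrative)."""
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    return list(frames[::stride])


def mean_pool(frame_features: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame feature vectors into one video-level vector.

    Each inner sequence is assumed to be a CNN feature vector for one sampled
    frame; all vectors must have the same length.
    """
    if not frame_features:
        raise ValueError("need at least one frame")
    dim = len(frame_features[0])
    if any(len(f) != dim for f in frame_features):
        raise ValueError("all frame vectors must have the same length")
    n = len(frame_features)
    # Sum each dimension across frames, then divide by the number of frames.
    return [sum(f[d] for f in frame_features) / n for d in range(dim)]
```

For example, `mean_pool([[1.0, 2.0], [3.0, 4.0]])` returns `[2.0, 3.0]`. Averaging yields a fixed-size input regardless of video length, but it discards frame order.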

Findings

01

Outperforms existing methods on language generation metrics

02

Achieves higher accuracy in subject, verb, and object prediction

03

Receives positive human evaluation results
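Subject, verb, and object accuracy can be read as three separate per-role scores: for each video, compare the subject, verb, and object extracted from the generated sentence against a reference triple. The sketch below assumes the triples have already been extracted (that parsing step is not shown), one reference triple per video, and exact string matching. The name `svo_accuracy` and the data layout are hypothetical; the paper's exact evaluation protocol is not described on this page.

```python
from typing import Dict, Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, verb, object)


def svo_accuracy(predicted: Sequence[Triple],
                 reference: Sequence[Triple]) -> Dict[str, float]:
    """Fraction of videos whose predicted subject, verb, and object each
    exactly match the reference triple, reported separately per role."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must be aligned per video")
    if not predicted:
        raise ValueError("need at least one video")
    roles = ("subject", "verb", "object")
    correct = [0, 0, 0]
    for pred, ref in zip(predicted, reference):
        for i in range(3):
            if pred[i] == ref[i]:
                correct[i] += 1
    n = len(predicted)
    return {role: correct[i] / n for i, role in enumerate(roles)}
```

With predictions `[("man", "slice", "onion"), ("woman", "ride", "horse")]` and references `[("man", "cut", "onion"), ("woman", "ride", "horse")]`, this returns `{"subject": 1.0, "verb": 0.5, "object": 1.0}`. Exact matching ignores synonyms such as "slice" versus "cut", so it is a strict measure.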

Abstract

Solving the visual symbol grounding problem has long been a goal of artificial intelligence. The field appears to be advancing closer to this goal with recent breakthroughs in deep learning for natural language grounding in static images. In this paper, we propose to translate videos directly to sentences using a unified deep neural network with both convolutional and recurrent structure. Described video datasets are scarce, and most existing methods have been applied to toy domains with a small vocabulary of possible words. By transferring knowledge from 1.2M+ images with category labels and 100,000+ images with captions, our method is able to create sentence descriptions of open-domain videos with large vocabularies. We compare our approach with recent work using language generation metrics, subject, verb, and object prediction accuracy, and a human evaluation.
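A recurrent decoder usually produces a sentence one word at a time. At each step it scores candidate next words given its state and the previous word, picks one, and stops at an end-of-sentence token. The sketch below uses greedy selection. That is one common strategy and an assumption here, since this page does not say how words are chosen. `greedy_decode`, `toy_step`, the `<BOS>`/`<EOS>` tokens, and the lookup table are all hypothetical. In a real model, the state would carry the video representation and the LSTM's hidden state instead of being passed through unchanged.

```python
from typing import Any, Callable, Dict, List, Tuple

BOS, EOS = "<BOS>", "<EOS>"  # hypothetical start/end-of-sentence tokens

# step(state, previous_word) -> (scores for each candidate next word, new state)
StepFn = Callable[[Any, str], Tuple[Dict[str, float], Any]]


def greedy_decode(step: StepFn, init_state: Any, max_len: int = 20) -> List[str]:
    """Pick the highest-scoring word at each step until EOS or max_len words."""
    words: List[str] = []
    state, prev = init_state, BOS
    for _ in range(max_len):
        scores, state = step(state, prev)
        prev = max(scores, key=scores.get)  # greedy choice of the next word
        if prev == EOS:
            break
        words.append(prev)
    return words


# Toy stand-in for a trained model: next-word scores depend only on the
# previous word, and the state is returned unchanged.
TOY_TABLE: Dict[str, Dict[str, float]] = {
    BOS: {"a": 0.9, "the": 0.1},
    "a": {"man": 0.7, "woman": 0.3},
    "man": {"is": 0.8, EOS: 0.2},
    "is": {"cooking": 0.6, "running": 0.4},
    "cooking": {EOS: 1.0},
}


def toy_step(state: Any, prev: str) -> Tuple[Dict[str, float], Any]:
    return TOY_TABLE[prev], state
```

Here `greedy_decode(toy_step, None)` returns `["a", "man", "is", "cooking"]`. Greedy decoding is cheap, but it commits to each word immediately. Beam search keeps several partial sentences alive at the cost of more computation.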

Code & Models

Repositories

nasib-ullah/video-captioning-models-in-Pytorch (PyTorch)

Taxonomy

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory
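The three listed methods fit together inside a single LSTM step. Sigmoid activations produce the input, forget, and output gates. Tanh produces the candidate cell update and squashes the new cell state before it is emitted as the hidden state. The sketch below implements the standard LSTM step without peephole connections in plain Python. The exact LSTM variant used in the paper is not stated on this page, and the `params` layout (gate name mapped to `(W, U, b)`) and helper names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
GateParams = Tuple[Matrix, Matrix, Vector]  # (W input weights, U recurrent weights, b bias)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: Matrix, U: Matrix, b: Vector, x: Vector, h: Vector) -> Vector:
    """Compute W x + U h + b, one output unit per row."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + bias
        for w_row, u_row, bias in zip(W, U, b)
    ]


def lstm_step(x: Vector, h: Vector, c: Vector,
              params: Dict[str, GateParams]) -> Tuple[Vector, Vector]:
    """One LSTM time step; returns (new hidden state, new cell state)."""
    i = [sigmoid(z) for z in affine(*params["i"], x, h)]    # input gate
    f = [sigmoid(z) for z in affine(*params["f"], x, h)]    # forget gate
    o = [sigmoid(z) for z in affine(*params["o"], x, h)]    # output gate
    g = [math.tanh(z) for z in affine(*params["g"], x, h)]  # candidate cell values
    c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
    h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
    return h_new, c_new


def zero_params(input_dim: int, hidden_dim: int) -> Dict[str, GateParams]:
    """All-zero weights and biases (hypothetical helper, useful for checking)."""
    def zeros(rows: int, cols: int) -> Matrix:
        return [[0.0] * cols for _ in range(rows)]
    return {
        gate: (zeros(hidden_dim, input_dim), zeros(hidden_dim, hidden_dim), [0.0] * hidden_dim)
        for gate in "ifog"
    }
```

With all-zero parameters, every gate equals `sigmoid(0) = 0.5` and the candidate is `tanh(0) = 0`. A cell state of `[2.0]` therefore decays to `[1.0]`, and the hidden state becomes `0.5 * tanh(1.0)`. This shows the forget gate scaling the old memory and the output gate scaling the squashed cell.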