Frame- and Segment-Level Features and Candidate Pool Evaluation for Video Caption Generation

Rakshith Shetty; Jorma Laaksonen

arXiv:1608.04959·cs.CV·August 18, 2016

TL;DR

This paper introduces a video captioning system that combines multiple feature types with a candidate evaluation model to generate more accurate and diverse video descriptions. The system was rated best in human evaluation and ranked second on automatic metrics in the MSR Video to Language Challenge.

Contribution

The authors propose a multi-feature, multi-model approach with a candidate pool evaluator for improved video captioning performance.

Findings

01 Rated best in human evaluation in the MSR Video to Language Challenge.

02 Ranked second in automatic evaluation metrics.

03 Effective use of diverse features and candidate selection improves caption quality.

Abstract

We present our submission to the Microsoft Video to Language Challenge of generating short captions describing videos in the challenge dataset. Our model is based on the encoder-decoder pipeline, popular in image and video captioning systems. We propose to utilize two different kinds of video features, one to capture the video content in terms of objects and attributes, and the other to capture the motion and action information. Using these diverse features we train models specializing in two separate input sub-domains. We then train an evaluator model which is used to pick the best caption from the pool of candidates generated by these domain expert models. We argue that this approach is better suited for the current video captioning task, compared to using a single model, due to the diversity in the dataset. Efficacy of our method is proven by the fact that it was rated best in MSR…
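
The candidate pool step described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the paper trains an evaluator model, whereas the sketch below substitutes plain cosine similarity between a video feature vector and a caption embedding as a stand-in scorer. The names `pick_best_caption`, `embed_caption`, and the toy vectors are all hypothetical and exist only for illustration.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pick_best_caption(
    video_feature: Vector,
    candidates: Sequence[str],
    embed_caption: Callable[[str], Vector],
    score: Callable[[Vector, Vector], float] = cosine,
) -> str:
    """Return the candidate whose embedding scores highest against the video.

    `candidates` stands for the pool of captions produced by several
    specialised caption generators. `score` is a placeholder for a trained
    evaluator (assumption: here it is just cosine similarity).
    """
    if not candidates:
        raise ValueError("candidate pool is empty")
    return max(candidates, key=lambda c: score(video_feature, embed_caption(c)))


# Toy data (hypothetical): two candidate captions with made-up embeddings.
toy_embeddings = {
    "a man is cooking": [1.0, 0.0, 0.0],
    "a dog is running": [0.0, 1.0, 0.0],
}
video = [0.9, 0.1, 0.0]  # made-up video feature in the same space

best = pick_best_caption(video, list(toy_embeddings), toy_embeddings.__getitem__)
print(best)
```

Passing the scorer in as a parameter keeps the selection logic independent of how the evaluator is built, so a learned video-caption similarity model could replace `cosine` without changing `pick_best_caption`.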

Code & Models

Repositories

rakshithShetty/captionGAN