A Semantics-Assisted Video Captioning Model Trained with Scheduled Sampling

Haoran Chen; Ke Lin; Alexander Maye; Jianming Li; Xiaolin Hu

arXiv:1909.00121·cs.CV·February 15, 2021

TL;DR

This paper introduces a video captioning model that improves semantic feature extraction, trains with scheduled sampling to reduce the mismatch between training and inference, and adjusts the loss function to favor longer, more complete captions. It outperforms previous models on YouTube2Text and achieves competitive results on MSR-VTT.

Contribution

The paper presents a novel semantic detection network, applies scheduled sampling for training, and modifies the loss function to produce longer, more accurate video captions.
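As a rough illustration of the scheduled sampling idea named above, the following minimal sketch shows how a decoder's input at each step can be drawn either from the ground-truth caption (as in Teacher Forcing) or from the model's own previous prediction. The function names, the linear ramp, and the `max_prob=0.25` ceiling are assumptions made for illustration; the schedule actually used in the paper is not stated on this page.

```python
import random


def sampling_probability(epoch, num_epochs, max_prob=0.25):
    """Hypothetical linear schedule: the probability of feeding the model's
    own prediction grows from 0 at the first epoch to max_prob at the last."""
    if num_epochs <= 1:
        return max_prob
    frac = min(max(epoch / (num_epochs - 1), 0.0), 1.0)
    return max_prob * frac


def choose_decoder_inputs(ground_truth, predicted, sample_prob, rng):
    """For each decoding step, pick the token fed to the decoder as the
    'previous word': the model's prediction with probability sample_prob,
    otherwise the ground-truth token (plain Teacher Forcing)."""
    inputs = []
    for truth_tok, pred_tok in zip(ground_truth, predicted):
        use_model = rng.random() < sample_prob
        inputs.append(pred_tok if use_model else truth_tok)
    return inputs


if __name__ == "__main__":
    rng = random.Random(0)
    truth = ["a", "man", "is", "playing", "guitar"]
    preds = ["a", "person", "is", "holding", "guitar"]  # made-up model outputs
    for epoch in (0, 5, 9):
        p = sampling_probability(epoch, num_epochs=10)
        print(epoch, p, choose_decoder_inputs(truth, preds, p, rng))
```

With `sample_prob` set to 0 this reduces to Teacher Forcing, and with 1 the decoder always conditions on its own outputs, as it does at inference time; intermediate values mix the two regimes.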

Findings

01 Outperforms previous models on the YouTube2Text dataset.

02 Achieves competitive results on the MSR-VTT dataset.

03 Improves semantic feature quality and caption length.

Abstract

Given the features of a video, recurrent neural networks can be used to automatically generate a caption for the video. Existing methods for video captioning have at least three limitations. First, semantic information has been widely applied to boost the performance of video captioning models, but existing networks often fail to provide meaningful semantic features. Second, the Teacher Forcing algorithm is often utilized to optimize video captioning models, but during training and inference, different strategies are applied to guide word generation, leading to poor performance. Third, current video captioning models are prone to generate relatively short captions that express video contents inappropriately. Toward resolving these three problems, we suggest three corresponding improvements. First of all, we propose a metric to compare the quality of semantic features, and utilize…
