Syntax Customized Video Captioning by Imitating Exemplar Sentences

Yitian Yuan; Lin Ma; Wenwu Zhu

arXiv:2112.01062·cs.CV·December 3, 2021

TL;DR

This paper introduces syntax customized video captioning, a task in which the generated caption must describe the video's content accurately while imitating the syntactic structure of a given exemplar sentence, which increases the diversity of video descriptions.

Contribution

The paper proposes a new model with a hierarchical syntax encoder and syntax-conditioned decoder, along with a training strategy leveraging traditional caption data and exemplar sentences.
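
To make the input/output contract of the task concrete, here is a deliberately simplified toy sketch. It is not the paper's model: the learned hierarchical syntax encoder is replaced by a hand-written part-of-speech lookup, and the syntax-conditioned decoder by plain slot filling. The names `TOY_POS`, `syntax_template`, `customize_caption`, and the `video` dictionary are all hypothetical and introduced only for illustration.

```python
from typing import Dict, List

# Hypothetical toy lexicon: maps each known word to a coarse part-of-speech tag.
# A real system would learn syntax from data instead of using a lookup table.
TOY_POS: Dict[str, str] = {
    "a": "DET", "the": "DET",
    "man": "NOUN", "guitar": "NOUN", "dog": "NOUN", "ball": "NOUN",
    "plays": "VERB", "chases": "VERB",
    "happily": "ADV", "quickly": "ADV",
}


def syntax_template(exemplar: str) -> List[str]:
    """Reduce an exemplar sentence to its tag sequence (stand-in for a syntax encoder)."""
    return [TOY_POS[word] for word in exemplar.lower().split()]


def customize_caption(video_words: Dict[str, List[str]], exemplar: str) -> str:
    """Fill the exemplar's tag sequence with words describing the video
    (stand-in for a syntax-conditioned decoder)."""
    # Copy the word pools so the caller's dictionary is not modified.
    pools = {tag: list(words) for tag, words in video_words.items()}
    caption = []
    for tag in syntax_template(exemplar):
        if not pools.get(tag):
            raise ValueError(f"no video word left for tag {tag!r}")
        caption.append(pools[tag].pop(0))
    return " ".join(caption)


# Hypothetical semantic content for one video, grouped by tag (an assumption,
# standing in for features a video encoder would provide).
video = {
    "DET": ["a", "a"],
    "NOUN": ["man", "guitar"],
    "VERB": ["plays"],
    "ADV": ["happily"],
}

print(customize_caption(video, "the dog chases the ball"))
print(customize_caption(video, "the dog quickly chases the ball"))
```

The same video yields "a man plays a guitar" for the first exemplar and "a man happily plays a guitar" for the second: the semantics stay fixed while the sentence structure follows the exemplar, which is the behavior the task asks a learned model to achieve.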

Findings

1. Generated captions are syntactically diverse and semantically coherent.

2. The model outperforms baselines in diversity and fluency evaluations.

3. Extensive experiments validate the effectiveness of syntax imitation in video captioning.

Abstract

Enhancing the diversity of sentences to describe video contents is an important problem arising in recent video captioning research. In this paper, we explore this problem from a novel perspective of customizing video captions by imitating exemplar sentence syntaxes. Specifically, given a video and any syntax-valid exemplar sentence, we introduce a new task of Syntax Customized Video Captioning (SCVC) aiming to generate one caption which not only semantically describes the video contents but also syntactically imitates the given exemplar sentence. To tackle the SCVC task, we propose a novel video captioning model, where a hierarchical sentence syntax encoder is first designed to extract the syntactic structure of the exemplar sentence, and a syntax-conditioned caption decoder is then devised to generate the syntactically structured caption expressing video semantics. As there is no…

Code & Models

Repositories

yytzsy/syntax-customized-video-captioning (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Human Pose and Action Recognition