Object Relational Graph with Teacher-Recommended Learning for Video Captioning

Ziqi Zhang; Yaya Shi; Chunfeng Yuan; Bing Li; Peijin Wang; Weiming Hu; Zhengjun Zha

arXiv:2002.11566·cs.CV·February 27, 2020·39 cites

TL;DR

This paper introduces an object relational graph encoder, which enriches visual features with object interaction information, and a teacher-recommended learning strategy, which leverages an external language model to address long-tailed word distributions in video captioning.

Contribution

The paper presents a novel object relational graph encoder that enriches visual representations, together with a teacher-recommended learning method that incorporates external language knowledge into video captioning models.
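The object relational graph idea can be illustrated with a small sketch. It is not the authors' implementation. It assumes that a frame yields K object feature vectors, that the relation graph is an affinity matrix obtained by a row-wise softmax over pairwise dot products of two projections of those features, and that one graph-convolution step then mixes each object's feature with the features of related objects. The helper names (`object_relational_layer`, `random_matrix`), the sizes, and the random placeholder weights `w_i`, `w_j`, `w_r` are hypothetical; a trained encoder would learn the weights.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_rows(a):
    """Numerically stable softmax applied to each row independently."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def random_matrix(rows, cols, rng):
    # Hypothetical placeholder weights; a real encoder would learn these.
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def object_relational_layer(objects, w_i, w_j, w_r):
    """Sketch of one relation-enhancement step over K object features.

    objects: K x d matrix, one row per detected object (assumed input).
    Returns (enhanced K x d_out features, K x K affinity matrix).
    """
    phi = matmul(objects, w_i)  # K x h, first projection of each object
    psi = matmul(objects, w_j)  # K x h, second projection of each object
    # Pairwise relation scores, normalized so each object's edge weights sum to 1.
    affinity = softmax_rows(matmul(phi, transpose(psi)))
    # Graph convolution: aggregate features of related objects, then project.
    enhanced = matmul(matmul(affinity, objects), w_r)
    return enhanced, affinity


rng = random.Random(0)
K, d, h, d_out = 5, 8, 4, 8  # hypothetical sizes
objects = random_matrix(K, d, rng)
enhanced, affinity = object_relational_layer(
    objects,
    random_matrix(d, h, rng),
    random_matrix(d, h, rng),
    random_matrix(d, d_out, rng),
)
```

Because every row of `affinity` sums to 1, each enhanced feature is a weighted average of all object features before the final projection, which is one way interaction information can enter the visual representation. The exact formulation, including which objects are connected in the graph, should be taken from the paper itself.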

Findings

01. Achieves state-of-the-art results on the MSVD, MSR-VTT, and VATEX datasets.
02. Enriches the visual representation with object interaction features.
03. Improves the handling of long-tailed word distributions in captioning.

Abstract

Taking full advantage of the information from both vision and language is critical for the video captioning task. Existing models lack adequate visual representations because they neglect the interactions between objects, and they lack sufficient training for content-related words because of the long-tail problem. In this paper, we propose a complete video captioning system comprising both a novel model and an effective training strategy. Specifically, we propose an object relational graph (ORG) based encoder, which captures more detailed interaction features to enrich the visual representation. Meanwhile, we design a teacher-recommended learning (TRL) method that makes full use of a successful external language model (ELM) to integrate abundant linguistic knowledge into the caption model. The ELM generates more semantically similar word proposals, which extend the ground-truth words used for training to deal…
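The abstract's description of TRL, in which word proposals from the ELM extend the ground-truth words used for training, resembles knowledge distillation with soft targets. The sketch below shows one such per-word loss under stated assumptions, not the paper's exact objective: the teacher's recommendations arrive as a probability distribution over the vocabulary, they are truncated to the top `k` words and renormalized, and the caption model is trained on a cross-entropy term for the ground-truth word plus a KL-divergence term toward the recommendations, weighted by `lam`. The function names, `k`, `lam`, and the toy vocabulary and probabilities are hypothetical.

```python
import math


def log_softmax(logits):
    """Convert a dict of word -> logit into word -> log-probability."""
    m = max(logits.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return {w: v - log_z for w, v in logits.items()}


def top_k_renormalize(probs, k):
    """Keep the teacher's k most probable words and rescale them to sum to 1."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {w: p / total for w, p in top}


def trl_step_loss(student_logits, ground_truth, teacher_probs, k=2, lam=0.5):
    """Hypothetical per-word loss: hard cross-entropy plus a soft KL term."""
    log_q = log_softmax(student_logits)
    ce = -log_q[ground_truth]  # hard target: the single ground-truth word
    soft = top_k_renormalize(teacher_probs, k)
    # KL(soft || student), summed over the teacher's recommended words only.
    kl = sum(p * (math.log(p) - log_q[w]) for w, p in soft.items())
    return ce + lam * kl, ce, kl


# Toy example: the ground truth is "man", but the teacher also recommends
# the semantically similar "person", so that word contributes to the loss.
student_logits = {"man": 2.0, "person": 0.5, "guy": 0.0, "dog": -1.0}
teacher_probs = {"man": 0.5, "person": 0.3, "guy": 0.15, "dog": 0.05}
loss, ce, kl = trl_step_loss(student_logits, "man", teacher_probs)
```

With only the cross-entropy term, "person" would be pushed down like any other non-target word; the KL term instead rewards the model for assigning it probability, which illustrates how recommended words could supply extra training signal for less frequent content words.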

Videos

Object Relational Graph with Teacher-Recommended Learning for Video Captioning (YouTube)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization