Discriminative Latent Semantic Graph for Video Captioning

Yang Bai; Junyan Wang; Yang Long; Bingzhang Hu; Yang Song; Maurice Pagnucco; Yu Guan

arXiv:2108.03662·cs.CV·August 11, 2021

TL;DR

This paper introduces a joint framework for video captioning that enriches object proposals with spatio-temporal information, extracts higher-level visual semantics, and validates generated sentences, reporting significant improvements over existing methods on the MSVD and MSR-VTT datasets.

Contribution

It proposes a joint framework with a Conditional Graph, Latent Proposal Aggregation, and a Discriminative Language Validator for improved video captioning.

Findings

01

Significant improvements on the MSVD and MSR-VTT datasets.

02

Enhanced BLEU-4 and CIDEr scores.

03

Effective preservation of key semantic concepts.
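
For readers unfamiliar with the BLEU-4 metric named in the findings, the following is a minimal sketch of an unsmoothed, sentence-level BLEU-4: clipped n-gram precision for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. The function names `ngrams` and `bleu4` and the whitespace tokenization are illustrative assumptions, not code from the paper or its repository, and the paper's reported scores may come from a different, corpus-level evaluation toolkit.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, references):
    """Unsmoothed sentence-level BLEU-4 (illustrative; inputs are token lists)."""
    log_precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives a zero score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the four precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / 4)


if __name__ == "__main__":
    reference = "a man is playing a guitar on stage".split()
    caption = "a man is playing a guitar".split()
    print(round(bleu4(caption, [reference]), 4))
```

In this example every n-gram of the shorter caption appears in the reference, so the score is determined by the brevity penalty alone. BLEU is usually reported at corpus level, aggregating n-gram counts over the whole test set, so this per-sentence version is for intuition only.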

Abstract

Video captioning aims to automatically generate natural language sentences that describe the visual contents of a given video. Existing generative models such as encoder-decoder frameworks cannot explicitly explore object-level interactions and frame-level information in complex spatio-temporal data to generate semantically rich captions. Our main contribution is to identify three key problems in a joint framework for future video summarization tasks. 1) Enhanced Object Proposal: we propose a novel Conditional Graph that can fuse spatio-temporal information into latent object proposals. 2) Visual Knowledge: Latent Proposal Aggregation is proposed to dynamically extract visual words with higher semantic levels. 3) Sentence Validation: a novel Discriminative Language Validator is proposed to verify generated captions so that key semantic concepts can be effectively preserved. Our…

Code & Models

Repositories

baiyang4/d-lsg-video-caption
PyTorch · Official
