Bypass Network for Semantics Driven Image Paragraph Captioning

Qi Zheng; Chaoyue Wang; Dadong Wang

arXiv:2206.10059·cs.CV·June 22, 2022

TL;DR

This paper introduces a bypass network that separately models semantics and syntax to improve coherence and reduce repetition in image paragraph captioning, achieving superior results on benchmark datasets.

Contribution

The proposed model separates semantics and syntax modeling with a bypass network, enhancing coherence and reducing repetition in image paragraph captioning.

Findings

01. Outperforms state-of-the-art methods on benchmark datasets.

02. Effectively reduces both immediate and delayed repetitions.

03. Achieves higher coherence without sacrificing accuracy.

Abstract

Image paragraph captioning aims to describe a given image with a sequence of coherent sentences. Most existing methods model coherence through a topic transition that dynamically infers a topic vector from preceding sentences. However, these methods still suffer from immediate or delayed repetitions in generated paragraphs because (i) the entanglement of syntax and semantics distracts the topic vector from attending to pertinent visual regions; (ii) there are few constraints or rewards for learning long-range transitions. In this paper, we propose a bypass network that separately models the semantics and linguistic syntax of preceding sentences. Specifically, the proposed model consists of two main modules, i.e., a topic transition module and a sentence generation module. The former takes previous semantic vectors as queries and applies an attention mechanism to regional features to acquire…
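The attention step described in the abstract, where a semantic vector from preceding sentences acts as a query over regional image features, can be illustrated with a minimal sketch. This is not the authors' implementation: the scaled dot-product scoring, the function name `attend`, and the toy two-dimensional features are all assumptions chosen for illustration, since the abstract does not specify the form of the attention.

```python
import math


def attend(query, regions):
    """Attend over region features with a single query vector.

    Assumption: scaled dot-product scoring; the paper may use a different form.
    query:   list of d floats (hypothetical semantic vector of preceding sentences)
    regions: list of n region feature vectors, each a list of d floats
    Returns (weights, context): attention weights that sum to 1, and the
    weighted sum of the region features.
    """
    d = len(query)
    # Similarity between the query and each region, scaled by sqrt(d).
    scores = [
        sum(q * r for q, r in zip(query, region)) / math.sqrt(d)
        for region in regions
    ]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the region features.
    context = [
        sum(w * region[i] for w, region in zip(weights, regions))
        for i in range(d)
    ]
    return weights, context


# Toy example with three regions and a query aligned with the first axis.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = attend(query, regions)
```

In this toy example the first and third regions share the same component along the query direction, so they receive equal weight, and both outweigh the second region. A query built only from semantics (rather than from mixed syntax and semantics) is what the abstract argues keeps such weights focused on pertinent regions.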

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization

Methods: REINFORCE