Recurrent Topic-Transition GAN for Visual Paragraph Generation

Xiaodan Liang; Zhiting Hu; Hao Zhang; Chuang Gan; Eric P. Xing

arXiv:1703.07022 · cs.CV · March 27, 2017 · 77 cites

TL;DR

This paper introduces RTT-GAN, a semi-supervised generative model that produces diverse, coherent visual paragraphs by reasoning over semantic regions and using adversarial training to ensure plausibility and topic coherence.

Contribution

It proposes a novel recurrent topic-transition GAN framework that integrates region-based attention and multi-level discriminators for semantically coherent paragraph generation.

Findings

01. RTT-GAN outperforms existing methods in generating realistic paragraphs.

02. The model effectively captures topic transitions and semantic diversity.

03. Qualitative results show interpretability and storytelling capability.

Abstract

A natural image usually conveys rich semantic content and can be viewed from different angles. Existing image description methods are largely restricted by small sets of biased visual paragraph annotations, and fail to cover rich underlying semantics. In this paper, we investigate a semi-supervised paragraph generative framework that is able to synthesize diverse and semantically coherent paragraph descriptions by reasoning over local semantic regions and exploiting linguistic knowledge. The proposed Recurrent Topic-Transition Generative Adversarial Network (RTT-GAN) builds an adversarial framework between a structured paragraph generator and multi-level paragraph discriminators. The paragraph generator generates sentences recurrently by incorporating region-based visual and language attention mechanisms at each step. The quality of generated paragraph sentences is assessed by…

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques