Recurrent Topic-Transition GAN for Visual Paragraph Generation
Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, Eric P. Xing

TL;DR
This paper introduces RTT-GAN, a semi-supervised generative model that produces diverse, coherent visual paragraphs by reasoning over semantic regions and using adversarial training to ensure plausibility and topic coherence.
Contribution
It proposes a novel recurrent topic-transition GAN framework that integrates region-based attention and multi-level discriminators for semantically coherent paragraph generation.
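As a rough illustration of what region-based attention means here, the sketch below weights a set of region feature vectors against a query vector (standing in for a generator hidden state) and returns their weighted average. It is a minimal sketch under assumptions: dot-product scoring, the function names `softmax` and `attend_regions`, and the toy feature values are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_regions(region_features, query):
    """Soft attention over region features.

    Assumption: relevance is scored with a plain dot product between each
    region feature and the query; the paper's actual scoring function is
    not given in this summary.
    """
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in region_features]
    weights = softmax(scores)
    dim = len(query)
    # Context vector: attention-weighted average of the region features.
    context = [
        sum(w * feat[i] for w, feat in zip(weights, region_features))
        for i in range(dim)
    ]
    return weights, context


# Toy example: three regions with 2-dimensional features (hypothetical values).
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = attend_regions(regions, query)
```

In a recurrent generator, a step like this would be repeated for each sentence, with the query updated between steps so that attention can shift to different regions as the topic changes.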
Findings
RTT-GAN outperforms existing methods in generating realistic paragraphs.
The model effectively captures topic transitions and semantic diversity.
Qualitative results show interpretability and storytelling capability.
Abstract
A natural image usually conveys rich semantic content and can be viewed from different angles. Existing image description methods are largely restricted by small sets of biased visual paragraph annotations, and fail to cover rich underlying semantics. In this paper, we investigate a semi-supervised paragraph generative framework that is able to synthesize diverse and semantically coherent paragraph descriptions by reasoning over local semantic regions and exploiting linguistic knowledge. The proposed Recurrent Topic-Transition Generative Adversarial Network (RTT-GAN) builds an adversarial framework between a structured paragraph generator and multi-level paragraph discriminators. The paragraph generator generates sentences recurrently by incorporating region-based visual and language attention mechanisms at each step. The quality of generated paragraph sentences is assessed by…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
