Keep it Consistent: Topic-Aware Storytelling from an Image Stream via Iterative Multi-agent Communication

Ruize Wang; Zhongyu Wei; Ying Cheng; Piji Li; Haijun Shan; Ji Zhang; Qi Zhang; Xuanjing Huang

arXiv:1911.04192 · cs.CL · November 2, 2020 · 5 citations


Open Access

TL;DR

This paper introduces a topic-aware multi-agent framework for visual storytelling that improves narrative coherence by jointly generating a global topic description and a story from an image stream.

Contribution

It proposes a novel multi-agent communication approach that couples topic description generation with story generation, enhancing semantic consistency in visual storytelling.

Findings

01. Outperforms state-of-the-art methods on the VIST dataset.
02. Produces more coherent and semantically relevant stories.
03. Validated by quantitative, ablation, and human evaluations.

Abstract

Visual storytelling aims to automatically generate a narrative paragraph from a sequence of images. Existing approaches construct a text description independently for each image and naively concatenate the descriptions into a story, which leads to semantically incoherent content. In this paper, we propose a new approach to visual storytelling that introduces a topic description task to detect the global semantic context of an image stream. A story is then constructed under the guidance of the topic description. To combine the two generation tasks, we propose a multi-agent communication framework that treats the topic description generator and the story generator as two agents and learns them simultaneously via an iterative updating mechanism. We validate our approach on the VIST dataset, where quantitative results, ablations, and human evaluation demonstrate our method's good…
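The abstract describes two agents that take turns: a topic description generator that reads the whole image stream, and a story generator that writes under the topic's guidance, with the two updated iteratively. The following is a toy sketch of only that alternating communication loop, not the authors' model. In the paper both agents are learned generators trained jointly; here `topic_agent`, `story_agent`, and `communicate` are hypothetical rule-based stand-ins, each image is assumed to be a list of pre-detected tags, and the stopping rule (the topic no longer changes, or `max_rounds` is reached) is an assumption made for illustration.

```python
import re
from collections import Counter

# Toy illustration only: each "image" is a hypothetical list of detected tags,
# and both agents are hand-written rules rather than learned generators.


def topic_agent(images: list[list[str]], story: list[str]) -> str:
    """Pick the tag that is most frequent across the images and the current story."""
    # Seed counts from the image tags (assumes at least one tag overall).
    counts = Counter(tag for tags in images for tag in tags)
    for sentence in story:
        words = re.findall(r"[a-z]+", sentence.lower())
        # Only reinforce words that are already known tags.
        counts.update([w for w in words if w in counts])
    # Ties go to the tag encountered first.
    return counts.most_common(1)[0][0]


def story_agent(images: list[list[str]], topic: str) -> list[str]:
    """Write one sentence per image, framed by the shared topic."""
    story = []
    for tags in images:
        details = [t for t in tags if t != topic] or [topic]
        story.append(f"At the {topic}, we saw the {' and '.join(details)}.")
    return story


def communicate(images: list[list[str]], max_rounds: int = 5) -> tuple[str, list[str], int]:
    """Alternate the two agents, each reading the other's latest output."""
    story = [" ".join(tags) for tags in images]  # naive per-image captions
    topic = None
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        new_topic = topic_agent(images, story)   # topic agent reads the story
        story = story_agent(images, new_topic)   # story agent reads the topic
        if new_topic == topic:                   # assumed convergence check
            break
        topic = new_topic
    return topic, story, rounds


if __name__ == "__main__":
    images = [["beach", "dog"], ["beach", "sunset"], ["dog", "ball"]]
    topic, story, rounds = communicate(images)
    print(topic, rounds)
    for line in story:
        print(line)
```

With these example tags, the topic settles on "beach" by the second round, and every sentence, including the one for the image that has no beach tag, is framed by that shared topic. This mirrors the consistency goal in spirit, though the real framework learns both agents rather than applying string templates.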


Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques