A Hierarchical Approach for Visual Storytelling Using Image Description

Md Sultan Al Nahian, Tasmia Tasrin, Sagar Gandhi, Ryan Gaines, and Brent Harrison

arXiv:1909.12401·cs.CV·September 30, 2019

Open Access

TL;DR

This paper introduces a hierarchical deep learning model that uses image descriptions and images to generate coherent, long visual stories, outperforming existing methods on the VIST dataset.

Contribution

A novel hierarchical encoder-decoder architecture incorporating image descriptions to improve long-term context and diversity in visual storytelling.

Findings

1. Outperforms state-of-the-art techniques on the VIST dataset across a suite of automatic evaluation metrics.

2. Demonstrates the importance of the hierarchical structure.

3. Validates the effectiveness of image descriptions in storytelling.

Abstract

One of the primary challenges of visual storytelling is developing techniques that can maintain the context of the story over long event sequences to generate human-like stories. In this paper, we propose a hierarchical deep learning architecture based on encoder-decoder networks to address this problem. To help our network maintain this context while also generating long and diverse sentences, we incorporate natural language image descriptions along with the images themselves to generate each story sentence. We evaluate our system on the Visual Storytelling (VIST) dataset and show that our method outperforms state-of-the-art techniques on a suite of different automatic evaluation metrics. The empirical results from this evaluation demonstrate the necessity of the different components of our proposed architecture and show the effectiveness of the architecture for visual…
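The data flow the abstract describes can be illustrated with a toy sketch: each story step fuses an image feature with an embedding of that image's description, and a story-level recurrence carries context from one step to the next before each sentence is decoded. This is not the authors' implementation. The function names (`embed_description`, `encode_story`), the dimensions, the concatenation-based fusion, the random untrained weights, and the example descriptions are all assumptions made for illustration; the listing here does not specify the paper's actual layers or training setup.

```python
import math
import random


def embed_description(tokens, dim):
    """Toy description encoder (assumption): average of per-word pseudo-random vectors."""
    vec = [0.0] * dim
    for tok in tokens:
        rng = random.Random(tok)  # seeded by the word, so a word always maps to the same vector
        for i in range(dim):
            vec[i] += rng.uniform(-1.0, 1.0)
    n = max(len(tokens), 1)
    return [v / n for v in vec]


def make_matrix(rows, cols, rng):
    """Random, untrained weight matrix used only to make the sketch runnable."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def encode_story(step_inputs, hidden_dim, seed=0):
    """Story-level tanh recurrence: one context vector per image, each depending on all earlier steps."""
    rng = random.Random(seed)
    w_in = make_matrix(hidden_dim, len(step_inputs[0]), rng)
    w_h = make_matrix(hidden_dim, hidden_dim, rng)
    h = [0.0] * hidden_dim
    contexts = []
    for x in step_inputs:
        pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_h, h))]
        h = [math.tanh(p) for p in pre]
        contexts.append(h)
    return contexts


# Invented toy data: five images per story, each with a 3-dim "image feature"
# and a short description. Real features would come from a trained image encoder.
image_feats = [[0.1 * (i + 1)] * 3 for i in range(5)]
descriptions = [
    "a family arrives at the beach".split(),
    "children build a sand castle".split(),
    "a dog runs into the water".split(),
    "people eat lunch under an umbrella".split(),
    "the sun sets over the ocean".split(),
]

# Lower level: fuse each image with its description (here by simple concatenation).
step_inputs = [img + embed_description(desc, 4) for img, desc in zip(image_feats, descriptions)]

# Upper level: story context that a per-sentence decoder would be conditioned on.
contexts = encode_story(step_inputs, hidden_dim=6)
decoder_inputs = [ctx + x for ctx, x in zip(contexts, step_inputs)]
```

In a trained system, each entry of `decoder_inputs` would condition a sentence decoder that emits one story sentence per image; the sketch stops before decoding because untrained weights would produce meaningless text.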

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques