Hide-and-Tell: Learning to Bridge Photo Streams for Visual Storytelling

Yunjae Jung, Dahun Kim, Sanghyun Woo, Kyungsu Kim, Sungjin Kim, In So Kweon

arXiv:2002.00774 · cs.CV · February 4, 2020 · 1 citation

TL;DR

This paper introduces a hide-and-tell model for visual storytelling that learns to bridge visual gaps in photo streams by imagining plausible storylines. It outperforms previous methods on automatic metrics.

Contribution

It proposes a new training scheme and model architecture that explicitly learn to imagine a storyline and interpolate it across missing photos and the visual gaps between photos in a stream.

Findings

01. Outperforms the previous state of the art on automatic metrics.

02. Effectively interpolates the storyline over visual gaps.

03. Generates human-like narration even when photos are missing.

Abstract

Visual storytelling is the task of creating a short story based on photo streams. Unlike existing visual captioning, storytelling aims to contain not only factual descriptions, but also human-like narration and semantics. However, the VIST dataset consists only of a small, fixed number of photos per story. Therefore, the main challenge of visual storytelling is to fill in the visual gap between photos with a narrative and imaginative story. In this paper, we propose to explicitly learn to imagine a storyline that bridges the visual gap. During training, one or more photos are randomly omitted from the input stack, and we train the network to produce a full plausible story even with missing photo(s). Furthermore, we propose for visual storytelling a hide-and-tell model, which is designed to learn non-local relations across the photo streams and to refine and improve conventional RNN-based…
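The hiding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `hide_photos`, the use of plain Python lists as photo features, and the choice to zero out a hidden photo's features (rather than drop the photo from the stack) are all assumptions made for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple


def hide_photos(
    photo_features: Sequence[Sequence[float]],
    num_hidden: int = 1,
    rng: Optional[random.Random] = None,
) -> Tuple[List[List[float]], List[int]]:
    """Return a copy of the stream with `num_hidden` photos masked out.

    Assumption: a hidden photo is represented by an all-zero feature
    vector, so the stream keeps its length and photo order.
    """
    rng = rng or random.Random()
    if not 0 <= num_hidden < len(photo_features):
        raise ValueError("num_hidden must leave at least one photo visible")

    # Choose which positions in the stream to hide, without repetition.
    hidden = sorted(rng.sample(range(len(photo_features)), num_hidden))

    # Copy every feature vector, replacing the hidden ones with zeros.
    masked = [
        [0.0] * len(feat) if i in hidden else list(feat)
        for i, feat in enumerate(photo_features)
    ]
    return masked, hidden


# Usage: a toy five-photo stream with 4-dimensional features.
stream = [[float(i)] * 4 for i in range(1, 6)]
masked_stream, hidden_positions = hide_photos(stream, num_hidden=2)
```

During training under this scheme, the masked stream would be the model input while the target remains the full story for all photos, so the model has to produce sentences for the hidden positions from context alone.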

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization