Contextualize, Show and Tell: A Neural Visual Storyteller

Diana Gonzalez-Rico; Gibran Fuentes-Pineda

arXiv:1806.00738 · cs.CL · June 5, 2018 · 29 cites

TL;DR

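This paper introduces a neural model that generates short stories from image sequences. It extends the image description model of Vinyals et al. (2015) in two ways. An encoder LSTM summarizes the image sequence into a context vector, and independent decoder LSTMs, one per image, generate the corresponding parts of the story. The model achieved competitive METEOR scores and human ratings in the internal track of the Visual Storytelling Challenge 2018.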

Contribution

It proposes a neural architecture that models context across an image sequence for storytelling. It does this by extending the single-image description model of Vinyals et al. (2015) with a sequence-level encoder LSTM.

Findings

01

Achieved competitive METEOR scores in the internal track of the Visual Storytelling Challenge 2018

02

Received competitive human ratings of story quality

03

Demonstrated effective context modeling across images

Abstract

We present a neural model for generating short stories from image sequences, which extends the image description model of Vinyals et al. (2015). The extension uses an encoder LSTM to compute a context vector for each story from its image sequence. This context vector is used as the initial state of multiple independent decoder LSTMs. Each decoder generates the portion of the story corresponding to one image in the sequence, taking that image's embedding as its first input. Our model achieved competitive results on the METEOR metric and in human ratings in the internal track of the Visual Storytelling Challenge 2018.
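The data flow described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation, and several choices in it are hypothetical:

- The sizes, the random weights and the greedy decoding are illustrative.
- The encoder's final (h, c) pair is passed as the "context vector".
- The names `LSTMCell`, `StorytellerSketch` and `eos_id` are made up for this example.
- The image embeddings are random vectors standing in for real CNN features.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Dense matrix-vector product on plain Python lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTMCell:
    """Standard LSTM cell with input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # Hypothetical random initialization; one weight matrix and bias per gate.
        self.W = {k: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                      for _ in range(hidden_size)] for k in "ifog"}
        self.b = {k: [0.0] * hidden_size for k in "ifog"}

    def zero_state(self):
        return [0.0] * self.hidden_size, [0.0] * self.hidden_size

    def step(self, x, state):
        h, c = state
        z = x + h  # concatenate the input with the previous hidden state
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])] for k in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new


class StorytellerSketch:
    """Hypothetical sketch: one encoder LSTM plus one independent decoder LSTM per image."""

    def __init__(self, embed_size, hidden_size, vocab_size, num_images, seed=0):
        rng = random.Random(seed)
        self.encoder = LSTMCell(embed_size, hidden_size, rng)
        # Independent decoders: separate weights for each image position.
        self.decoders = [LSTMCell(embed_size, hidden_size, rng) for _ in range(num_images)]
        self.word_embed = [[rng.uniform(-0.1, 0.1) for _ in range(embed_size)]
                           for _ in range(vocab_size)]
        self.out_proj = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
                         for _ in range(vocab_size)]

    def encode(self, image_embeddings):
        # Run the encoder over the image sequence; its final state is the story context.
        state = self.encoder.zero_state()
        for x in image_embeddings:
            state = self.encoder.step(x, state)
        return state

    def generate(self, image_embeddings, max_len=5, eos_id=0):
        context = self.encode(image_embeddings)
        story = []
        for decoder, img in zip(self.decoders, image_embeddings):
            # Context as the initial state, image embedding as the first input.
            state = decoder.step(img, context)
            words = []
            for _ in range(max_len):
                logits = matvec(self.out_proj, state[0])
                w = max(range(len(logits)), key=logits.__getitem__)  # greedy choice
                if w == eos_id:
                    break
                words.append(w)
                state = decoder.step(self.word_embed[w], state)
            story.append(words)
        return story


if __name__ == "__main__":
    model = StorytellerSketch(embed_size=6, hidden_size=8, vocab_size=10, num_images=5)
    rng = random.Random(1)
    images = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(5)]
    print(model.generate(images, max_len=4))  # five lists of word ids, one per image
```

Keeping a separate decoder per image position mirrors the abstract's "multiple independent decoder LSTMs". Because every decoder starts from the same encoder context, each story segment can depend on the whole image sequence and not only on its own image.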

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Artificial Intelligence in Games

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory