Visual Storytelling via Predicting Anchor Word Embeddings in the Stories

Bowen Zhang; Hexiang Hu; Fei Sha

arXiv:2001.04541 · cs.CV · January 15, 2020 · 5 cites

TL;DR

This paper introduces a simple model for visual storytelling that predicts anchor word embeddings from images and feeds them, together with the image features, to a seq2seq model that generates the narrative. It attains the best results on most automatic evaluation metrics and outperforms competing methods in human evaluation.

Contribution

The paper presents a novel approach that predicts anchor word embeddings from images to improve visual storytelling, offering a simpler and more effective model than previous state-of-the-art methods.

Findings

01 Attains the best results on most automatic evaluation metrics.

02 Outperforms competing methods in human evaluation.

03 Is simple in design and easy to optimize.

Abstract

We propose a learning model for the task of visual storytelling. The main idea is to predict anchor word embeddings from the images and use the embeddings and the image features jointly to generate narrative sentences. We use the embeddings of randomly sampled nouns from the ground-truth stories as the target anchor word embeddings to learn the predictor. To narrate a sequence of images, we use the predicted anchor word embeddings and the image features as the joint input to a seq2seq model. As opposed to state-of-the-art methods, the proposed model is simple in design, easy to optimize, and attains the best results in most automatic evaluation metrics. In human evaluation, the method also outperforms competing methods.
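The pipeline described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the three-word noun vocabulary, the linear predictor, the squared-error loss, and all names (`NOUN_EMBEDDINGS`, `sample_anchor`, `predict_embedding`, `joint_input`) are assumptions made here for illustration. It only shows the three steps the abstract names: sampling a noun from a ground-truth sentence as the training target, predicting an embedding from an image feature, and concatenating the prediction with the image feature as the joint input to a decoder.

```python
import random

# Hypothetical toy noun embeddings (3-d). A real system would use pretrained
# word vectors and a part-of-speech tagger to find nouns.
NOUN_EMBEDDINGS = {
    "dog": [1.0, 0.0, 0.0],
    "beach": [0.0, 1.0, 0.0],
    "ball": [0.0, 0.0, 1.0],
}


def sample_anchor(sentence, rng):
    """Pick one noun at random from a ground-truth sentence as the anchor word."""
    nouns = [w for w in sentence.lower().split() if w in NOUN_EMBEDDINGS]
    return rng.choice(nouns) if nouns else None


def predict_embedding(image_feature, weights):
    """Assumed linear predictor: one weight row per embedding dimension."""
    return [sum(w * x for w, x in zip(row, image_feature)) for row in weights]


def squared_error(pred, target):
    """Assumed training loss between predicted and target anchor embeddings."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def joint_input(image_feature, anchor_embedding):
    """Concatenate the image feature with the predicted anchor embedding."""
    return list(image_feature) + list(anchor_embedding)


rng = random.Random(0)
anchor = sample_anchor("the dog chased a ball on the beach", rng)

image_feature = [0.5, 2.0]                       # stand-in for a CNN feature
weights = [[1.0, 0.0], [0.0, 0.5], [0.0, 0.0]]   # stand-in for learned weights

pred = predict_embedding(image_feature, weights)
loss = squared_error(pred, NOUN_EMBEDDINGS[anchor])  # training signal
step_input = joint_input(image_feature, pred)        # fed to the seq2seq model
```

At training time the sampled noun's embedding serves as the regression target; at narration time only `predict_embedding` and `joint_input` would be needed, since no ground-truth story is available.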

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Artificial Intelligence in Games

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence