Incorporating Textual Evidence in Visual Storytelling

Tianyi Li; Sujian Li

arXiv:1911.09334 · cs.CL · November 26, 2019 · 1 citation

TL;DR

This paper introduces a novel approach to visual storytelling that incorporates textual evidence from similar images, using a two-step image ranking method and an extended Seq2Seq model to improve story coherence and quality.

Contribution

It presents a new method that combines image ranking with an extended Seq2Seq model built on a two-channel encoder and attention, with the aim of improving visual storytelling.

Findings

1. Outperforms state-of-the-art baseline models on the VIST dataset
2. Uses textual evidence to improve story coherence
3. Employs a two-step image ranking method
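This page does not give the paper's ranking details, so the following is only a sketch of the general shape a two-step ranking over object-recognition output could take. It assumes each image has already been passed through an object detector that returns a map from object label to confidence. The detector, the function names `jaccard` and `two_step_rank`, and both scoring rules are hypothetical. Step 1 keeps the candidates whose label sets overlap most with the query (Jaccard similarity). Step 2 re-ranks those survivors by the summed confidence of the objects they share with the query.

```python
from typing import Dict, List, Set

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two sets of object labels."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def two_step_rank(
    query: Dict[str, float],
    candidates: Dict[str, Dict[str, float]],
    k_coarse: int = 3,
    k_final: int = 1,
) -> List[str]:
    """Hypothetical two-step ranking of candidate images against a query image.

    Each image is a map from detected object label to detector confidence.
    """
    q_labels = set(query)
    # Step 1 (coarse): keep the k_coarse candidates with the largest label overlap.
    # Ties are broken by image id so the result is deterministic.
    coarse = sorted(
        candidates,
        key=lambda cid: (-jaccard(q_labels, set(candidates[cid])), cid),
    )[:k_coarse]

    # Step 2 (fine): score survivors by the confidence both images assign
    # to each shared object, taking the weaker of the two detections.
    def fine_score(cid: str) -> float:
        shared = q_labels & set(candidates[cid])
        return sum(min(query[obj], candidates[cid][obj]) for obj in shared)

    return sorted(coarse, key=lambda cid: (-fine_score(cid), cid))[:k_final]

query = {"dog": 0.9, "beach": 0.8, "person": 0.6}
candidates = {
    "img_a": {"dog": 0.95, "beach": 0.7},
    "img_b": {"dog": 0.5, "beach": 0.4, "person": 0.3},
    "img_c": {"car": 0.9, "road": 0.8},
    "img_d": {"dog": 0.8, "park": 0.9},
}
print(two_step_rank(query, candidates, k_coarse=3, k_final=2))  # ['img_a', 'img_b']
```

In this toy example, `img_b` wins the coarse step because it contains all three query objects. It then drops to second place after re-ranking because its detections have low confidence. A finer second pass like this is one plausible reason to split ranking into two steps.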

Abstract

Previous work on visual storytelling has mainly focused on exploring image sequences as evidence for storytelling and has neglected textual evidence for guiding story generation. Motivated by the human storytelling process, which recalls stories for familiar images, we exploit textual evidence from similar images to help generate coherent and meaningful stories. To pick the images that may provide textual experience, we propose a two-step ranking method based on image object recognition techniques. To utilize textual information, we design an extended Seq2Seq model with a two-channel encoder and attention. Experiments on the VIST dataset show that our method outperforms state-of-the-art baseline models without heavy engineering.
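The abstract does not say how the two encoder channels are combined. One common pattern works as follows. The model keeps separate encoder states for the image sequence and for the retrieved textual evidence. It attends over each channel with the current decoder state, then concatenates the two context vectors before predicting the next word. This sketch shows that pattern as an assumption, not as the authors' architecture. The function names, the dot-product scoring, and the concatenation fusion are all hypothetical choices, and plain Python lists stand in for tensors.

```python
import math
from typing import List

Vector = List[float]

def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query: Vector, states: List[Vector]) -> Vector:
    """Dot-product attention: a softmax-weighted sum of encoder states."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    return [sum(w * state[i] for w, state in zip(weights, states)) for i in range(dim)]

def two_channel_context(
    decoder_state: Vector,
    image_states: List[Vector],
    text_states: List[Vector],
) -> Vector:
    """Attend to each encoder channel separately, then concatenate the contexts.

    image_states: hypothetical encoder outputs for the input photo sequence.
    text_states:  hypothetical encoder outputs for the retrieved textual evidence.
    """
    image_ctx = attend(decoder_state, image_states)
    text_ctx = attend(decoder_state, text_states)
    return image_ctx + text_ctx  # list concatenation, not elementwise addition

ctx = two_channel_context([10.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0]])
print(len(ctx))  # 4: two dimensions from each channel
```

Here the decoder state aligns with the first image state, so almost all of the image-channel weight lands on it. The single text state receives weight 1. Concatenation keeps the two evidence sources distinguishable to the decoder. Summing or gating the contexts would be equally plausible alternatives.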

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence
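The method tags list LSTMs alongside sigmoid and tanh activations. In the textbook LSTM cell, sigmoid produces the input, forget, and output gates. Tanh squashes the candidate cell value and also the cell state before output. The sketch below implements that standard formulation in plain Python. It makes no claim about the exact variant used in the paper, and `lstm_step` and the `params` layout are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Params = Dict[str, Tuple[Matrix, Matrix, Vector]]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def lstm_step(x: Vector, h: Vector, c: Vector, params: Params) -> Tuple[Vector, Vector]:
    """One LSTM time step; params[gate] = (W, U, b) for gates 'i', 'f', 'o', 'g'."""
    def pre_activation(gate: str) -> Vector:
        W, U, b = params[gate]
        return [wx + uh + bias for wx, uh, bias in zip(matvec(W, x), matvec(U, h), b)]

    i = [sigmoid(z) for z in pre_activation("i")]    # input gate
    f = [sigmoid(z) for z in pre_activation("f")]    # forget gate
    o = [sigmoid(z) for z in pre_activation("o")]    # output gate
    g = [math.tanh(z) for z in pre_activation("g")]  # candidate cell value
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new

# With all-zero weights, every gate is sigmoid(0) = 0.5 and the candidate is tanh(0) = 0,
# so the cell simply halves its previous memory.
zero: Params = {gate: ([[0.0]], [[0.0]], [0.0]) for gate in "ifog"}
h, c = lstm_step([1.0], [0.0], [1.0], zero)
print(c)  # [0.5]
```

Because the gates come from sigmoid, each gate value lies in (0, 1) and acts as a soft switch. Tanh keeps the candidate values and the emitted hidden state within (-1, 1).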