Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling

Jiacheng Li; Siliang Tang; Juncheng Li; Jun Xiao; Fei Wu; Shiliang Pu; Yueting Zhuang

arXiv:2008.04504 · cs.CL · August 12, 2020

TL;DR

Inspired by how humans tell stories, this paper introduces a few-shot visual storytelling model that uses topic adaptation and prototype encoding to improve story generation across diverse topics with limited data.

Contribution

The paper proposes a topic-adaptive meta-learning approach combined with prototype encoding to improve few-shot generalization in visual storytelling.

Findings

01. Improved BLEU and METEOR scores on few-shot tasks

02. Generated stories are more relevant and expressive

03. Mutual benefit observed from combining topic adaptation and prototype encoding

Abstract

Visual Storytelling (VIST) is the task of telling a narrative story about a certain topic from a given photo stream. Existing studies focus on designing complex models that rely on a huge amount of human-annotated data. However, annotation for VIST is extremely costly, and many topics cannot be covered in the training dataset because of the long-tail topic distribution. In this paper, we focus on enhancing the generalization ability of VIST models by considering the few-shot setting. Inspired by the way humans tell a story, we propose a topic-adaptive storyteller to model the ability of inter-topic generalization. In practice, we apply a gradient-based meta-learning algorithm to multi-modal seq2seq models to endow the model with the ability to adapt quickly from topic to topic. We further propose a prototype encoding structure to model the ability of intra-topic…
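
The abstract describes gradient-based meta-learning across topics without naming the exact algorithm here. As a rough illustration only, the sketch below uses a first-order, MAML-style loop on a toy problem: each hypothetical "topic" is a scalar regression task with its own slope, and a single weight `w` stands in for the multi-modal seq2seq model. All function names, learning rates, and the task distribution are assumptions made for this example and are not taken from the paper.

```python
import random

def loss_and_grad(w, data):
    """Mean squared error of the model y = w * x, and its gradient w.r.t. w."""
    n = len(data)
    loss = sum((w * x - y) ** 2 for x, y in data) / n
    grad = sum(2 * (w * x - y) * x for x, y in data) / n
    return loss, grad

def adapt(w, support, inner_lr=0.1, steps=1):
    """Inner loop: a few gradient steps on one topic's support examples."""
    for _ in range(steps):
        _, g = loss_and_grad(w, support)
        w = w - inner_lr * g
    return w

def sample_topic(rng, n=5):
    """Hypothetical 'topic': a random slope. Returns (support, query) sets."""
    slope = rng.uniform(1.0, 3.0)
    def draw():
        return [(x, slope * x) for x in (rng.uniform(-1, 1) for _ in range(n))]
    return draw(), draw()

def meta_train(iterations=500, meta_lr=0.05, tasks_per_batch=4, seed=0):
    """Outer loop: learn an initial w that adapts quickly to a new topic."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(iterations):
        meta_grad = 0.0
        for _ in range(tasks_per_batch):
            support, query = sample_topic(rng)
            w_adapted = adapt(w, support)
            # First-order approximation: use the query gradient at the
            # adapted weight and ignore second-order terms.
            _, g = loss_and_grad(w_adapted, query)
            meta_grad += g
        w -= meta_lr * meta_grad / tasks_per_batch
    return w
```

At test time, the learned initialization would be passed through `adapt` with the few available examples of an unseen topic before evaluating on that topic's query examples. A full second-order variant would differentiate through the inner update instead of using the first-order shortcut shown here.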

Taxonomy

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Sequence to Sequence