No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling

Xin Wang; Wenhu Chen; Yuan-Fang Wang; William Yang Wang

arXiv:1804.09160 · cs.CL · July 10, 2018 · 23 citations

TL;DR

This paper introduces an adversarial reward learning framework for visual storytelling that learns from human demonstrations, resulting in more human-like stories despite only slight improvements in automatic metrics.

Contribution

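It proposes the Adversarial REward Learning (AREL) framework, which learns an implicit reward function from human demonstrations instead of relying on hand-crafted rewards such as automatic metrics, addressing a key limitation of existing reinforcement learning approaches to visual storytelling.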

Findings

01

Generates more human-like stories than state-of-the-art (SOTA) methods.

02

Yields only slight improvements over SOTA methods on automatic evaluation metrics.

03

Human evaluation confirms a significant improvement in story quality.

Abstract

Though impressive results have been achieved in visual captioning, the task of generating abstract stories from photo streams is still a little-tapped problem. Different from captions, stories have more expressive language styles and contain many imaginary concepts that do not appear in the images. This poses challenges for behavioral cloning algorithms. Furthermore, due to the limitations of automatic metrics in evaluating story quality, reinforcement learning methods with hand-crafted rewards also face difficulties in gaining an overall performance boost. Therefore, we propose an Adversarial REward Learning (AREL) framework to learn an implicit reward function from human demonstrations, and then optimize policy search with the learned reward function. Though automatic evaluation indicates a slight performance boost over state-of-the-art (SOTA) methods in cloning expert behaviors,…
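The abstract describes AREL only at a high level: learn a reward function from human demonstrations, then optimize the story-generation policy against that learned reward. The Python sketch that follows illustrates this general alternating scheme on a toy problem. It is not the authors' implementation: the three candidate "stories", their one-hot features, the linear reward model, the REINFORCE policy update with a mean baseline, and all names and hyperparameters (`STORIES`, `train`, `lr_r`, `lr_pi`) are hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical toy "story space": three candidate stories with one-hot features.
# A real visual storytelling model would generate word sequences from image features.
STORIES = {
    "generic": [1.0, 0.0, 0.0],
    "literal": [0.0, 1.0, 0.0],
    "expressive": [0.0, 0.0, 1.0],
}
NAMES = list(STORIES)
DIM = 3


def reward(w, name):
    """Assumed linear reward model: R_w(story) = w . features(story)."""
    return sum(wi * xi for wi, xi in zip(w, STORIES[name]))


def policy_probs(logits):
    """Softmax policy over the candidate stories (numerically stabilized)."""
    m = max(logits)
    exps = [math.exp(logit - m) for logit in logits]
    z = sum(exps)
    return [e / z for e in exps]


def train(expert_stories, steps=500, batch=16, lr_r=0.05, lr_pi=0.1, seed=0):
    """Alternate reward-model and policy updates (hypothetical toy loop)."""
    rng = random.Random(seed)
    w = [0.0] * DIM       # reward-model parameters
    logits = [0.0] * DIM  # policy parameters
    for _ in range(steps):
        probs = policy_probs(logits)
        policy_samples = rng.choices(NAMES, weights=probs, k=batch)
        expert_samples = [rng.choice(expert_stories) for _ in range(batch)]

        # Reward step: gradient ascent on E_expert[R] - E_policy[R], i.e. raise
        # the reward of human demonstrations and lower it for policy samples.
        for e, p in zip(expert_samples, policy_samples):
            for i in range(DIM):
                w[i] += lr_r * (STORIES[e][i] - STORIES[p][i]) / batch

        # Policy step: REINFORCE using the learned reward and a mean baseline.
        rewards = [reward(w, p) for p in policy_samples]
        baseline = sum(rewards) / batch
        grad = [0.0] * DIM
        for p, r in zip(policy_samples, rewards):
            k = NAMES.index(p)
            for j in range(DIM):
                # d log softmax_k / d logit_j = 1[j == k] - probs[j]
                indicator = 1.0 if j == k else 0.0
                grad[j] += (r - baseline) * (indicator - probs[j]) / batch
        logits = [logit + lr_pi * g for logit, g in zip(logits, grad)]
    return w, policy_probs(logits)


if __name__ == "__main__":
    # Hypothetical demonstrations: the human always wrote the "expressive" story.
    w, probs = train(["expressive"] * 5)
    for name, p in zip(NAMES, probs):
        print(f"{name}: reward={reward(w, name):+.2f}, policy prob={p:.3f}")
```

Because every demonstration in this toy run is the same story, the learned reward keeps rising for it and falling for the alternatives until the policy's samples match the demonstrations, at which point both updates vanish. The paper's actual reward and policy models operate on generated text conditioned on photo streams and may use a different objective.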

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning