RoViST: Learning Robust Metrics for Visual Storytelling

Eileen Wang; Caren Han; Josiah Poon

arXiv:2205.03774 · cs.CV · May 10, 2022 · 1 citation

TL;DR

This paper introduces three new evaluation metrics for visual storytelling that better align with human judgment by analyzing visual grounding, coherence, and non-redundancy, addressing limitations of traditional n-gram based metrics.

Contribution

The paper proposes a set of learning-based evaluation metrics for visual storytelling that correlate with human judgments more strongly than existing metrics.

Findings

01 The proposed metrics correlate with human judgments more strongly than existing metrics

02 The metrics assess visual grounding, coherence, and non-redundancy

03 Applicable to models trained on the VIST dataset

Abstract

Visual storytelling (VST) is the task of generating a story paragraph that describes a given image sequence. Most existing storytelling approaches have evaluated their models using traditional natural language generation metrics like BLEU or CIDEr. However, such metrics, which are based on n-gram matching, tend to correlate poorly with human evaluation scores and do not explicitly consider other criteria necessary for storytelling, such as sentence structure or topic coherence. Moreover, a single score is not enough to assess a story, as it does not tell us what specific errors the model made. In this paper, we propose three sets of evaluation metrics that analyse the aspects we would look for in a good story: 1) visual grounding, 2) coherence, and 3) non-redundancy. We measure the reliability of our metric sets by analysing their correlation with human judgement scores on a sample of…
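The reliability check described in the abstract, correlating an automatic metric with human judgement scores, can be sketched as follows. This is a minimal illustration and not the authors' evaluation code: the choice of Pearson and Spearman coefficients, the function names, and the toy score lists are all assumptions made for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical toy scores for five stories (NOT data from the paper).
metric_scores = [0.62, 0.48, 0.81, 0.35, 0.70]  # automatic metric, one per story
human_scores = [3.5, 3.0, 4.5, 2.0, 4.0]        # mean human rating, one per story

print("Pearson:", pearson(metric_scores, human_scores))
print("Spearman:", spearman(metric_scores, human_scores))
```

A rank-based coefficient such as Spearman only asks whether the metric orders stories the same way humans do, which is often the more relevant question when the metric and the human ratings live on different scales.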

Code & Models

Repositories

usydnlp/rovist
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Digital Storytelling and Education