Ordered Attention for Coherent Visual Storytelling

Tom Braude; Idan Schwartz; Alexander Schwing; Ariel Shamir

arXiv:2108.02180·cs.CV·November 10, 2022

TL;DR

This paper introduces ordered image attention (OIA) and image-sentence attention (ISA), two mechanisms for generating more coherent, focused, and image-grounded stories from image sequences. They improve the METEOR score on the VIST dataset by 1% and are rated more favorably in a human study.

Contribution

The paper proposes ordered attention mechanisms (OIA and ISA) to improve the coherence of visual stories, and an adaptive prior to reduce common linguistic mistakes such as repetitiveness.

Findings

01

METEOR score improved by 1% on the VIST dataset

02

Human study shows increased story coherency and focus

03

Generated stories are more shareable and image-grounded

Abstract

We address the problem of visual storytelling, i.e., generating a story for a given sequence of images. While each sentence of the story should describe a corresponding image, a coherent story also needs to be consistent and relate to both future and past images. To achieve this we develop ordered image attention (OIA). OIA models interactions between the sentence-corresponding image and important regions in other images of the sequence. To highlight the important objects, a message-passing-like algorithm collects representations of those objects in an order-aware manner. To generate the story's sentences, we then highlight important image attention vectors with an Image-Sentence Attention (ISA). Further, to alleviate common linguistic mistakes like repetitiveness, we introduce an adaptive prior. The obtained results improve the METEOR score on the VIST dataset by 1%. In addition, an…
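
To make the idea of highlighting important image attention vectors concrete, here is a minimal sketch of generic dot-product soft attention over a short sequence of per-image vectors. It is not the paper's OIA or ISA formulation, which this page does not spell out; the function names `softmax` and `attend`, the toy vectors, and the use of a plain dot product as the scoring function are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, vectors):
    """Generic dot-product attention (an illustrative stand-in, not the paper's ISA).

    Scores each vector against the query, normalizes the scores with softmax,
    and returns the weights together with the weighted sum of the vectors.
    """
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)]
    return weights, pooled


# Hypothetical toy data: one 3-dimensional attention vector per image.
image_vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
]
# Hypothetical state of the sentence currently being generated.
sentence_state = [0.0, 2.0, 0.0]

weights, context = attend(sentence_state, image_vectors)
```

In this toy setup, the images whose vectors align with the sentence state receive the largest weights, and `context` is the resulting blend that a decoder could condition on. An order-aware variant would additionally need to encode each image's position in the sequence, which this sketch leaves out.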
