GLAC Net: GLocal Attention Cascading Networks for Multi-image Cued Story Generation

Taehyeong Kim, Min-Oh Heo, Seonil Son, Kyoung-Wha Park, Byoung-Tak Zhang

arXiv:1805.10973 · cs.CL · February 14, 2019 · 53 citations

TL;DR

GLAC Net is a deep learning model that enhances multi-image story generation by combining global-local attention and cascading context mechanisms, achieving competitive results on the VIST dataset.

Contribution

It introduces a novel glocal (global-local) attention and context cascading framework for visual storytelling that keeps the attention mechanism cheap in parameters and improves story coherence.

Findings

01 Achieves competitive performance on the VIST dataset

02 Uses a simple yet effective attention mechanism

03 Improves story coherence by cascading context information between sentences

Abstract

The task of multi-image cued story generation, such as the visual storytelling dataset (VIST) challenge, is to compose multiple coherent sentences from a given sequence of images. The main difficulty is generating image-specific sentences within the context of the whole image sequence. Here we propose a deep learning network model, GLAC Net, that generates visual stories by combining global-local (glocal) attention and context cascading mechanisms. The model incorporates two levels of attention, i.e., the overall encoding level and the image feature level, to construct image-dependent sentences. While a standard attention configuration requires a large number of parameters, GLAC Net implements attention in a very simple way, via hard connections from the outputs of the encoders or the image features onto the sentence generators. The coherence of the generated story is further improved by conveying (cascading) the…
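The two mechanisms named in the abstract can be illustrated with a minimal, dependency-free sketch. It assumes (1) that the "hard connection" for image i simply concatenates the i-th encoder output (global context) with the i-th image feature (local detail), with no learned attention weights, and (2) that cascading means the final hidden state of one sentence generator initializes the next. The abstract does not specify layer types, so `toy_decoder` is a hypothetical scalar tanh RNN standing in for a real sentence generator, and all names here are illustrative, not taken from the official tkim-snu/GLACNet code.

```python
import math
from typing import List

Vector = List[float]


def glocal_vectors(encoder_outputs: List[Vector],
                   image_features: List[Vector]) -> List[Vector]:
    """Assumed 'glocal' input per image: the i-th encoder output (global
    context) concatenated with the i-th image feature (local detail).
    The connection is fixed by position; no attention weights are learned."""
    if len(encoder_outputs) != len(image_features):
        raise ValueError("need one encoder output per image")
    return [g + l for g, l in zip(encoder_outputs, image_features)]


def toy_decoder(glocal: Vector, h0: Vector, steps: int,
                w_h: float = 0.8, w_x: float = 0.1) -> Vector:
    """Hypothetical stand-in for a sentence generator: a scalar-weight tanh
    RNN run for `steps` word positions, fed the same glocal vector at every
    step. Returns the final hidden state."""
    drive = w_x * sum(glocal) / len(glocal)  # scalar summary of the input
    h = list(h0)
    for _ in range(steps):
        h = [math.tanh(w_h * hj + drive) for hj in h]
    return h


def generate_story(encoder_outputs: List[Vector],
                   image_features: List[Vector],
                   hidden_size: int = 4, steps: int = 5,
                   cascade: bool = True) -> List[Vector]:
    """Run one toy 'sentence' per image. With cascade=True the final hidden
    state of sentence i becomes the initial state of sentence i+1; with
    cascade=False every sentence starts from a zero state."""
    h = [0.0] * hidden_size
    finals = []
    for x in glocal_vectors(encoder_outputs, image_features):
        if not cascade:
            h = [0.0] * hidden_size  # no context carried over
        h = toy_decoder(x, h, steps)
        finals.append(h)
    return finals


if __name__ == "__main__":
    enc = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.6]]               # toy encoder outputs
    img = [[1.0, 0.0, 0.5], [0.3, 0.3, 0.3], [0.0, 1.0, 0.2]]  # toy image features
    print(generate_story(enc, img, cascade=True))
    print(generate_story(enc, img, cascade=False))
```

On the same toy inputs, `cascade=True` and `cascade=False` produce an identical hidden state for the first image and different ones from the second image on, which is the only effect cascading has in this sketch. Note that the "attention" here adds no parameters at all, in line with the abstract's point that hard connections avoid the parameter cost of a standard attention configuration.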

Code & Models

Repositories

tkim-snu/GLACNet
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Human Pose and Action Recognition