Generating Descriptions for Sequential Images with Local-Object Attention and Global Semantic Context Modelling

Jing Su; Chenghua Lin; Mian Zhou; Qingyun Dai; Haoyu Lv

arXiv:2012.01295·cs.CL·December 3, 2020


Open Access

TL;DR

This paper introduces an end-to-end CNN-LSTM model with local-object attention and global semantic context modelling for generating coherent descriptions of sequential images. It outperforms the baseline on three evaluation metrics on datasets published by Microsoft.

Contribution

It presents a novel CNN-LSTM architecture incorporating local-object attention and global context modeling for sequential image captioning.

Findings

01. Outperforms baseline models on three evaluation metrics.

02. Effective local-object attention improves description relevance.

03. Global semantic context enhances coherence in sequence descriptions.

Abstract

In this paper, we propose an end-to-end CNN-LSTM model for generating descriptions for sequential images with a local-object attention mechanism. To generate coherent descriptions, we capture global semantic context using a multi-layer perceptron, which learns the dependencies between sequential images. A paralleled LSTM network is exploited for decoding the sequence descriptions. Experimental results show that our model outperforms the baseline across three different evaluation metrics on the datasets published by Microsoft.
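The abstract names two components: attention over local object features, and a multi-layer perceptron that summarises the whole image sequence into a global context. The following is a minimal sketch of how such pieces could fit together, not the authors' implementation. All names (`attend`, `mlp_global_context`), the dimensions, the random features, the use of the mean object feature as the attention query, and the single tanh layer are assumptions made for illustration; the LSTM decoders are omitted.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(object_feats, query):
    """Soft attention: weight each object feature by its similarity to the query."""
    weights = softmax([dot(f, query) for f in object_feats])
    dim = len(object_feats[0])
    context = [sum(w * f[d] for w, f in zip(weights, object_feats))
               for d in range(dim)]
    return context, weights


def mlp_global_context(image_feats, W, b):
    """One tanh layer over the concatenated features of every image in the sequence."""
    x = [v for feat in image_feats for v in feat]
    return [math.tanh(dot(row, x) + bias) for row, bias in zip(W, b)]


# Hypothetical sizes: 3 images, 4 detected objects each, 3-d features, 2-d context.
SEQ_LEN, NUM_OBJ, DIM, CTX_DIM = 3, 4, 3, 2
rng = random.Random(0)

# Stand-in for CNN object features (random here; a real model would extract them).
sequence = [[[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_OBJ)]
            for _ in range(SEQ_LEN)]

# Assumption: the image-level feature is the mean of its object features.
image_feats = [[sum(obj[d] for obj in objs) / NUM_OBJ for d in range(DIM)]
               for objs in sequence]

W = [[rng.uniform(-0.5, 0.5) for _ in range(SEQ_LEN * DIM)] for _ in range(CTX_DIM)]
b = [0.0] * CTX_DIM
global_ctx = mlp_global_context(image_feats, W, b)

# One input vector per image: attended local feature concatenated with the
# shared global context. Each would feed its own decoder in a parallel setup.
decoder_inputs, all_weights = [], []
for objs, img in zip(sequence, image_feats):
    local_ctx, weights = attend(objs, img)
    all_weights.append(weights)
    decoder_inputs.append(local_ctx + global_ctx)
```

In this sketch the global context is computed once and shared by every image in the sequence, which is one plausible way to let each description depend on the other images. In an actual decoder the attention query would more likely be the LSTM hidden state at each step rather than a fixed image feature.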


Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory