AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes

Guoyun Tu; Ying Liu; Vladimir Vlassov

arXiv:2307.07370·cs.CV·July 17, 2023·1 cites

Open Access

TL;DR

This paper introduces AIC-AB NET, an encoder-decoder network that improves image captioning by combining spatial attention with text attributes. It outperforms an adaptive attention baseline on MS COCO and on a new Fashion dataset.

Contribution

The novel AIC-AB NET combines spatial attention with text attributes in an encoder-decoder framework for improved image captioning.

Findings

01

Outperforms the baseline adaptive attention network by 0.017 in CIDEr score on MS COCO

02

Improves the CIDEr score by 0.095 on the Fashion dataset

03

Effective in single-object image captioning scenarios

Abstract

Image captioning is a significant field at the intersection of computer vision and natural language processing. We propose AIC-AB NET, a novel Attribute-Information-Combined Attention-Based Network that combines a spatial attention architecture and text attributes in an encoder-decoder. For caption generation, adaptive spatial attention determines which image region best represents the image and whether to attend to the visual features or the visual sentinel. Text attribute information is synchronously fed into the decoder to help image recognition and reduce uncertainty. We have tested and evaluated our AIC-AB NET on the MS COCO dataset and a newly proposed Fashion dataset. The Fashion dataset is employed as a benchmark of single-object images. The results show the superior performance of the proposed model compared to the state-of-the-art baseline and ablated models on both the images from…
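
The choice between visual features and the visual sentinel described in the abstract can be illustrated with a minimal NumPy sketch of one adaptive attention step. This is not the authors' implementation: it assumes the common formulation in which a single softmax is taken over the region scores plus one sentinel score, and the function name `adaptive_attention`, the weight names (`W_v`, `W_s`, `W_g`, `w_h`), and all dimensions are hypothetical. The text attribute pathway is left out because the abstract does not specify how attributes enter the decoder.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()


def adaptive_attention(V, s, h, W_v, W_s, W_g, w_h):
    """One adaptive attention step (illustrative sketch, assumed formulation).

    V: (k, d) spatial region features
    s: (d,)   visual sentinel vector
    h: (d,)   decoder hidden state
    W_v, W_s, W_g: (a, d) projection matrices (hypothetical names)
    w_h: (a,) scoring vector (hypothetical name)
    """
    gh = W_g @ h                                   # (a,) projected hidden state
    z_regions = np.tanh(V @ W_v.T + gh) @ w_h      # (k,) one score per region
    z_sentinel = np.tanh(W_s @ s + gh) @ w_h       # scalar score for the sentinel
    alpha_hat = softmax(np.append(z_regions, z_sentinel))  # (k + 1,)
    beta = alpha_hat[-1]                           # weight placed on the sentinel
    region_weights = alpha_hat[:-1]                # weights over image regions
    # Context mixes attended visual features with the sentinel.
    context = region_weights @ V + beta * s        # (d,)
    return context, region_weights, beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k, d, a = 49, 8, 6                             # e.g. a 7x7 grid of regions
    V, s, h = rng.normal(size=(k, d)), rng.normal(size=d), rng.normal(size=d)
    W_v, W_s, W_g = (rng.normal(size=(a, d)) for _ in range(3))
    w_h = rng.normal(size=a)
    context, weights, beta = adaptive_attention(V, s, h, W_v, W_s, W_g, w_h)
    print("sentinel weight:", beta)
    print("most attended region:", int(np.argmax(weights)))
```

In this sketch a sentinel weight `beta` close to 1 means the step relies mostly on the sentinel rather than the image, while a value close to 0 means it relies mostly on the attended regions.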

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition