Group-based Distinctive Image Captioning with Memory Attention

Jiuniu Wang; Wenjia Xu; Qingzhong Wang; Antoni B. Chan

arXiv:2108.09151 · cs.CV · April 11, 2022 · 1 citation

TL;DR

This paper introduces GdisCap, a novel group-based memory attention model that enhances image captioning by emphasizing unique object features within image groups, leading to more distinctive and accurate captions.

Contribution

The paper proposes a group-based memory attention module and a new evaluation metric, DisWordRate, to improve and measure caption distinctiveness in image captioning models.

Findings

01. Significant improvement in caption distinctiveness and accuracy.

02. State-of-the-art performance on benchmark datasets.

03. User study confirms the effectiveness of the new metric.

Abstract

Describing images using natural language is widely known as image captioning, which has made consistent progress due to the development of computer vision and natural language generation techniques. Though conventional captioning models achieve high accuracy on popular metrics such as BLEU, CIDEr, and SPICE, the ability of captions to distinguish the target image from other similar images is under-explored. To generate distinctive captions, a few pioneers employ contrastive learning or re-weight the ground-truth captions, approaches that focus on a single input image. However, the relationships between objects in a similar image group (e.g., items or properties within the same album or fine-grained events) are neglected. In this paper, we improve the distinctiveness of image captions using a Group-based Distinctive Captioning Model (GdisCap), which compares each image with other images…
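
The idea of comparing an image against the rest of its group can be illustrated with a small sketch. This is not the paper's memory attention module, whose details are not given here. It assumes each image is a list of object feature vectors, uses cosine similarity as the comparison (an assumption), and the names `cosine` and `distinctiveness_weights` are hypothetical. An object that closely matches something in another image of the group gets a low weight; an object with no close match gets a high one.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def distinctiveness_weights(target_objects, other_images):
    """Hypothetical helper: weight each object feature of the target image by
    how unlike it is to every object feature in the other images of the group.
    Returns weights that sum to 1 (uniform if nothing is distinctive)."""
    if not target_objects:
        return []
    raw = []
    for obj in target_objects:
        # Similarity to the closest object anywhere else in the group.
        sims = [cosine(obj, other) for image in other_images for other in image]
        highest = max(sims) if sims else 0.0
        # Assumption: distinctiveness = 1 - closest similarity, floored at 0.
        raw.append(max(0.0, 1.0 - highest))
    total = sum(raw)
    if total == 0.0:
        return [1.0 / len(target_objects)] * len(target_objects)
    return [w / total for w in raw]


# Toy group: the target shares its first object with another image,
# while its second object has no close match elsewhere in the group.
target = [[1.0, 0.0], [0.0, 1.0]]
others = [[[1.0, 0.0]], [[0.9, 0.1]]]
weights = distinctiveness_weights(target, others)
```

In this toy group the shared object receives weight 0 and the unmatched object receives all of the weight. A captioning model could use such weights to bias attention toward the unmatched object, but how GdisCap actually does this is not specified in the text above.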

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Learning