Removing Word-Level Spurious Alignment between Images and Pseudo-Captions in Unsupervised Image Captioning

Ukyo Honda; Yoshitaka Ushiku; Atsushi Hashimoto; Taro Watanabe; Yuji Matsumoto

arXiv:2104.13872 · cs.CL · June 2, 2021 · 1 citation

Open Access · 1 Repo

TL;DR

This paper introduces a gating mechanism for unsupervised image captioning that aligns image features with only the most reliable words in each pseudo-caption, namely the detected object labels, improving caption quality without complex training objectives.

Contribution

It proposes a simple, effective gating mechanism for word-level alignment that improves unsupervised image captioning performance by filtering out irrelevant words.
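
To make the idea of a word-level gate concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: this summary does not give the architecture, so the sketch assumes a scalar sigmoid gate per word that scales the image feature before it is added to a text state, and a binary cross-entropy term that pushes the gate open on detected object labels and closed on all other words. The names `gated_fusion` and `gate_loss` are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(text_state, image_feat, gate_logit):
    """Add an image feature to a text state, scaled by a scalar gate in (0, 1).

    ASSUMPTION: additive fusion with one scalar gate per word. The paper's
    actual fusion may differ.
    """
    g = sigmoid(gate_logit)
    fused = [t + g * v for t, v in zip(text_state, image_feat)]
    return fused, g


def gate_loss(gate_values, is_label_word):
    """Mean binary cross-entropy between gate values and label-word targets.

    ASSUMPTION: the target is 1 where the pseudo-caption word is a detected
    object label and 0 elsewhere, so image features are aligned with label
    words only.
    """
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for g, y in zip(gate_values, is_label_word):
        total -= y * math.log(g + eps) + (1 - y) * math.log(1.0 - g + eps)
    return total / len(gate_values)
```

With a strongly negative gate logit the fused state is essentially the text state alone, which is how a closed gate would keep an irrelevant pseudo-caption word from being tied to the image.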

Findings

01. Outperforms previous methods without complex objectives
02. Further improves performance when combined with sentence-level alignment
03. Highlights importance of word-level alignment in caption quality

Abstract

Unsupervised image captioning is a challenging task that aims at generating captions without the supervision of image-sentence pairs, but only with images and sentences drawn from different sources and object labels detected from the images. In previous work, pseudo-captions, i.e., sentences that contain the detected object labels, were assigned to a given image. The focus of the previous work was on the alignment of input images and pseudo-captions at the sentence level. However, pseudo-captions contain many words that are irrelevant to a given image. In this work, we investigate the effect of removing mismatched words from image-sentence alignment to determine how they make this task difficult. We propose a simple gating mechanism that is trained to align image features with only the most reliable words in pseudo-captions: the detected object labels. The experimental results show that…
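
The pseudo-caption setup described in the abstract can be illustrated with a small sketch. It assumes, for simplicity, that a sentence qualifies as a pseudo-caption if it contains at least one detected label, and it uses whitespace tokenization with exact single-word matching (no plurals or multi-word labels). The function names and the example corpus are hypothetical and are not taken from the paper or its repository.

```python
def tokenize(sentence):
    """Lowercase and split on whitespace after dropping periods and commas."""
    return sentence.lower().replace(".", " ").replace(",", " ").split()


def assign_pseudo_captions(detected_labels, sentences):
    """Return (tokens, mask) pairs for sentences that mention a detected label.

    mask[i] is 1 if tokens[i] is a detected object label (a reliable word)
    and 0 otherwise (a word that may be irrelevant to the image).
    ASSUMPTION: a sentence is kept if it contains at least one label.
    """
    labels = {label.lower() for label in detected_labels}
    pseudo_captions = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        mask = [1 if token in labels else 0 for token in tokens]
        if any(mask):
            pseudo_captions.append((tokens, mask))
    return pseudo_captions


# Hypothetical detector output and unpaired sentence corpus.
labels = ["dog", "frisbee"]
corpus = [
    "A dog catches a frisbee on the beach.",
    "A man rides a horse.",
    "The dog sleeps on a red sofa.",
]
for tokens, mask in assign_pseudo_captions(labels, corpus):
    print(list(zip(tokens, mask)))
```

Both retained sentences mention "dog", yet words such as "beach" and "sofa" have no guaranteed connection to the image. Aligning the image with the whole sentence would tie it to those words as well, which is the spurious word-level alignment the paper sets out to remove.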

Code & Models

Repositories

ukyh/RemovingSpuriousAlignment
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning