Rethinking the Reference-based Distinctive Image Captioning

Yangjun Mao; Long Chen; Zhihong Jiang; Dong Zhang; Zhimeng Zhang; Jian Shao; Jun Xiao

arXiv:2207.11118·cs.CV·July 25, 2022

TL;DR

This paper introduces two new benchmarks, a Transformer-based model called TransDIC, and a new evaluation metric, DisCIDEr, for reference-based distinctive image captioning, addressing limitations of previous datasets and models.

Contribution

The paper proposes stricter benchmarks with object-level similarity control, a strong Transformer-based baseline, and a new metric for more reliable evaluation of distinctive captions.

Findings

01 TransDIC outperforms existing models on new benchmarks.

02 The new benchmarks ensure models perceive unique objects in images.

03 DisCIDEr effectively evaluates both accuracy and distinctiveness.

Abstract

Distinctive Image Captioning (DIC), the task of generating distinctive captions that describe the unique details of a target image, has received considerable attention over the last few years. A recent DIC work proposes to generate distinctive captions by comparing the target image with a set of semantically similar reference images, i.e., reference-based DIC (Ref-DIC). The goal is for the generated captions to tell the target image apart from the reference images. Unfortunately, the reference images used by existing Ref-DIC works are easy to distinguish: they resemble the target image only at the scene level and share few objects with it, so a Ref-DIC model can trivially generate distinctive captions without considering the reference images at all. To ensure Ref-DIC models really perceive the unique objects (or attributes) in target images, we first propose two new Ref-DIC benchmarks.…

Code & Models

Repositories

maoyj1998/transdic (Official)
