ControlCap: Controllable Region-level Captioning

Yuzhong Zhao; Yue Liu; Zonghao Guo; Weijia Wu; Chen Gong; Fang Wan; Qixiang Ye

arXiv:2401.17910·cs.CV·March 12, 2024·1 cites

TL;DR

ControlCap introduces control words and a discriminative module to improve region-level captioning, effectively addressing caption degeneration and enabling more diverse, controllable captions with enhanced generalization.

Contribution

The paper proposes a controllable captioning framework that partitions the caption space with control words, improving caption diversity and generalization over prior models.

Findings

01 Significant CIDEr score improvements on Visual Genome and RefCOCOg datasets.

02 Outperforms state-of-the-art methods in caption diversity and accuracy.

03 Enables captioning beyond training data with interactive control words.

Abstract

Region-level captioning is challenged by the caption degeneration issue: pre-trained multimodal models tend to predict the most frequent captions and miss the less frequent ones. In this study, we propose a controllable region-level captioning (ControlCap) approach, which introduces control words to a multimodal model to address the caption degeneration issue. Specifically, ControlCap leverages a discriminative module to generate control words within the caption space, partitioning it into multiple sub-spaces. The multimodal model is constrained to generate captions within the few sub-spaces containing the control words, which increases the chance of hitting less frequent captions and alleviates the caption degeneration issue. Furthermore, interactive control words can be given by either a human or an expert model, which enables captioning beyond the training caption…
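The partitioning idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: ControlCap uses a learned discriminative module and a multimodal model, whereas the example below works on a fixed list of caption strings, and the function names `partition_by_control_words` and `constrained_candidates`, the sample captions, and the control vocabulary are all hypothetical. It only shows how restricting candidates to the sub-spaces named by control words can surface less frequent captions.

```python
from collections import defaultdict


def partition_by_control_words(captions, control_vocab):
    """Group captions into sub-spaces keyed by the control words they contain.

    Hypothetical helper: a caption belongs to the sub-space of every
    control word that appears among its whitespace-separated tokens.
    """
    subspaces = defaultdict(list)
    for caption in captions:
        tokens = set(caption.lower().split())
        for word in control_vocab:
            if word in tokens:
                subspaces[word].append(caption)
    return dict(subspaces)


def constrained_candidates(subspaces, control_words):
    """Return the distinct captions in the sub-spaces selected by control_words."""
    selected = []
    for word in control_words:
        for caption in subspaces.get(word, []):
            if caption not in selected:  # keep first occurrence, preserve order
                selected.append(caption)
    return selected


# Toy caption space: one frequent caption dominates (assumed data).
captions = [
    "a man standing",
    "a man standing",
    "a man standing",
    "a man in a red jacket",
    "a man holding an umbrella",
]
control_vocab = ["standing", "red", "umbrella"]

subspaces = partition_by_control_words(captions, control_vocab)

# Control words "red" and "umbrella" steer away from the dominant caption.
selected = constrained_candidates(subspaces, ["red", "umbrella"])
print(selected)
```

In this toy setting an unconstrained frequency-based choice would always return "a man standing", while the control words limit the candidates to the two rarer captions. In the paper the control words come from a discriminative module, a human, or an expert model rather than from a hand-written list.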

Code & Models

Repositories

callsys/controlcap (PyTorch, Official)

Taxonomy

Topics: IoT-based Smart Home Systems