Human-like Controllable Image Captioning with Verb-specific Semantic Roles

Long Chen; Zhihong Jiang; Jun Xiao; Wei Liu

arXiv:2103.12204 · cs.CV · March 24, 2021 · 5 cites

TL;DR

This paper introduces Verb-specific Semantic Roles as a new control signal for Controllable Image Captioning, enabling more human-like, event-compatible, and sample-suitable caption generation with improved controllability and diversity.

Contribution

It proposes VSR as a novel control signal for CIC, along with a grounded semantic role labeling model, a semantic structure planner, and a role-shift captioning model, enhancing controllability and diversity.
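
To make the control signal concrete, here is a minimal sketch of how a VSR (a verb plus a set of semantic roles, as the abstract describes it) could be represented in Python. The class names, field names, and role labels such as "agent" and "place" are illustrative assumptions. They are not taken from the paper or from the official repository.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SemanticRole:
    """One role slot of the control signal (label names are hypothetical)."""
    name: str  # e.g. "agent", "vehicle", "place"; illustrative only


@dataclass
class VSR:
    """A verb together with the semantic roles the caption should cover."""
    verb: str
    roles: List[SemanticRole] = field(default_factory=list)

    def role_names(self) -> List[str]:
        # Return the role labels in the order they were given.
        return [role.name for role in self.roles]


# Hypothetical control signal: describe a "ride" event and mention
# who rides, what is ridden, and where.
signal = VSR(
    verb="ride",
    roles=[SemanticRole("agent"), SemanticRole("vehicle"), SemanticRole("place")],
)
print(signal.verb, signal.role_names())
```

In a pipeline like the one summarized above, a structure of this kind would be what a user supplies as input. How the roles are grounded in image regions and ordered into a sentence is the job of the paper's other components and is not modeled here.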

Findings

1. Outperforms strong baselines on CIC benchmarks

2. Achieves better controllability and diversity in generated captions

3. Enables multi-level diverse caption generation

Abstract

Controllable Image Captioning (CIC) -- generating image descriptions following designated control signals -- has received unprecedented attention over the last few years. To emulate the human ability in controlling caption generation, current CIC studies focus exclusively on control signals concerning objective properties, such as contents of interest or descriptive patterns. However, we argue that almost all existing objective control signals have overlooked two indispensable characteristics of an ideal control signal: 1) Event-compatible: all visual contents referred to in a single sentence should be compatible with the described activity. 2) Sample-suitable: the control signals should be suitable for a specific image sample. To this end, we propose a new control signal for CIC: Verb-specific Semantic Roles (VSR). VSR consists of a verb and some semantic roles, which represents a…

Code & Models

Repositories

mad-red/VSR-guided-CIC (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition