Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner

Tseng-Hung Chen; Yuan-Hong Liao; Ching-Yao Chuang; Wan-Ting Hsu; Jianlong Fu; Min Sun

arXiv:1705.00930 · cs.CV · August 15, 2017 · 30 cites

TL;DR

This paper introduces an adversarial training framework with two critics, a domain critic and a multi-modal critic, for cross-domain image captioning. It transfers a captioner to a target domain without paired image-sentence data and improves performance on four target datasets.

Contribution

We propose a novel adversarial training approach with domain and multi-modal critics for cross-domain image captioning without paired target data.

Findings

01. Achieves a 21.8% CIDEr-D improvement on CUB-200-2011.

02. Consistently outperforms baselines across four target datasets.

03. Using the critics at inference time boosts caption quality by 4.5%.

Abstract

Impressive image captioning results are achieved in domains with plenty of training image and sentence pairs (e.g., MSCOCO). However, transferring to a target domain with significant domain shifts but no paired training data (referred to as cross-domain image captioning) remains largely unexplored. We propose a novel adversarial training procedure to leverage unpaired data in the target domain. Two critic networks are introduced to guide the captioner, namely a domain critic and a multi-modal critic. The domain critic assesses whether the generated sentences are indistinguishable from sentences in the target domain. The multi-modal critic assesses whether an image and its generated sentence are a valid pair. During training, the critics and the captioner act as adversaries: the captioner aims to generate indistinguishable sentences, whereas the critics aim to distinguish them. The assessment…
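The interplay between the captioner and the two critics can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes the two critic scores are multiplied into a single scalar reward and that the captioner is updated with a REINFORCE-style policy gradient, and it keeps the critics fixed instead of training them adversarially. The captioner is reduced to a softmax over three candidate sentences, and all names and scores (`combined_reward`, `domain_scores`, `pair_scores`) are hypothetical.

```python
import math


def softmax(logits):
    """Turn logits into a probability distribution over candidate captions."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combined_reward(domain_score, pair_score):
    """Assumed reward: product of the two critic scores, each in [0, 1]."""
    return domain_score * pair_score


def expected_policy_gradient(logits, rewards):
    """Exact gradient of the expected reward w.r.t. the logits of a softmax policy.

    For a softmax policy this is p_k * (r_k - baseline), where the baseline
    is the current expected reward.
    """
    probs = softmax(logits)
    baseline = sum(p * r for p, r in zip(probs, rewards))
    return [p * (r - baseline) for p, r in zip(probs, rewards)]


def train_captioner(logits, rewards, lr=1.0, steps=200):
    """Gradient ascent on the expected critic reward (critics held fixed)."""
    logits = list(logits)
    for _ in range(steps):
        grad = expected_policy_gradient(logits, rewards)
        logits = [x + lr * g for x, g in zip(logits, grad)]
    return logits


# Hypothetical candidate captions for one target-domain (bird) image.
candidates = [
    "a bird on a branch",
    "a man riding a bike",
    "this bird has a red crown and a white belly",
]
# Made-up critic outputs: does the sentence look like target-domain text,
# and does it match the image?
domain_scores = [0.4, 0.1, 0.9]
pair_scores = [0.8, 0.1, 0.7]

rewards = [combined_reward(d, m) for d, m in zip(domain_scores, pair_scores)]
final_probs = softmax(train_captioner([0.0, 0.0, 0.0], rewards))
best = candidates[final_probs.index(max(final_probs))]
print(best)
```

In this toy setting the captioner shifts probability mass toward the sentence that both critics score highly. A full adversarial setup would also update the critics so that they keep separating generated sentences from real target-domain sentences and valid image-sentence pairs from invalid ones.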

Code & Models

Repositories

tsenghungchen/show-adapt-and-tell (TensorFlow, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition