Cooperative image captioning

Gilad Vered; Gal Oren; Yuval Atzmon; Gal Chechik

arXiv:1907.11565 · cs.CV · July 29, 2019 · 1 citation

TL;DR

This paper introduces PSST, a new training method for cooperative image captioning that improves the discriminative quality and naturalness of generated descriptions by addressing optimization challenges and constraining language to be human-like.

Contribution

The paper proposes PSST, a novel optimization technique for joint training of speaker and listener networks, and demonstrates how constraining descriptions to human language enhances naturalness and discriminativeness.

Findings

01. Recall@10 improved from 60% to 86% on COCO

02. Descriptions are more natural and discriminative

03. Method maintains language naturalness while improving task performance
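
For readers unfamiliar with the metric in the first finding, the following is a minimal sketch of how recall@K is commonly computed in a retrieval setting. It assumes the listener scores every caption against every candidate image and that the correct image for caption i is image i; the function name `recall_at_k` and the toy score matrix are illustrative and do not come from the paper.

```python
def recall_at_k(similarity, k):
    """Fraction of queries whose correct item ranks among the top k candidates.

    similarity[i][j] is a (hypothetical) listener score for caption i
    against image j; the correct image for caption i is assumed to be i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Candidate indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Toy example: 3 captions scored against 3 images.
scores = [
    [0.9, 0.1, 0.3],  # caption 0: correct image ranked first
    [0.8, 0.2, 0.5],  # caption 1: correct image ranked last
    [0.1, 0.4, 0.7],  # caption 2: correct image ranked first
]
r1 = recall_at_k(scores, 1)
r3 = recall_at_k(scores, 3)
```

A higher recall@10 under this definition means the listener finds the described image among its ten best guesses more often, which is why it serves as a proxy for how discriminative the captions are.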

Abstract

When describing images with natural language, the descriptions can be made more informative if tuned using downstream tasks. This is often achieved by training two networks: a "speaker network" that generates sentences given an image, and a "listener network" that uses them to perform a task. Unfortunately, jointly training multiple networks to communicate in order to achieve a shared task faces two major challenges. First, the descriptions generated by a speaker network are discrete and stochastic, making optimization very hard and inefficient. Second, joint training usually causes the vocabulary used during communication to drift and diverge from natural language. We describe an approach that addresses both challenges. We first develop a new, effective optimization method based on partial sampling from a multinomial distribution combined with straight-through gradient updates, which we name PSST for…
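
To make the optimization idea in the abstract more concrete, here is a minimal sketch of one possible reading of "partial sampling combined with straight-through gradient updates" for a single word choice. It rests on assumptions the text above does not state: that partial sampling means drawing from the multinomial with some probability `rho` and taking the argmax otherwise, and that the straight-through step backpropagates through the softmax probabilities as if they had been the output. The function names and the parameter `rho` are hypothetical, and this is not the authors' implementation.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def partial_sample_forward(logits, rho, rng):
    """Forward pass: emit a discrete one-hot word.

    Assumption: with probability rho, sample the word from the multinomial
    defined by softmax(logits); otherwise pick the most likely word.
    """
    probs = softmax(logits)
    if rng.random() < rho:
        idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    else:
        idx = max(range(len(probs)), key=lambda j: probs[j])
    one_hot = [1.0 if j == idx else 0.0 for j in range(len(probs))]
    return one_hot, probs


def straight_through_backward(grad_output, probs):
    """Backward pass: pretend the one-hot output was the softmax itself.

    With d probs_j / d logits_i = probs_j * (delta_ij - probs_i), the
    gradient w.r.t. logit i is probs_i * (grad_output_i - sum_j grad_output_j * probs_j).
    """
    dot = sum(g * p for g, p in zip(grad_output, probs))
    return [p * (g - dot) for g, p in zip(grad_output, probs)]


rng = random.Random(0)
logits = [2.0, 0.5, -1.0]

# rho = 0.0 never samples, so the forward pass reduces to argmax.
one_hot, probs = partial_sample_forward(logits, rho=0.0, rng=rng)

# Hypothetical upstream gradient from a listener loss w.r.t. the one-hot word.
grad_logits = straight_through_backward([1.0, 0.0, 0.0], probs)
```

The forward pass stays discrete, so the listener only ever sees actual words, while the backward pass still delivers a usable gradient to the speaker's logits. Under the assumption above, `rho` trades off exploration (sampling) against the lower-variance argmax choice.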

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques