Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection

Keren Ye; Mingda Zhang; Adriana Kovashka; Wei Li; Danfeng Qin; Jesse Berent

arXiv:1907.10164·cs.CV·August 19, 2019·6 cites

Open Access · 1 Repo

TL;DR

Cap2Det introduces a novel approach to weakly supervised object detection by leveraging unstructured caption data through a text-only classifier, achieving state-of-the-art results on standard benchmarks.

Contribution

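The paper proposes a method for weakly supervised object detection (WSOD) that learns from image captions instead of image-level labels. It trains a text-only classifier to make use of captions that do not exactly match object names, so a detector can be learned without bounding box supervision.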

Findings

01. Achieves state-of-the-art WSOD performance on three benchmarks.
02. Effectively utilizes noisy caption data for object detection.
03. Demonstrates generalization beyond dataset boundaries.

Abstract

Learning to localize and name object instances is a fundamental problem in vision, but state-of-the-art approaches rely on expensive bounding box supervision. While weakly supervised detection (WSOD) methods relax the need for boxes to that of image-level annotations, even cheaper supervision is naturally available in the form of unstructured textual descriptions that users may freely provide when uploading image content. However, straightforward approaches to using such data for WSOD wastefully discard captions that do not exactly match object names. Instead, we show how to squeeze the most information out of these captions by training a text-only classifier that generalizes beyond dataset boundaries. Our discovery provides an opportunity for learning detection models from noisy but more abundant and freely-available caption data. We also validate our model on three classic object…
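The exact-match baseline that the abstract criticizes can be illustrated with a minimal sketch. The category list, captions, and function name below are made up for illustration and are not taken from the authors' code; the point is only to show how captions without a verbatim object name end up contributing no labels.

```python
import re

# Hypothetical category vocabulary; a real detection benchmark defines its own.
CATEGORIES = ["dog", "person", "bicycle", "car"]


def exact_match_labels(caption, categories=CATEGORIES):
    """Return the categories whose names appear verbatim as words in the caption."""
    tokens = set(re.findall(r"[a-z]+", caption.lower()))
    return [c for c in categories if c in tokens]


# Illustrative captions (invented, not from any dataset).
captions = [
    "A dog chasing a car down the street",
    "A puppy sleeping on the couch",
    "A man riding his bike to work",
]

for caption in captions:
    labels = exact_match_labels(caption)
    # Captions with no verbatim category name yield no supervision at all.
    status = labels if labels else "discarded (no exact match)"
    print(f"{caption!r} -> {status}")
```

In this toy example only the first caption yields labels; "puppy", "man", and "bike" clearly refer to listed categories, yet exact matching drops them. A learned text-only classifier, as the abstract describes, is meant to recover supervision from captions of this kind.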

Code & Models

Repositories

yekeren/Cap2Det (TensorFlow, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications