Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection
Keren Ye, Mingda Zhang, Adriana Kovashka, Wei Li, Danfeng Qin, Jesse Berent

TL;DR
Cap2Det introduces a novel approach to weakly supervised object detection by leveraging unstructured caption data through a text-only classifier, achieving state-of-the-art results on standard benchmarks.
Contribution
The paper proposes a method for weakly supervised object detection (WSOD) from caption data: it trains a text-only classifier on captions, enabling detection without bounding box supervision.
Findings
Achieves state-of-the-art WSOD performance on three benchmarks.
Effectively utilizes noisy caption data for object detection.
Demonstrates generalization beyond dataset boundaries.
Abstract
Learning to localize and name object instances is a fundamental problem in vision, but state-of-the-art approaches rely on expensive bounding box supervision. While weakly supervised detection (WSOD) methods relax the need for boxes to that of image-level annotations, even cheaper supervision is naturally available in the form of unstructured textual descriptions that users may freely provide when uploading image content. However, straightforward approaches to using such data for WSOD wastefully discard captions that do not exactly match object names. Instead, we show how to squeeze the most information out of these captions by training a text-only classifier that generalizes beyond dataset boundaries. Our discovery provides an opportunity for learning detection models from noisy but more abundant and freely-available caption data. We also validate our model on three classic object…
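To make the contrast in the abstract concrete, here is a minimal sketch of the two ways of turning a caption into image-level pseudo-labels: exact matching against class names, and a learned text-only classifier. It is illustrative only. The class list, the toy captions, the function names, and the word co-occurrence model are all assumptions made for this example and are not the classifier used in the paper.

```python
from collections import Counter, defaultdict

# Hypothetical label vocabulary and toy training pairs (caption, true labels).
CLASSES = ["dog", "person", "frisbee"]
TOY_PAIRS = [
    ("a puppy runs on the grass", {"dog"}),
    ("a puppy sleeps on a couch", {"dog"}),
    ("a man rides a bike", {"person"}),
    ("a man throws a frisbee", {"person", "frisbee"}),
    ("a man sits on the bench", {"person"}),
]


def exact_match_labels(caption, classes=CLASSES):
    """Baseline: keep a class only if its name appears verbatim in the caption."""
    tokens = set(caption.lower().split())
    return {c for c in classes if c in tokens}


def train_text_classifier(pairs):
    """Count, for each word, how often it co-occurs with each labeled class."""
    word_class = defaultdict(Counter)
    word_total = Counter()
    for caption, labels in pairs:
        for word in set(caption.lower().split()):
            word_total[word] += 1
            for label in labels:
                word_class[word][label] += 1
    return word_class, word_total


def predict_labels(caption, model, classes=CLASSES, threshold=0.8, min_count=2):
    """Predict a class when some word in the caption reliably co-occurred with it."""
    word_class, word_total = model
    predicted = set()
    for word in set(caption.lower().split()):
        total = word_total[word]
        if total < min_count:  # skip unseen or rarely seen words
            continue
        for c in classes:
            if word_class[word][c] / total >= threshold:
                predicted.add(c)
    return predicted


model = train_text_classifier(TOY_PAIRS)
caption = "a puppy plays with a man"
print(exact_match_labels(caption))     # no class name appears verbatim
print(predict_labels(caption, model))  # labels inferred from "puppy" and "man"
```

In this toy setting, exact matching yields no labels for the caption because neither "dog" nor "person" appears in it, so the caption would be discarded. The learned model recovers both labels from words it saw alongside them during training. A real text classifier would be far more capable than word counts, but the role it plays is the same.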
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
