Detector-Free Weakly Supervised Grounding by Separation

Assaf Arbelle; Sivan Doveh; Amit Alfassy; Joseph Shtok; Guy Lev; Eli Schwartz; Hilde Kuehne; Hila Barak Levi; Prasanna Sattigeri; Rameswar Panda; Chun-Fu Chen; Alex Bronstein; Kate Saenko; Shimon Ullman; Raja Giryes; Rogerio Feris; Leonid Karlinsky

arXiv:2104.09829 · cs.CV · April 21, 2021

TL;DR

This paper introduces a detector-free weakly supervised grounding method that learns to localize text phrases in images without a pre-trained object detector, improving accuracy by up to 8.5% over previous detector-free methods.

Contribution

The proposed Grounding by Separation (GbS) method enables detector-free phrase grounding by synthesizing text-to-region associations through random alpha-blending of images and segmentation, outperforming previous detector-free methods.

Findings

01. Up to 8.5% accuracy improvement over previous detector-free methods.

02. Over 7% improvement compared to detector-based approaches.

03. Effective on multiple benchmarks including Flickr30K, Visual Genome, and ReferIt.

Abstract

Nowadays, there is an abundance of data involving images and surrounding free-form text weakly corresponding to those images. Weakly Supervised phrase-Grounding (WSG) deals with the task of using this data to learn to localize (or to ground) arbitrary text phrases in images without any additional annotations. However, most recent SotA methods for WSG assume the existence of a pre-trained object detector, relying on it to produce the ROIs for localization. In this work, we focus on the task of Detector-Free WSG (DF-WSG) to solve WSG without relying on a pre-trained detector. We directly learn everything from the images and associated free-form text pairs, thus potentially gaining an advantage on the categories unsupported by the detector. The key idea behind our proposed Grounding by Separation (GbS) method is synthesizing 'text to image-regions' associations by random alpha-blending of…
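
The abstract names random alpha-blending as the key step but is cut off before describing it in detail. The following is only a minimal sketch of alpha-blending in general, not the authors' implementation. The rectangular alpha map, the toy grayscale images, and the function names `random_alpha_map` and `alpha_blend` are all assumptions made for illustration.

```python
import random


def random_alpha_map(height, width, rng):
    """Hypothetical alpha map: a random rectangle of ones on a zero background.

    This shape is an assumption for illustration; the abstract does not say
    how the alpha map is generated.
    """
    top = rng.randrange(height)
    left = rng.randrange(width)
    bottom = rng.randrange(top + 1, height + 1)
    right = rng.randrange(left + 1, width + 1)
    return [
        [1.0 if top <= r < bottom and left <= c < right else 0.0
         for c in range(width)]
        for r in range(height)
    ]


def alpha_blend(image_a, image_b, alpha):
    """Per-pixel blend: alpha * A + (1 - alpha) * B."""
    return [
        [a * pa + (1.0 - a) * pb
         for pa, pb, a in zip(row_a, row_b, row_alpha)]
        for row_a, row_b, row_alpha in zip(image_a, image_b, alpha)
    ]


rng = random.Random(0)
image_a = [[1.0] * 4 for _ in range(3)]  # toy grayscale image with its own caption
image_b = [[0.0] * 4 for _ in range(3)]  # second toy image with a different caption
alpha = random_alpha_map(3, 4, rng)
blended = alpha_blend(image_a, image_b, alpha)
```

In this toy setup the alpha map records which pixels of the blended image came from which source image, so it can play the role of a region label for the text attached to each source image.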

Code & Models

Repositories

aarbelle/GroundingBySeparation (PyTorch, official)
