Detector-Free Weakly Supervised Grounding by Separation
Assaf Arbelle, Sivan Doveh, Amit Alfassy, Joseph Shtok, Guy Lev, Eli Schwartz, Hilde Kuehne, Hila Barak Levi, Prasanna Sattigeri, Rameswar Panda, Chun-Fu Chen, Alex Bronstein, Kate Saenko, Shimon Ullman, Raja Giryes, Rogerio Feris, Leonid Karlinsky

TL;DR
This paper introduces a detector-free weakly supervised grounding method that learns to localize text phrases in images without a pre-trained object detector, achieving accuracy gains of up to 8.5% over previous detector-free methods across multiple benchmarks.
Contribution
The proposed Grounding by Separation (GbS) method enables detector-free phrase grounding by synthesizing text-to-image-region associations through random image blending and segmentation, outperforming previous methods.
Findings
Up to 8.5% accuracy improvement over previous detector-free methods.
Over 7% improvement compared to detector-based approaches.
Effective on multiple benchmarks including Flickr30K, Visual Genome, and ReferIt.
Abstract
Nowadays, there is an abundance of data involving images and surrounding free-form text weakly corresponding to those images. Weakly Supervised phrase-Grounding (WSG) deals with the task of using this data to learn to localize (or to ground) arbitrary text phrases in images without any additional annotations. However, most recent SotA methods for WSG assume the existence of a pre-trained object detector, relying on it to produce the ROIs for localization. In this work, we focus on the task of Detector-Free WSG (DF-WSG) to solve WSG without relying on a pre-trained detector. We directly learn everything from the images and associated free-form text pairs, thus potentially gaining an advantage on the categories unsupported by the detector. The key idea behind our proposed Grounding by Separation (GbS) method is synthesizing 'text to image-regions' associations by random alpha-blending of…
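The abstract breaks off while describing the random alpha-blending step, so the following is only a minimal sketch of that step under stated assumptions: two images are mixed with a random, spatially smooth per-pixel alpha map, and each image's caption is then expected to ground to the region where its own image dominates. The helper names `random_alpha_map` and `blend_pair`, the 4×4 control grid, and the bilinear upsampling are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def random_alpha_map(h, w, rng, grid=4):
    """Hypothetical smooth alpha map: a coarse grid of uniform noise in [0, 1],
    bilinearly upsampled to (h, w). Interpolation keeps values in [0, 1]."""
    coarse = rng.uniform(0.0, 1.0, size=(grid, grid))
    knots = np.arange(grid)
    ys = np.linspace(0, grid - 1, h)
    xs = np.linspace(0, grid - 1, w)
    # Interpolate along the width for every coarse row -> shape (grid, w).
    rows = np.array([np.interp(xs, knots, coarse[i]) for i in range(grid)])
    # Interpolate along the height for every column -> shape (w, h), then transpose.
    return np.array([np.interp(ys, knots, rows[:, j]) for j in range(w)]).T


def blend_pair(img_a, img_b, rng, grid=4):
    """Alpha-blend two same-sized HxWxC images (assumed setup).

    Returns the blended image plus two per-pixel target maps: the caption of
    img_a is assumed to ground to `alpha`, the caption of img_b to `1 - alpha`.
    """
    h, w, _ = img_a.shape
    alpha = random_alpha_map(h, w, rng, grid)
    blended = alpha[..., None] * img_a + (1.0 - alpha[..., None]) * img_b
    return blended, alpha, 1.0 - alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.uniform(size=(32, 48, 3))
    b = rng.uniform(size=(32, 48, 3))
    blended, target_a, target_b = blend_pair(a, b, rng)
    print(blended.shape)  # (32, 48, 3)
```

Because the two target maps sum to one at every pixel, they could act as soft segmentation targets for the two captions; the sketch deliberately omits the grounding network and training loss, which the abstract does not specify here.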
