OptiBox: Breaking the Limits of Proposals for Visual Grounding
Zicong Fan, Si Yi Meng, Leonid Sigal, James J. Little

TL;DR
OptiBox introduces a progressive bounding box refinement method that improves visual grounding by leveraging global image context, reaching state-of-the-art results on Flickr30k Entities and staying competitive with far less training data.
Contribution
The paper presents OptiBox, a novel bounding box refinement architecture that improves visual grounding performance, especially with limited training data, by integrating global image encoding.
Findings
State-of-the-art performance on Flickr30k Entities with GroundeR + OptiBox.
Surpasses many fully supervised models using only 50% of training data.
Achieves competitive results with as little as 3% of training data.
Abstract
The problem of language grounding has attracted much attention in recent years due to its pivotal role in more general image-lingual high-level reasoning tasks (e.g., image captioning, VQA). Despite the tremendous progress in visual grounding, the performance of most approaches has been hindered by the quality of bounding box proposals obtained in the early stages of all recent pipelines. To address this limitation, we propose a general progressive query-guided bounding box refinement architecture (OptiBox) that leverages global image encoding for added context. We apply this architecture in the context of the GroundeR model, first introduced in 2016, which has a number of unique and appealing properties, such as the ability to learn in the semi-supervised setting by leveraging cyclic language-reconstruction. Using GroundeR + OptiBox and a simple semantic language reconstruction loss…
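To make "progressive bounding box refinement" concrete, here is a minimal sketch of the general idea: a proposal box is repeatedly nudged by predicted offsets. It is not the OptiBox architecture itself. The center/size offset parameterization is an assumption borrowed from common object detectors, and the names `apply_deltas`, `refine`, `make_toy_predictor`, and `iou` are hypothetical. The toy predictor is an oracle that moves halfway toward a known target; in a real system it would presumably be replaced by a learned network conditioned on the query and the global image encoding.

```python
import math
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]      # (x1, y1, x2, y2)
Deltas = Tuple[float, float, float, float]   # (dx, dy, dw, dh)


def apply_deltas(box: Box, deltas: Deltas) -> Box:
    """Shift the center by (dx*w, dy*h) and scale the size by exp(dw), exp(dh).

    This parameterization is an assumption, not taken from the paper.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    dx, dy, dw, dh = deltas
    cx, cy = cx + dx * w, cy + dy * h
    w, h = w * math.exp(dw), h * math.exp(dh)
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


def refine(box: Box, predict: Callable[[Box], Deltas], steps: int = 3) -> List[Box]:
    """Apply the predictor `steps` times; return every intermediate box."""
    trajectory = [box]
    for _ in range(steps):
        box = apply_deltas(box, predict(box))
        trajectory.append(box)
    return trajectory


def make_toy_predictor(target: Box) -> Callable[[Box], Deltas]:
    """Hypothetical stand-in for a learned model: half the exact offset to target."""
    tx1, ty1, tx2, ty2 = target
    tw, th = tx2 - tx1, ty2 - ty1
    tcx, tcy = tx1 + 0.5 * tw, ty1 + 0.5 * th

    def predict(box: Box) -> Deltas:
        x1, y1, x2, y2 = box
        w, h = x2 - x1, y2 - y1
        cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
        full = ((tcx - cx) / w, (tcy - cy) / h, math.log(tw / w), math.log(th / h))
        return tuple(0.5 * d for d in full)

    return predict


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    target = (5.0, 5.0, 25.0, 25.0)
    for step, box in enumerate(refine((0.0, 0.0, 10.0, 10.0), make_toy_predictor(target))):
        print(step, [round(v, 2) for v in box], round(iou(box, target), 3))
```

Running the script prints the box and its overlap with the target after each step. With this toy predictor the overlap grows at every step, which illustrates why refinement can compensate for a poor initial proposal.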
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Natural Language Processing Techniques
