From Coarse to Fine-grained Concept based Discrimination for Phrase Detection

Maan Qraitem; Bryan A. Plummer

arXiv:2112.03237 · cs.CV · November 16, 2022

TL;DR

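This paper introduces CFCD-Net, a phrase detection model that improves discrimination by training against groups of semantically similar words (concepts) and by adding a fine-grained module for mutually-exclusive words. It is evaluated on Flickr30K Entities and RefCOCO+ and reports a 1.5-2 point mAP gain over the prior state of the art.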

Contribution

The paper proposes novel methods for sampling negatives and handling fine-grained mutually-exclusive phrases in phrase detection.

Findings

01

Achieves a 1.5-2 point improvement in mAP over the state of the art.

02

Improves by 3-4 points on phrases affected by the fine-grained module.

03

Demonstrates effectiveness on Flickr30K Entities and RefCOCO+ datasets.

Abstract

Phrase detection requires methods to identify whether a phrase is relevant to an image and, if so, to localize it. A key challenge for training more discriminative detection models is sampling negatives. Sampling techniques from prior work focus primarily on hard, often noisy, negatives, disregarding the broader distribution of negative samples. Our proposed CFCD-Net addresses this through two novel methods. First, we generate groups of semantically similar words we call concepts (e.g., {dog, cat, horse} and {car, truck, SUV}), and then train our CFCD-Net to discriminate between a region of interest and its unrelated concepts. Second, for phrases containing fine-grained mutually-exclusive words (e.g., colors), we force the model to select only one applicable phrase for each region using our novel fine-grained module (FGM). We evaluate our approach on Flickr30K Entities and RefCOCO+,…
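
The two ideas in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation. The concept names, the word-overlap test for "unrelated", and the use of a softmax to pick a single phrase are all assumptions made for illustration; only the two example word groups come from the abstract.

```python
import math

# Hypothetical concept groups; the member words are the abstract's examples,
# the group names ("animal", "vehicle") are made up for this sketch.
CONCEPTS = {
    "animal": {"dog", "cat", "horse"},
    "vehicle": {"car", "truck", "suv"},
}


def unrelated_concepts(phrase, concepts=CONCEPTS):
    """Return the names of concepts that share no word with the phrase.

    Assumption: a concept is "unrelated" when none of its member words
    appears in the phrase. These concepts would serve as negatives.
    """
    words = {w.lower() for w in phrase.split()}
    return sorted(name for name, members in concepts.items()
                  if not (words & members))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_exclusive(scores_by_phrase):
    """Select exactly one phrase from a mutually-exclusive set.

    Assumption: competing phrases (e.g. different colors of the same
    object) are normalized jointly, so only one can win per region.
    """
    phrases = list(scores_by_phrase)
    probs = softmax([scores_by_phrase[p] for p in phrases])
    best = max(range(len(phrases)), key=probs.__getitem__)
    return phrases[best], probs[best]


if __name__ == "__main__":
    print(unrelated_concepts("a brown dog"))
    print(pick_exclusive({"red shirt": 2.0, "blue shirt": 0.5}))
```

Normalizing the scores jointly is one simple way to encode mutual exclusivity: raising the probability of one color necessarily lowers the others, whereas independent per-phrase scores would allow several colors to be accepted for the same region.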

Taxonomy

Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Text and Document Classification Technologies