Towards Addressing the Misalignment of Object Proposal Evaluation for Vision-Language Tasks via Semantic Grounding

Joshua Feinglass; Yezhou Yang

arXiv:2309.00215 · cs.CV · September 4, 2023

TL;DR

This paper shows that the standard protocol for evaluating object proposals is misaligned with performance on downstream vision-language tasks, and proposes a semantically grounded protocol that scores proposals only against the annotations that text describing the image marks as important.

Contribution

It introduces a new evaluation protocol based on semantic importance scores to better align object proposal assessments with vision-language task performance.

Findings

01. Semantic grounding improves evaluation alignment with downstream tasks.

02. Proposed method correlates better with captioning and human annotations.

03. Traditional evaluation techniques are often misaligned with actual task performance.

Abstract

Object proposal generation serves as a standard pre-processing step in Vision-Language (VL) tasks (image captioning, visual question answering, etc.). The performance of object proposals generated for VL tasks is currently evaluated across all available annotations, a protocol that we show is misaligned - higher scores do not necessarily correspond to improved performance on downstream VL tasks. Our work serves as a study of this phenomenon and explores the effectiveness of semantic grounding to mitigate its effects. To this end, we propose evaluating object proposals against only a subset of available annotations, selected by thresholding an annotation importance score. Importance of object annotations to VL tasks is quantified by extracting relevant semantic information from text describing the image. We show that our method is consistent and demonstrates greatly improved alignment…
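
As a rough illustration of the protocol the abstract describes, the Python sketch below scores a set of proposals twice: once against every annotation, and once against only the annotations whose importance score clears a threshold. Everything named here is an assumption made for illustration and is not taken from the paper or its repository: the `Annotation` class, the hand-set importance values, the 0.5 thresholds, and the use of recall at a fixed IoU as the proposal metric. In the paper, importance is derived from text describing the image; that step is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Annotation:
    box: Box
    label: str
    importance: float  # hypothetical score; higher means more relevant to the text


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_annotations(annotations: List[Annotation], threshold: float) -> List[Annotation]:
    """Keep only annotations whose importance score meets the threshold."""
    return [a for a in annotations if a.importance >= threshold]


def proposal_recall(proposals: List[Box], annotations: List[Annotation],
                    iou_threshold: float = 0.5) -> float:
    """Fraction of annotations matched by at least one proposal."""
    if not annotations:
        return 0.0
    hits = sum(
        1 for ann in annotations
        if any(iou(p, ann.box) >= iou_threshold for p in proposals)
    )
    return hits / len(annotations)


# Toy data: importance values are set by hand purely for illustration.
annotations = [
    Annotation((0, 0, 10, 10), "dog", 0.9),
    Annotation((20, 20, 30, 30), "leash", 0.2),
    Annotation((50, 50, 60, 60), "frisbee", 0.7),
]
proposals = [(0, 0, 10, 10), (20, 20, 30, 30)]

recall_all = proposal_recall(proposals, annotations)
recall_selected = proposal_recall(proposals, select_annotations(annotations, 0.5))
print(recall_all, recall_selected)
```

In this toy case the proposals cover the low-importance "leash" box but miss the high-importance "frisbee" box, so the score over all annotations is higher than the score over the selected subset. That gap is the kind of disagreement between the two protocols that the abstract calls misalignment.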

Code & Models

Repositories

joshuafeinglass/vl-detector-eval
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques