Grounded Image Text Matching with Mismatched Relation Reasoning
Yu Wu, Yana Wei, Haozhe Wang, Yongfei Liu, Sibei Yang, Xuming He

TL;DR
This paper proposes a new visual-linguistic task called GITM-MR to evaluate models' relation understanding, introduces a benchmark, and presents RCRN, a model that improves relation reasoning, length generalization, and data efficiency.
Contribution
It introduces GITM-MR, a novel task and benchmark for relation understanding, and proposes RCRN, a relation-aware model that enhances generalization and data efficiency.
Findings
Pre-trained models lack data efficiency and length generalization.
RCRN outperforms baseline models in length generalization.
RCRN demonstrates improved relation reasoning capabilities.
Abstract
This paper introduces Grounded Image Text Matching with Mismatched Relation (GITM-MR), a novel visual-linguistic joint task that evaluates the relation understanding capabilities of transformer-based pre-trained models. GITM-MR requires a model to first determine if an expression describes an image, then localize referred objects or ground the mismatched parts of the text. We provide a benchmark for evaluating pre-trained models on this task, with a focus on the challenging settings of limited data and out-of-distribution sentence lengths. Our evaluation demonstrates that pre-trained models lack data efficiency and length generalization ability. To address this, we propose the Relation-sensitive Correspondence Reasoning Network (RCRN), which incorporates relation-aware reasoning via bi-directional message propagation guided by language structure. RCRN can be interpreted as a modular…
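The abstract does not give RCRN's update rules. The Python sketch below is only an illustration under stated assumptions. It assumes the phrases of an expression form a tree (for example, from a parse of the sentence), that every phrase node and every image region already has a feature vector, and that messages are combined by plain addition and averaging. It then shows one way a model could produce the GITM-MR outputs: it either grounds every phrase to a region or reports the mismatched phrases. The names `bidirectional_propagate`, `match_and_ground`, `alpha`, and `threshold` are hypothetical, and the cosine-threshold decision stands in for a learned matching head. It is not the paper's method.

```python
import math
from typing import Dict, List, Tuple, Union

Vector = List[float]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def scale(a: Vector, s: float) -> Vector:
    return [x * s for x in a]


def mean(vectors: List[Vector], dim: int) -> Vector:
    # Element-wise mean; a leaf node has no children, so it gets a zero vector.
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def bidirectional_propagate(
    features: Dict[str, Vector],
    children: Dict[str, List[str]],
    root: str,
    alpha: float = 0.5,
) -> Dict[str, Vector]:
    """Hypothetical two-pass message passing over a phrase tree."""
    dim = len(features[root])
    up: Dict[str, Vector] = {}

    def upward(node: str) -> Vector:
        # Bottom-up: a node's state is its own feature plus alpha times
        # the mean of its children's bottom-up states.
        child_states = [upward(c) for c in children.get(node, [])]
        up[node] = add(features[node], scale(mean(child_states, dim), alpha))
        return up[node]

    upward(root)
    out: Dict[str, Vector] = {}

    def downward(node: str, parent_state: Vector) -> None:
        # Top-down: each node adds alpha times its parent's final state,
        # so context from the whole expression reaches every phrase.
        out[node] = add(up[node], scale(parent_state, alpha))
        for c in children.get(node, []):
            downward(c, out[node])

    downward(root, [0.0] * dim)
    return out


def cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def match_and_ground(
    node_states: Dict[str, Vector],
    regions: List[Vector],
    threshold: float = 0.5,
) -> Tuple[bool, Union[Dict[str, int], List[str]]]:
    """Return (True, node -> best region index) if every phrase aligns with
    some region, else (False, sorted names of the mismatched phrases)."""
    grounding: Dict[str, int] = {}
    mismatched: List[str] = []
    for node, state in node_states.items():
        scores = [cosine(state, r) for r in regions]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] < threshold:
            mismatched.append(node)
        else:
            grounding[node] = best
    if mismatched:
        return False, sorted(mismatched)
    return True, grounding
```

Consider a root phrase with feature `[1.0, 0.0]` and two child phrases with features `[0.0, 1.0]` and `[0.0, 3.0]`. With `alpha=0.5`, the final states are `[1.0, 1.0]`, `[0.5, 1.5]`, and `[0.5, 3.5]`. The bottom-up pass lets a head phrase summarize its modifiers. The top-down pass sends sentence-level context back to each leaf, so a phrase can be judged in light of the relation it takes part in.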
Videos
Grounded Image Text Matching with Mismatched Relation Reasoning (YouTube)
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
Methods: Focus
