Descriptive Image-Text Matching with Graded Contextual Similarity
Jinhyun Jang, Jiyoung Lee, Kwanghoon Sohn

TL;DR
This paper introduces DITM, a novel image-text matching approach that models graded contextual similarity using sentence descriptiveness, enabling more flexible and accurate matching beyond binary supervision.
Contribution
The work proposes a new method that leverages sentence descriptiveness scores to improve image-text matching, addressing the limitations of binary supervision and capturing the many-to-many relationships between images and texts.
Findings
Outperforms state-of-the-art methods on the MS-COCO, Flickr30K, and CxC datasets.
Enhances the model's hierarchical reasoning capabilities.
Effectively models complex image-text relationships with graded similarity.
Abstract
Image-text matching aims to build correspondences between visual and textual data by learning their pairwise similarities. Most existing approaches have adopted sparse binary supervision, indicating whether an image-sentence pair matches or not. However, such sparse supervision covers a limited subset of image-text relationships, neglecting their inherent many-to-many correspondences; an image can be described by numerous texts at different descriptive levels. Moreover, existing approaches overlook the implicit connections from general to specific descriptions, which form the underlying rationale for the many-to-many relationships between vision and language. In this work, we propose descriptive image-text matching, called DITM, to learn the graded contextual similarity between image and text by exploring the descriptive flexibility of language. We formulate the descriptiveness…
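The abstract contrasts sparse binary supervision with graded similarity. A minimal sketch of that contrast follows, assuming each caption comes with a descriptiveness score in [0, 1] (1 = specific, 0 = general). The mixing rule is a label-smoothing-style stand-in chosen for illustration: a caption keeps a share of target mass on its own image equal to its descriptiveness and spreads the rest over contextually similar images. It is not DITM's formulation (the abstract is truncated before that point), and the names `binary_targets`, `graded_targets`, and `soft_cross_entropy` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def binary_targets(n: int) -> Matrix:
    # Sparse binary supervision: caption j is matched only to image j.
    return [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]


def _softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def graded_targets(image_sim: Matrix, descriptiveness: List[float],
                   temperature: float = 0.1) -> Matrix:
    """Hypothetical graded targets; an illustration, not the DITM method.

    image_sim[j][i]: similarity between image j (paired with caption j)
        and image i. Requires a batch of at least two images.
    descriptiveness[j]: assumed score in [0, 1] for caption j.
    """
    n = len(image_sim)
    targets = []
    for j in range(n):
        others = [i for i in range(n) if i != j]
        # Spread the "general" share of mass over the other images,
        # favouring those most similar to caption j's own image.
        weights = _softmax([image_sim[j][i] / temperature for i in others])
        d = descriptiveness[j]
        row = [0.0] * n
        row[j] = d
        for i, w in zip(others, weights):
            row[i] = (1.0 - d) * w
        targets.append(row)
    return targets


def soft_cross_entropy(scores: Matrix, targets: Matrix,
                       temperature: float = 0.05) -> float:
    # scores[j][i]: predicted similarity between caption j and image i.
    # Returns the mean cross-entropy between targets and softmax(scores).
    total = 0.0
    for s_row, t_row in zip(scores, targets):
        logits = [s / temperature for s in s_row]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total -= sum(t * (x - log_z) for t, x in zip(t_row, logits))
    return total / len(scores)
```

For example, with `image_sim = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]` and descriptiveness `[1.0, 0.5, 0.2]`, caption 0 gets a one-hot target exactly as under binary supervision, while caption 1 keeps half its mass on image 1 and gives most of the remainder to image 0, its nearest neighbour. Under this assumed rule, a general caption is not penalised for also scoring highly on images it plausibly describes.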
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Topic Modeling
Methods: Sparse Evolutionary Training
