Learning Semantic Concepts and Order for Image and Sentence Matching
Yan Huang, Qi Wu, Liang Wang

TL;DR
This paper introduces a semantic-enhanced model for image and sentence matching that learns high-level semantic concepts and their correct order to improve matching accuracy, achieving state-of-the-art results.
Contribution
It proposes a novel approach combining semantic concept prediction and order learning to enhance image and sentence matching performance.
Findings
Achieves state-of-the-art results on benchmark datasets.
Improves image representation with semantic concepts and order.
Enhances matching accuracy through semantic organization.
Abstract
Image and sentence matching has made great progress recently, but it remains challenging due to the large visual-semantic discrepancy. This mainly arises because the pixel-level image representation usually lacks the high-level semantic information found in its matched sentence. In this work, we propose a semantic-enhanced image and sentence matching model, which improves the image representation by learning semantic concepts and then organizing them in a correct semantic order. Given an image, we first use a multi-regional multi-label CNN to predict its semantic concepts, including objects, properties, actions, etc. Then, considering that different orders of semantic concepts lead to diverse semantic meanings, we use a context-gated sentence generation scheme for semantic order learning. It simultaneously uses the image global context containing concept relations as reference and…
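The two steps named in the abstract, multi-label concept prediction over several image regions and a gate that balances concept scores against global context, can be illustrated with a small sketch. This is not the authors' implementation. The function names, the max-pooling across regions, the 0.5 threshold, the shared vector dimension, and the exact form of the gate are all assumptions chosen to keep the example self-contained.

```python
import math


def sigmoid(x):
    """Logistic function, used here for independent per-concept scores."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_concepts(region_logits, threshold=0.5):
    """Hypothetical multi-regional multi-label prediction.

    region_logits: one list of logits per image region, one logit per concept.
    Multi-label: every concept gets its own sigmoid score, so several
    concepts can be active at once.
    Multi-regional: scores are max-pooled across regions (an assumption).
    """
    num_concepts = len(region_logits[0])
    scores = [
        max(sigmoid(region[c]) for region in region_logits)
        for c in range(num_concepts)
    ]
    labels = [s >= threshold for s in scores]
    return scores, labels


def gated_fusion(context, concepts, gate_logits):
    """Hypothetical element-wise gate mixing two same-length vectors.

    A gate near 1 favours the concept vector; a gate near 0 favours
    the global context vector.
    """
    gates = [sigmoid(g) for g in gate_logits]
    return [g * c + (1.0 - g) * x for g, c, x in zip(gates, concepts, context)]


# Two regions, two concepts: the first concept fires in region 0 only.
scores, labels = predict_concepts([[2.0, -3.0], [-1.0, -2.0]])
fused = gated_fusion(context=[0.0, 1.0], concepts=[1.0, 0.0], gate_logits=[0.0, 0.0])
```

In a trained model the gate logits would be produced by learned parameters rather than passed in by hand; they are explicit arguments here only so the mixing behaviour is easy to inspect.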
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Sigmoid Activation · Tanh Activation · Average Pooling · Long Short-Term Memory · Residual Connection · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling
