Learning Two-Branch Neural Networks for Image-Text Matching Tasks
Liwei Wang, Yin Li, Jing Huang, Svetlana Lazebnik

TL;DR
This paper introduces two two-branch neural network architectures for image-text matching and reports high accuracy on phrase localization and bi-directional image-sentence retrieval.
Contribution
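The paper proposes two two-branch network structures that produce different output representations. The first, an embedding network, learns an explicit shared latent embedding space with a maximum-margin ranking loss and neighborhood constraints, and is trained with mini-batches built by neighborhood-aware sampling rather than standard triplet sampling.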
Findings
High accuracy in phrase localization on Flickr30K Entities
Effective bi-directional image-sentence retrieval on Flickr30K and MSCOCO
Improved neighborhood sampling enhances training effectiveness
Abstract
Image-language matching tasks have recently attracted a lot of attention in the computer vision field. These tasks include image-sentence matching, i.e., given an image query, retrieving relevant sentences and vice versa, and region-phrase matching or visual grounding, i.e., matching a phrase to relevant regions. This paper investigates two-branch neural networks for learning the similarity between these two data modalities. We propose two network structures that produce different output representations. The first one, referred to as an embedding network, learns an explicit shared latent embedding space with a maximum-margin ranking loss and novel neighborhood constraints. Compared to standard triplet sampling, we perform improved neighborhood sampling that takes neighborhood information into consideration while constructing mini-batches. The second network structure, referred to as a…
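The maximum-margin ranking loss mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes cosine similarity on L2-normalized embeddings, treats every non-matching pair in the mini-batch as a negative, uses an arbitrary margin of 0.1, and omits the neighborhood constraints, which the truncated abstract does not spell out. The function names are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (eps guards against zero vectors)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def bidirectional_ranking_loss(img_emb, txt_emb, margin=0.1):
    """Max-margin ranking loss over a mini-batch of matching pairs.

    Assumption: img_emb[i] and txt_emb[i] are a matching image-text pair,
    and every other pairing in the batch is a negative.
    """
    img = l2_normalize(np.asarray(img_emb, dtype=float))
    txt = l2_normalize(np.asarray(txt_emb, dtype=float))
    n = img.shape[0]

    sim = img @ txt.T          # sim[i, j]: similarity of image i and text j
    pos = np.diag(sim)         # similarity of each matching pair

    # Image-to-text: a negative text j should score at least `margin`
    # below the matching text for image i.
    cost_i2t = np.maximum(0.0, margin + sim - pos[:, None])
    # Text-to-image: a negative image i should score at least `margin`
    # below the matching image for text j.
    cost_t2i = np.maximum(0.0, margin + sim - pos[None, :])

    off_diag = ~np.eye(n, dtype=bool)  # exclude the matching pairs themselves
    return (cost_i2t[off_diag].sum() + cost_t2i[off_diag].sum()) / n


if __name__ == "__main__":
    aligned = np.eye(3)
    print(bidirectional_ranking_loss(aligned, aligned))                      # no violations
    print(bidirectional_ranking_loss(aligned, np.roll(aligned, 1, axis=0)))  # mismatched pairs
```

In this sketch the loss is zero when every matching pair outscores all in-batch negatives by at least the margin, and it grows as negatives approach or overtake the matching pair in either retrieval direction.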
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
