Deep Learning Applied to Image and Text Matching

Afroze Ibrahim Baqapuri

arXiv:1601.03478·cs.LG·January 15, 2016·1 cites

TL;DR

This paper presents a deep learning system for bidirectional image and sentence retrieval using CNNs and MLPs, embedding multimodal data into a shared space for similarity comparison, with various textual models tested.

Contribution

The study introduces a simpler yet effective multimodal embedding approach for image-text matching, exploring different textual models and training strategies for improved retrieval performance.

Findings

01

Comparable performance to recent methods despite simplicity

02

Training data negative sampling significantly affects results

03

Different textual models impact retrieval accuracy

Abstract

The ability to describe images with natural language sentences is the hallmark for image and language understanding. Such a system has wide-ranging applications, such as annotating images and using natural sentences to search for images. In this project we focus on the task of bidirectional image retrieval: such a system is capable of retrieving an image based on a sentence (image search) and retrieving a sentence based on an image query (image annotation). We present a system based on a global ranking objective function which uses a combination of convolutional neural networks (CNN) and multi-layer perceptrons (MLP). It takes a pair of image and sentence and processes them in different channels, finally embedding them into a common multimodal vector space. These embeddings encode abstract semantic information about the two inputs and can be compared using traditional information retrieval approaches.…
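
The shared-space idea in the abstract can be illustrated with a small sketch. It assumes the image and sentence embeddings have already been produced by the two channels; the CNN and MLP are not modeled here. Cosine similarity, the hinge form of the ranking loss, the margin value, and the function names `cosine`, `rank`, and `hinge_ranking_loss` are all illustrative assumptions, since the abstract names only a "global ranking objective" and does not give its exact form.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank(query, candidates):
    """Return candidate indices ordered from most to least similar to the query."""
    scores = [cosine(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


def hinge_ranking_loss(image_embs, sentence_embs, margin=0.1):
    """Bidirectional max-margin ranking loss (an assumed form, not the paper's).

    Entry i of each list is treated as a matching image-sentence pair;
    every other entry serves as a negative example for it.
    """
    loss = 0.0
    n = len(image_embs)
    for i in range(n):
        pos = cosine(image_embs[i], sentence_embs[i])
        for j in range(n):
            if j == i:
                continue
            # Sentence j is a negative for image i.
            loss += max(0.0, margin - pos + cosine(image_embs[i], sentence_embs[j]))
            # Image j is a negative for sentence i.
            loss += max(0.0, margin - pos + cosine(image_embs[j], sentence_embs[i]))
    return loss


# Toy 2-D embeddings, invented for illustration only.
image_embs = [[1.0, 0.0], [0.0, 1.0]]
sentence_embs = [[0.9, 0.1], [0.2, 0.8]]

image_search = rank(sentence_embs[0], image_embs)  # sentence query -> ranked images
annotation = rank(image_embs[1], sentence_embs)    # image query -> ranked sentences
loss = hinge_ranking_loss(image_embs, sentence_embs)
```

Because both modalities live in the same vector space, one similarity function serves both retrieval directions. The loss is zero only when every matching pair outscores every mismatched pair by at least the margin.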

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning