Learning Robust Visual-Semantic Embeddings

Yao-Hung Hubert Tsai; Liang-Kang Huang; Ruslan Salakhutdinov

arXiv:1703.05908 · cs.CV · March 21, 2017 · 21 cites

TL;DR

This paper introduces an end-to-end learning framework for robust visual-semantic embeddings. It combines auto-encoders with a cross-domain criterion (Maximum Mean Discrepancy loss) to learn joint image and text representations from both labeled and unlabeled data.

Contribution

The paper combines auto-encoders with a Maximum Mean Discrepancy loss for joint embedding learning, and introduces an unsupervised-data adaptation inference technique.

Findings

01

Outperforms state-of-the-art on multiple datasets

02

Effective in zero-shot and few-shot recognition tasks

03

Improves robustness of multi-modal representations

Abstract

Many existing methods for learning joint embeddings of images and text use only supervised information from paired images and their textual attributes. Taking advantage of the recent success of unsupervised learning in deep neural networks, we propose an end-to-end learning framework that is able to extract more robust multi-modal representations across domains. The proposed method combines representation learning models (i.e., auto-encoders) with cross-domain learning criteria (i.e., Maximum Mean Discrepancy loss) to learn joint embeddings for semantic and visual features. A novel technique of unsupervised-data adaptation inference is introduced to construct more comprehensive embeddings for both labeled and unlabeled data. We evaluate our method on the Animals with Attributes and Caltech-UCSD Birds 200-2011 datasets with a wide range of applications, including zero and…
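The abstract names Maximum Mean Discrepancy (MMD) as the cross-domain criterion, but this listing does not give the kernel, the estimator, or how the term is weighted against the auto-encoder losses. As a rough illustration only, the sketch below computes the standard biased estimate of squared MMD between two sets of embedding vectors. The Gaussian kernel, the bandwidth value, and the names `gaussian_kernel`, `mmd_squared`, `visual_codes`, and `semantic_codes` are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    # Pairwise squared Euclidean distances, shape (len(a), len(b)).
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD between samples x and y.

    MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]
    """
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


# Hypothetical stand-ins for encoder outputs from the two modalities.
rng = np.random.default_rng(0)
visual_codes = rng.normal(0.0, 1.0, size=(200, 16))
semantic_codes = rng.normal(0.0, 1.0, size=(200, 16))  # same distribution
shifted_codes = rng.normal(1.0, 1.0, size=(200, 16))   # mean-shifted distribution

# Bandwidth 4.0 is an arbitrary choice for this toy data.
mmd_same = mmd_squared(visual_codes, semantic_codes, bandwidth=4.0)
mmd_shifted = mmd_squared(visual_codes, shifted_codes, bandwidth=4.0)
print(f"same distribution:    {mmd_same:.4f}")
print(f"shifted distribution: {mmd_shifted:.4f}")
```

In a training setup along the lines the abstract describes, a term like `mmd_squared` would presumably be minimized so that embeddings from the two domains become harder to tell apart; the toy data here only shows that the estimate is small for matching distributions and larger for mismatched ones.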

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques