Order-Embeddings of Images and Language

Ivan Vendrov; Ryan Kiros; Sanja Fidler; Raquel Urtasun

arXiv:1511.06361 · cs.LG · March 2, 2016 · ICLR · 87 cites


TL;DR

This paper introduces a general method for learning ordered representations that capture the hierarchical relationships among words, sentences, and images, and reports improved performance over current approaches on hypernym prediction and image-caption retrieval.

Contribution

The paper proposes explicitly modeling the partial order structure of the visual-semantic hierarchy, using a general approach that applies across multiple tasks involving images and language.

Findings

01 Improved hypernym prediction over current approaches

02 Improved image-caption retrieval over current approaches

03 Explicit modeling of the partial order structure of the visual-semantic hierarchy

Abstract

Hypernymy, textual entailment, and image captioning can be seen as special cases of a single visual-semantic hierarchy over words, sentences, and images. In this paper we advocate for explicitly modeling the partial order structure of this hierarchy. Towards this goal, we introduce a general method for learning ordered representations, and show how it can be applied to a variety of tasks involving images and language. We show that the resulting representations improve performance over current approaches for hypernym prediction and image-caption retrieval.
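
The abstract does not spell out how an ordered representation is scored, so the following is only a minimal sketch of one way to read it. It assumes embeddings are nonnegative vectors under a reversed coordinate-wise order, where the origin is the most general concept and a more specific item has coordinates at least as large as those of anything it falls under. The penalty is the squared norm of max(0, general - specific), which is zero exactly when the pair respects that order. The function names and the toy vectors are hypothetical and are not taken from the authors' code.

```python
def order_violation(specific, general):
    """Sum of squared positive parts of (general - specific), per coordinate.

    Zero exactly when every coordinate of `specific` is >= the matching
    coordinate of `general`, i.e. when the pair respects the assumed order.
    """
    if len(specific) != len(general):
        raise ValueError("embeddings must have the same dimension")
    return sum(max(0.0, g - s) ** 2 for s, g in zip(specific, general))


def is_below(specific, general):
    """True when `specific` sits at or below `general` in the assumed order."""
    return order_violation(specific, general) == 0.0


# Toy 2-d embeddings, made up for illustration only (not learned values).
entity = [0.0, 0.0]  # most general concept: the origin
animal = [1.0, 0.5]
dog = [2.0, 1.5]

print(order_violation(dog, animal))  # pair respects the order: dog is-a animal
print(order_violation(animal, dog))  # pair violates the order: animal is not a dog
```

In a training setup, a penalty of this kind would typically be driven toward zero for true pairs (for example a word and its hypernym, or an image and its caption) and kept above a margin for false pairs; that loss construction is likewise an assumption here, not something stated in the abstract.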


Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling