Order-Embeddings of Images and Language
Ivan Vendrov, Ryan Kiros, Sanja Fidler, Raquel Urtasun

TL;DR
This paper introduces a method for learning ordered representations that model the hierarchical relationships among words, sentences, and images, improving performance on hypernym prediction and image-caption retrieval.
Contribution
It proposes a general approach for explicitly modeling the partial order structure in visual-semantic hierarchies, applicable across multiple tasks.
Findings
Improved hypernym prediction accuracy
Enhanced image-caption retrieval performance
Effective modeling of visual-semantic hierarchies
Abstract
Hypernymy, textual entailment, and image captioning can be seen as special cases of a single visual-semantic hierarchy over words, sentences, and images. In this paper we advocate for explicitly modeling the partial order structure of this hierarchy. Towards this goal, we introduce a general method for learning ordered representations, and show how it can be applied to a variety of tasks involving images and language. We show that the resulting representations improve performance over current approaches for hypernym prediction and image-caption retrieval.
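To make "modeling the partial order structure" concrete, here is a minimal sketch of one way to score an ordered pair of embeddings. It assumes a reversed coordinate-wise order on non-negative vectors, where a more general concept sits closer to the origin, and penalizes a pair by the squared norm of max(0, general - specific). The abstract above does not spell out this construction, so treat the penalty, the function name `order_violation`, and the hand-picked 2-D vectors as illustrative assumptions rather than the authors' implementation.

```python
from typing import Sequence


def order_violation(specific: Sequence[float], general: Sequence[float]) -> float:
    """Penalty for the claim "specific lies below general" in the hierarchy.

    Assumed order: specific is below general when every coordinate of
    specific is >= the matching coordinate of general. The penalty is the
    squared norm of max(0, general - specific), so it is zero exactly when
    that assumed order holds.
    """
    if len(specific) != len(general):
        raise ValueError("embeddings must have the same dimension")
    return sum(max(0.0, g - s) ** 2 for s, g in zip(specific, general))


# Hypothetical embeddings chosen by hand, not learned from data.
animal = [1.0, 0.0]
dog = [2.0, 1.0]
cat = [1.0, 3.0]

print(order_violation(dog, animal))  # 0.0: "dog is an animal" satisfies the order
print(order_violation(animal, dog))  # 2.0: the reversed claim is penalized
print(order_violation(dog, cat))     # 4.0: unrelated concepts are penalized
```

The penalty is asymmetric by design: swapping the arguments changes the score, which is what lets it represent a directed relation such as hypernymy, unlike a symmetric distance.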
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
