Unsupervised Visual Sense Disambiguation for Verbs using Multimodal Embeddings

Spandana Gella; Mirella Lapata; Frank Keller

arXiv:1603.09188·cs.CL·March 31, 2016

TL;DR

This paper introduces a new task of visual sense disambiguation for verbs, proposing an unsupervised Lesk-based algorithm and a new dataset, VerSe, to improve multimodal understanding of actions in images.

Contribution

The paper presents VerSe, a new dataset that augments the existing multimodal datasets COCO and TUHOI with verb sense labels, and an unsupervised Lesk-based method that performs visual sense disambiguation using textual, visual, or multimodal embeddings.

Findings

01

Textual embeddings perform well when gold-standard textual annotations (object labels and image descriptions) are available.

02

Multimodal embeddings perform well on unannotated images.

03

Supervised features improve disambiguation accuracy.

Abstract

We introduce a new task, visual sense disambiguation for verbs: given an image and a verb, assign the correct sense of the verb, i.e., the one that describes the action depicted in the image. Just as textual word sense disambiguation is useful for a wide range of NLP tasks, visual sense disambiguation can be useful for multimodal tasks such as image retrieval, image description, and text illustration. We introduce VerSe, a new dataset that augments existing multimodal datasets (COCO and TUHOI) with sense labels. We propose an unsupervised algorithm based on Lesk which performs visual sense disambiguation using textual, visual, or multimodal embeddings. We find that textual embeddings perform well when gold-standard textual annotations (object labels and image descriptions) are available, while multimodal embeddings perform well on unannotated images. We also verify our findings by using…
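The Lesk-style idea in the abstract can be illustrated for the textual case with a minimal sketch. It assumes that each verb sense is represented by a few words from its dictionary definition and each image by its object labels or description words, and that the predicted sense is the one whose averaged embedding is most similar to the image's. The function names, the toy three-dimensional embeddings, and the two senses of "play" are all hypothetical and chosen only for illustration; this is not the authors' implementation, and the visual and multimodal variants are not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(words, embeddings):
    """Average the embeddings of the known words; return None if no word is known."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def disambiguate(image_words, sense_glosses, embeddings):
    """Return the sense whose gloss embedding is closest to the image embedding."""
    image_vec = mean_vector(image_words, embeddings)
    if image_vec is None:
        return None
    best_sense, best_score = None, float("-inf")
    for sense, gloss_words in sense_glosses.items():
        sense_vec = mean_vector(gloss_words, embeddings)
        if sense_vec is None:
            continue
        score = cosine(image_vec, sense_vec)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Made-up 3-dimensional embeddings, for illustration only.
TOY_EMBEDDINGS = {
    "guitar": [0.9, 0.1, 0.0],
    "music": [0.8, 0.2, 0.1],
    "instrument": [0.9, 0.0, 0.1],
    "ball": [0.1, 0.9, 0.0],
    "game": [0.2, 0.8, 0.1],
    "sport": [0.1, 0.9, 0.1],
    "man": [0.3, 0.3, 0.3],
}

# Hypothetical sense inventory for the verb "play".
TOY_SENSES = {
    "play_instrument": ["music", "instrument"],
    "play_sport": ["game", "sport"],
}

if __name__ == "__main__":
    print(disambiguate(["man", "guitar"], TOY_SENSES, TOY_EMBEDDINGS))
    print(disambiguate(["man", "ball"], TOY_SENSES, TOY_EMBEDDINGS))
```

In this sketch, swapping the word-based image representation for a visual or joint feature vector would only change how `image_vec` is built; the sense-scoring loop would stay the same.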

Code & Models

Repositories

spandanagella/verse
Official
