# Cross-lingual Visual Verb Sense Disambiguation

**Authors:** Spandana Gella, Desmond Elliott, Frank Keller

arXiv: 1904.05092 · 2019-04-18

## TL;DR

This paper introduces MultiSense, a dataset of 9,504 images for cross-lingual verb sense disambiguation. It shows that visual context improves disambiguation over unimodal baselines, and that the predicted verb senses can improve a text-only machine translation system on a multimodal translation task.

## Contribution

The paper extends cross-lingual visual sense disambiguation from nouns to the harder case of verbs. It contributes a new dataset, MultiSense, and shows that cross-lingual verb sense disambiguation models benefit from visual context.

## Key findings

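- Visual context improves cross-lingual verb sense disambiguation compared to unimodal baselines.
- Verb senses predicted by the best disambiguation model improve the results of a text-only machine translation system on a multimodal translation task.
- The MultiSense dataset contains 9,504 images, each annotated with an English verb and its translation in German or Spanish.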
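
To make the annotation scheme concrete, the following is a minimal sketch of how one image annotation might be represented, together with a simple image-blind reference point that always picks the most frequent translation of a verb. The `Annotation` class, its field names, the language codes, and the toy records are all assumptions made for illustration; they are not taken from the released dataset, and the baseline is not necessarily one of the unimodal baselines evaluated in the paper.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One hypothetical record: an image paired with a verb and its translation."""
    image_id: str
    english_verb: str
    target_lang: str   # assumed codes: "de" for German, "es" for Spanish
    translation: str   # the target-language verb chosen for this image


def most_frequent_translation(train):
    """Map each (English verb, target language) to its most common translation.

    This ignores the image entirely, so it can never separate two senses
    of the same verb; it only serves as a text-only point of comparison.
    """
    counts = defaultdict(Counter)
    for ann in train:
        counts[(ann.english_verb, ann.target_lang)][ann.translation] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


# Toy records, invented for illustration (not drawn from MultiSense).
train = [
    Annotation("img1", "ride", "de", "fahren"),  # e.g. riding a bike
    Annotation("img2", "ride", "de", "fahren"),
    Annotation("img3", "ride", "de", "reiten"),  # e.g. riding a horse
    Annotation("img4", "ride", "es", "montar"),
]

baseline = most_frequent_translation(train)
print(baseline[("ride", "de")])
```

In this toy data the image-blind baseline always answers `fahren` for German "ride", so it mislabels the horse-riding image. That gap is the kind of ambiguity that visual context is meant to resolve.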

## Abstract

Recent work has shown that visual context improves cross-lingual sense disambiguation for nouns. We extend this line of work to the more challenging task of cross-lingual verb sense disambiguation, introducing the MultiSense dataset of 9,504 images annotated with English, German, and Spanish verbs. Each image in MultiSense is annotated with an English verb and its translation in German or Spanish. We show that cross-lingual verb sense disambiguation models benefit from visual context, compared to unimodal baselines. We also show that the verb sense predicted by our best disambiguation model can improve the results of a text-only machine translation system when used for a multimodal translation task.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.05092/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/1904.05092/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/1904.05092/full.md

---
Source: https://tomesphere.com/paper/1904.05092