# Aiding Intra-Text Representations with Visual Context for Multimodal Named Entity Recognition

**Authors:** Omer Arshad, Ignazio Gallo, Shah Nawaz, Alessandro Calefati

arXiv: 1904.01356 · 2019-04-03

## TL;DR

This paper introduces an end-to-end multimodal model that combines text and visual context to improve Named Entity Recognition on social media posts, achieving state-of-the-art results.

## Contribution

It presents a novel joint text-image representation model that extends multi-dimensional self-attention to incorporate visual context for NER.
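To make the idea concrete, here is a minimal NumPy sketch of how an image feature could enter a multi-dimensional self-attention layer. It follows the common formulation of multi-dimensional attention, in which every word pair gets one alignment score per feature dimension rather than a single scalar. The extra image term (`W3`, `v`), the function names, the parameter shapes, and the random initialisation are all assumptions made for illustration; this summary does not specify how the paper injects the visual features, so this should not be read as the authors' implementation.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(d, d_img, seed=0):
    # Hypothetical random parameters; a trained model would learn these.
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.1, size=(d, d)),      # projects source word x_i
        "W2": rng.normal(scale=0.1, size=(d, d)),      # projects target word x_j
        "W3": rng.normal(scale=0.1, size=(d_img, d)),  # projects image feature v (assumed)
        "W": rng.normal(scale=0.1, size=(d, d)),       # maps hidden state to per-feature scores
        "b1": np.zeros(d),
        "b": np.zeros(d),
    }


def visual_multidim_self_attention(X, v, params):
    """X: (n, d) word vectors, v: (d_img,) image feature. Returns (n, d)."""
    src = X @ params["W1"]   # (n, d)
    tgt = X @ params["W2"]   # (n, d)
    img = v @ params["W3"]   # (d,), shared by every word pair
    # h[i, j] = tanh(x_i W1 + x_j W2 + v W3 + b1), shape (n, n, d)
    h = np.tanh(src[:, None, :] + tgt[None, :, :] + img + params["b1"])
    scores = h @ params["W"] + params["b"]   # (n, n, d): one score per feature
    P = softmax(scores, axis=0)              # normalise over source words i
    # s_j = sum_i P[i, j] * x_i, applied feature by feature
    return (P * X[:, None, :]).sum(axis=0)


rng = np.random.default_rng(1)
X = rng.normal(size=(5, 8))   # 5 words, 8-dimensional embeddings
v = rng.normal(size=16)       # 16-dimensional image feature
S = visual_multidim_self_attention(X, v, init_params(8, 16))
print(S.shape)  # (5, 8)
```

Because the attention weights are normalised per feature, each output feature is a convex combination of that feature across the input words. In this sketch the image only shifts which words receive weight, which is one plausible reading of "visual context" influencing word-to-word relationships.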

## Key findings

- Achieves state-of-the-art results on the Twitter multimodal NER dataset.
- Captures both textual and visual context with greater accuracy.
- Uses the image to strengthen the relationships modeled between words.

## Abstract

With the massive explosion of social media such as Twitter and Instagram, people share billions of multimedia posts daily, containing images and text. Typically, text in these posts is short, informal and noisy, leading to ambiguities which can be resolved using images. In this paper, we explore the text-centric Named Entity Recognition task on these multimedia posts. We propose an end-to-end model that learns a joint representation of a text and an image. Our model extends the multi-dimensional self-attention technique so that the image helps to enhance the relationships between words. Experiments show that our model is capable of capturing both textual and visual contexts with greater accuracy, achieving state-of-the-art results on the Twitter multimodal Named Entity Recognition dataset.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.01356/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/1904.01356/full.md

## References

21 references — full list in the complete paper: https://tomesphere.com/paper/1904.01356/full.md

---
Source: https://tomesphere.com/paper/1904.01356