TL;DR
This paper introduces a new dataset and model for multimodal entity linking on Twitter, combining text and images to improve entity disambiguation in social media content.
Contribution
It presents a fully annotated Twitter dataset for multimodal entity linking and a joint learning model that leverages both textual and visual information.
Findings
The model outperforms text-only approaches on the dataset.
Visual information improves entity linking when it is available.
The dataset enables future research in multimodal entity disambiguation.
Abstract
In many information extraction applications, entity linking (EL) has emerged as a crucial task that allows leveraging information about named entities from a knowledge base. In this paper, we address the task of multimodal entity linking (MEL), an emerging research field in which textual and visual information is used to map an ambiguous mention to an entity in a knowledge base (KB). First, we propose a method for building a fully annotated Twitter dataset for MEL, where entities are defined in a Twitter KB. Then, we propose a model for jointly learning a representation of both mentions and entities from their textual and visual contexts. We demonstrate the effectiveness of the proposed model by evaluating it on the proposed dataset and highlight the importance of leveraging visual information when it is available.
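To make the idea of combining textual and visual context concrete, here is a minimal sketch of multimodal candidate ranking. It is not the paper's model: the paper learns its joint representation, whereas this sketch uses fixed toy vectors, fuses them by simple concatenation, and ranks candidates by cosine similarity. All function names, entity names, and feature values are hypothetical and chosen only for illustration.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def fuse(text_vec, image_vec=None, image_dim=0):
    """Concatenate normalized text and image features.

    Assumption: a missing image is represented by a zero vector of
    length image_dim, so text-only inputs stay comparable.
    """
    image_part = [0.0] * image_dim if image_vec is None else l2_normalize(image_vec)
    return l2_normalize(text_vec) + image_part

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def link(mention_vec, candidates):
    """Return the name of the candidate entity most similar to the mention."""
    return max(candidates, key=lambda name: cosine(mention_vec, candidates[name]))

# Toy knowledge base: both entities have the same textual features,
# so text alone cannot separate them; only the image features differ.
candidates = {
    "jaguar_cars": fuse([1.0, 0.0], [1.0, 0.0]),
    "jaguar_animal": fuse([1.0, 0.0], [0.0, 1.0]),
}

# The same ambiguous mention, with and without its attached image.
mention_text_only = fuse([1.0, 0.0], None, image_dim=2)
mention_with_image = fuse([1.0, 0.0], [0.0, 1.0])

text_only_scores = {name: cosine(mention_text_only, vec) for name, vec in candidates.items()}
best_with_image = link(mention_with_image, candidates)
print(text_only_scores)
print(best_with_image)
```

In this toy setup the text-only mention scores identically against both candidates, while the image features break the tie. That mirrors, in a very simplified form, the abstract's point about leveraging visual information when it is available.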
