MESED: A Multi-modal Entity Set Expansion Dataset with Fine-grained Semantic Classes and Hard Negative Entities
Yangning Li, Tingwei Lu, Yinghui Li, Tianyu Yu, Shulin Huang, Hai-Tao Zheng, Rui Zhang, Jun Yuan

TL;DR
This paper introduces MESED, a large-scale multi-modal dataset for entity set expansion, and proposes MultiExpan, a multi-modal model that leverages visual and textual information to improve entity expansion accuracy.
Contribution
The paper presents the first multi-modal dataset for entity set expansion and a novel multi-modal model, MultiExpan, trained on four pre-training tasks, advancing the state of the art.
Findings
MultiExpan outperforms existing methods on the MESED dataset.
Multi-modal information improves entity expansion accuracy.
The dataset and model facilitate future research in multi-modal entity understanding.
Abstract
The Entity Set Expansion (ESE) task aims to expand a handful of seed entities with new entities belonging to the same semantic class. Conventional ESE methods are based on mono-modality (i.e., literal modality), which struggle to deal with complex entities in the real world such as: (1) Negative entities with fine-grained semantic differences. (2) Synonymous entities. (3) Polysemous entities. (4) Long-tailed entities. These challenges prompt us to propose Multi-modal Entity Set Expansion (MESE), where models integrate information from multiple modalities to represent entities. Intuitively, the benefits of multi-modal information for ESE are threefold: (1) Different modalities can provide complementary information. (2) Multi-modal information provides a unified signal via common visual properties for the same semantic class or entity. (3) Multi-modal information offers robust alignment…
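To make the task definition concrete, the following is a minimal sketch of set expansion as similarity ranking over fused text and image vectors. It is not MultiExpan or any method from the paper: the function names (`fuse`, `expand`), the concatenation-based fusion, and the toy vectors are all assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fuse(text_vec, image_vec):
    """Hypothetical fusion step: concatenate the two modality vectors."""
    return list(text_vec) + list(image_vec)


def expand(seeds, candidates, embeddings, top_k=1):
    """Rank candidates by their mean cosine similarity to the seed entities."""
    scores = {
        c: sum(cosine(embeddings[c], embeddings[s]) for s in seeds) / len(seeds)
        for c in candidates
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Made-up toy vectors; a real system would obtain these from trained encoders.
text = {
    "banana": [1.0, 0.1],
    "orange": [0.9, 0.2],
    "mango": [0.8, 0.3],
    "laptop": [0.1, 1.0],
}
image = {
    "banana": [0.9, 0.0],
    "orange": [1.0, 0.1],
    "mango": [0.9, 0.2],
    "laptop": [0.0, 1.0],
}
embeddings = {name: fuse(text[name], image[name]) for name in text}

# Expand the seed set {banana, orange} from the candidate pool.
ranked = expand(["banana", "orange"], ["mango", "laptop"], embeddings, top_k=1)
print(ranked)
```

Concatenation is only one simple way to combine modalities. It is used here because it keeps the example self-contained, not because the paper prescribes it.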
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning
