CMNER: A Chinese Multimodal NER Dataset based on Social Media

Yuanze Ji; Bobo Li; Jun Zhou; Fei Li; Chong Teng; Donghong Ji

arXiv:2402.13693 · cs.CL · March 4, 2024 · 1 citation

TL;DR

This paper introduces CMNER, a Chinese multimodal NER dataset built from Weibo posts. It shows that incorporating images improves entity recognition and that adding cross-lingual data (the English Twitter2015 dataset) improves model performance.

Contribution

The paper creates the first large-scale Chinese multimodal NER dataset from Weibo and explores the benefits of multimodal and cross-lingual training for NER.

Findings

1. Images improve NER accuracy on Chinese social media data.
2. Cross-lingual training improves NER performance across languages.
3. Baseline experiments validate the effectiveness of multimodal information.
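The abstract does not say which metric the baselines report. As a hedged sketch, assuming the exact-match, micro-averaged span-level precision/recall/F1 that is standard for NER benchmarks, a text-only baseline and a text+image baseline could both be scored against the same gold spans like this. The function name `span_prf` and the `(type, start, end)` span format are illustrative assumptions, not the API of the CMNER repository.

```python
def span_prf(gold_docs, pred_docs):
    """Micro-averaged exact-match precision, recall and F1 over entity spans.

    Each argument is a list (one entry per post) of (type, start, end) tuples.
    A predicted span counts as correct only if type and both boundaries match.
    """
    # Prefix each span with its post index so identical spans in different posts stay distinct.
    gold = {(i, *s) for i, spans in enumerate(gold_docs) for s in spans}
    pred = {(i, *s) for i, spans in enumerate(pred_docs) for s in spans}
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold and predicted spans for two posts (not CMNER data).
gold_spans = [[("PER", 0, 2), ("LOC", 3, 5)], [("ORG", 0, 4)]]
pred_spans = [[("PER", 0, 2), ("LOC", 3, 4)], [("ORG", 0, 4), ("MISC", 5, 7)]]
print(span_prf(gold_spans, pred_spans))  # precision 0.5, recall 2/3, F1 4/7
```

Here the truncated LOC span and the spurious MISC span both count as errors, which is why exact-match F1 is a strict measure of boundary quality.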

Abstract

Multimodal Named Entity Recognition (MNER) is a pivotal task designed to extract named entities from text with the support of pertinent images. Nonetheless, a notable paucity of data for Chinese MNER has considerably impeded the progress of this natural language processing task within the Chinese domain. Consequently, in this study, we compile a Chinese Multimodal NER dataset (CMNER) utilizing data sourced from Weibo, China's largest social media platform. Our dataset encompasses 5,000 Weibo posts paired with 18,326 corresponding images. The entities are classified into four distinct categories: person, location, organization, and miscellaneous. We perform baseline experiments on CMNER, and the outcomes underscore the effectiveness of incorporating images for NER. Furthermore, we conduct cross-lingual experiments on the publicly available English MNER dataset (Twitter2015), and the…
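The abstract does not specify CMNER's annotation format. As a minimal sketch, assuming character-level BIO tags (a common convention for Chinese NER) and the assumed abbreviations PER, LOC, ORG and MISC for the four categories, one post with its several images and its tags could be represented and decoded into entity spans as below. `WeiboPost`, `decode_bio`, the image file names and the example sentence are hypothetical and do not come from the dataset or the jyz99/cmner repository.

```python
from dataclasses import dataclass

# The four categories named in the abstract; the tag abbreviations are an assumption.
ENTITY_TYPES = {"PER": "person", "LOC": "location", "ORG": "organization", "MISC": "miscellaneous"}


@dataclass
class WeiboPost:
    """Hypothetical container for one sample: post text, its images, one BIO tag per character."""
    text: str
    image_paths: list[str]
    tags: list[str]

    def __post_init__(self):
        if len(self.tags) != len(self.text):
            raise ValueError("expected exactly one tag per character")


def decode_bio(chars, tags):
    """Return (entity_text, type, start, end) spans; end is exclusive."""
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags):
        # A B- tag, an O tag, or an I- tag of a different type ends the open span.
        if tag.startswith("B-") or tag == "O" or (tag.startswith("I-") and tag[2:] != etype):
            if start is not None:
                spans.append(("".join(chars[start:i]), etype, start, i))
                start, etype = None, None
            if tag != "O":
                # Lenient: a stray I- tag opens a new span instead of being dropped.
                start, etype = i, tag[2:]
    if start is not None:
        spans.append(("".join(chars[start:]), etype, start, len(tags)))
    return spans


post = WeiboPost(
    text="马云在杭州创立了阿里巴巴",
    image_paths=["img_0.jpg", "img_1.jpg"],
    tags=["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "O", "O",
          "B-ORG", "I-ORG", "I-ORG", "I-ORG"],
)
for entity, etype, start, end in decode_bio(list(post.text), post.tags):
    print(entity, ENTITY_TYPES[etype], start, end)
```

In a multimodal model the images in `image_paths` would only influence which tags are predicted; span decoding itself stays purely textual.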

Code & Models

Repositories

jyz99/cmner (PyTorch · Official)

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Topic Modeling