A Benchmark Dataset and a Framework for Urdu Multimodal Named Entity Recognition
Hussain Ahmad, Qingyang Zeng, and Jing Wan

TL;DR
This paper introduces Twitter2015-Urdu, an Urdu multimodal dataset, and U-MNER, a framework for multimodal named entity recognition. Together they address the scarcity of annotated resources and establish benchmarks for future MNER research in low-resource languages.
Contribution
It presents the first annotated Urdu MNER dataset, a multimodal framework that combines text and images, and baseline evaluations of text-based and multimodal models on the new dataset.
Findings
The dataset enables benchmarking for Urdu MNER.
The proposed model achieves state-of-the-art results on the dataset.
Baseline models provide a foundation for future research.
Abstract
The emergence of multimodal content, particularly text and images on social media, has positioned Multimodal Named Entity Recognition (MNER) as an increasingly important area of research within Natural Language Processing. Despite progress in high-resource languages such as English, MNER remains underexplored for low-resource languages like Urdu. The primary challenges include the scarcity of annotated multimodal datasets and the lack of standardized baselines. To address these challenges, we introduce the U-MNER framework and release the Twitter2015-Urdu dataset, a pioneering resource for Urdu MNER. Adapted from the widely used Twitter2015 dataset, it is annotated with Urdu-specific grammar rules. We establish benchmark baselines by evaluating both text-based and multimodal models on this dataset, providing comparative analyses to support future research on Urdu MNER. The U-MNER…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Sentiment Analysis and Opinion Mining
Methods: Average Pooling · Global Average Pooling · Convolution · Kaiming Initialization · Max Pooling · ALIGN
