M$^{3}$D: A Multimodal, Multilingual and Multitask Dataset for Grounded Document-level Information Extraction
Jiang Liu, Bobo Li, Xinran Yang, Na Yang, Hao Fei, Mingyao Zhang, Fei Li, Donghong Ji

TL;DR
This paper introduces M$^{3}$D, a comprehensive multilingual multimodal dataset for document-level information extraction, including novel tasks, video data, and a benchmark model to advance multimodal IE research.
Contribution
The paper presents M$^{3}$D, a new multilingual, multimodal, and multitask dataset with video and fine-grained grounding, plus a hierarchical model with modules for missing modalities.
Findings
Achieved over 53% accuracy on four IE tasks in both English and Chinese.
Demonstrated the effectiveness of the DFFM and MMCM modules in multimodal integration.
Set a new benchmark for future multimodal IE research.
Abstract
Multimodal information extraction (IE) tasks have attracted increasing attention because many studies have shown that multimodal information benefits text information extraction. However, existing multimodal IE datasets mainly focus on sentence-level image-facilitated IE in English text, and pay little attention to video-based multimodal IE and fine-grained visual grounding. Therefore, in order to promote the development of multimodal IE, we constructed a multimodal multilingual multitask dataset, named M$^{3}$D, which has the following features: (1) It contains paired document-level text and video to enrich multimodal information; (2) It supports two widely-used languages, namely English and Chinese; (3) It includes more multimodal IE tasks such as entity recognition, entity chain extraction, relation extraction and visual grounding. In addition, our dataset introduces an unexplored…
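To make the four tasks named in the abstract concrete, here is a minimal sketch of how one document-level sample might be represented. This is not the dataset's actual release format: every class name, field name, label, and file name below is a hypothetical choice made for illustration, and the optional `video_path` is only an assumed way of modelling a sample whose video is missing.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Mention:
    """One entity mention in the document text (entity recognition)."""
    start: int          # character offset, inclusive
    end: int            # character offset, exclusive
    entity_type: str    # hypothetical label, e.g. "PER"


@dataclass
class Box:
    """A bounding box in one video frame (visual grounding)."""
    frame: int
    x1: float
    y1: float
    x2: float
    y2: float


@dataclass
class EntityChain:
    """Mentions that refer to the same entity, with optional grounded boxes."""
    mentions: List[Mention]
    boxes: List[Box] = field(default_factory=list)


@dataclass
class Relation:
    """A typed relation between two entity chains, referenced by index."""
    head: int
    tail: int
    label: str


@dataclass
class Sample:
    """One paired text-video document (hypothetical layout)."""
    language: str                  # "en" or "zh"
    text: str
    video_path: Optional[str]      # None stands in for a missing video
    chains: List[EntityChain] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)


def mention_text(sample: Sample, mention: Mention) -> str:
    """Return the surface string covered by a mention's character span."""
    return sample.text[mention.start:mention.end]


def build_example() -> Sample:
    """Build a toy sample; the text and annotations are invented."""
    text = "Alice met Bob. She thanked him."
    alice = EntityChain(
        mentions=[Mention(0, 5, "PER"), Mention(15, 18, "PER")],
        boxes=[Box(frame=0, x1=10.0, y1=20.0, x2=110.0, y2=220.0)],
    )
    bob = EntityChain(mentions=[Mention(10, 13, "PER"), Mention(27, 30, "PER")])
    return Sample("en", text, "clip.mp4", [alice, bob], [Relation(0, 1, "met")])
```

In this layout, entity recognition corresponds to predicting the `Mention` spans, entity chain extraction to grouping them into `EntityChain` objects, relation extraction to predicting `Relation` triples over chains, and visual grounding to predicting the `Box` entries attached to each chain.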
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training · Focus
