M$^{3}$D: A Multimodal, Multilingual and Multitask Dataset for Grounded Document-level Information Extraction

Jiang Liu; Bobo Li; Xinran Yang; Na Yang; Hao Fei; Mingyao Zhang; Fei Li; Donghong Ji

arXiv:2412.04026·cs.CL·December 17, 2024

TL;DR

This paper introduces M$^{3}$D, a multilingual, multimodal dataset for document-level information extraction that adds video data and novel tasks, together with a benchmark model intended to advance multimodal IE research.

Contribution

The paper presents M$^{3}$D, a new multilingual, multimodal, and multitask dataset with video and fine-grained grounding, plus a hierarchical model with modules for missing modalities.

Findings

01 Achieved over 53% accuracy on four IE tasks in both English and Chinese.

02 Demonstrated the effectiveness of the DFFM and MMCM modules in multimodal integration.

03 Set a new benchmark for future multimodal IE research.

Abstract

Multimodal information extraction (IE) tasks have attracted increasing attention because many studies have shown that multimodal information benefits text information extraction. However, existing multimodal IE datasets mainly focus on sentence-level image-facilitated IE in English text, and pay little attention to video-based multimodal IE and fine-grained visual grounding. Therefore, to promote the development of multimodal IE, we constructed a multimodal multilingual multitask dataset, named M$^{3}$D, which has the following features: (1) It contains paired document-level text and video to enrich multimodal information; (2) It supports two widely used languages, namely English and Chinese; (3) It includes more multimodal IE tasks such as entity recognition, entity chain extraction, relation extraction and visual grounding. In addition, our dataset introduces an unexplored…

Code & Models

Repositories

solkx/m3d (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training · Focus