Recognizing Everything from All Modalities at Once: Grounded Multimodal Universal Information Extraction
Meishan Zhang, Hao Fei, Bin Wang, Shengqiong Wu, Yixin Cao, Fei Li, Min Zhang

TL;DR
This paper introduces a unified task framework for information extraction across modalities, proposes Reamo, a multimodal large language model (MLLM) that both recognizes information and grounds it at a fine-grained level, and provides a new benchmark dataset for evaluation.
Contribution
It introduces the concept of grounded Multimodal Universal Information Extraction (MUIE) and develops Reamo, a model capable of recognizing and grounding information from all modalities at once.
Findings
Reamo outperforms existing models across multiple evaluation metrics.
A new diverse benchmark dataset for grounded MUIE tasks is curated.
Reamo demonstrates strong capabilities in fine-grained multimodal grounding.
Abstract
In the field of information extraction (IE), tasks across a wide range of modalities and their combinations have been traditionally studied in isolation, leaving a gap in deeply recognizing and analyzing cross-modal information. To address this, this work for the first time introduces the concept of grounded Multimodal Universal Information Extraction (MUIE), providing a unified task framework to analyze any IE tasks over various modalities, along with their fine-grained groundings. To tackle MUIE, we tailor a multimodal large language model (MLLM), Reamo, capable of extracting and grounding information from all modalities, i.e., recognizing everything from all modalities at once. Reamo is updated via varied tuning strategies, equipping it with powerful capabilities for information recognition and fine-grained multimodal grounding. To address the absence of a suitable benchmark for…
Taxonomy
Topics: Natural Language Processing Techniques
