MMUTF: Multimodal Multimedia Event Argument Extraction with Unified Template Filling

Philipp Seeberger; Dominik Wagner; Korbinian Riedhammer

arXiv:2406.12420 · cs.CL · October 3, 2024

TL;DR

This paper introduces MMUTF, a unified template filling model that leverages textual prompts to connect modalities, significantly improving multimedia event argument extraction performance over existing methods.

Contribution

The paper proposes a novel unified template filling approach that effectively integrates textual and visual modalities for multimedia event argument extraction.

Findings

01. Surpasses SOTA on textual EAE by +7% F1 score.

02. Outperforms second-best systems in multimedia EAE.

03. Demonstrates effectiveness on the M2E2 benchmark.

Abstract

With the advancement of multimedia technologies, news documents and user-generated content are often represented as multiple modalities, making Multimedia Event Extraction (MEE) an increasingly important challenge. However, recent MEE methods employ weak alignment strategies and data augmentation with simple classification models, which ignore the capabilities of natural language-formulated event templates for the challenging Event Argument Extraction (EAE) task. In this work, we focus on EAE and address this issue by introducing a unified template filling model that connects the textual and visual modalities via textual prompts. This approach enables the exploitation of cross-ontology transfer and the incorporation of event-specific semantics. Experiments on the M2E2 benchmark demonstrate the effectiveness of our approach. Our system surpasses the current SOTA on textual EAE by +7% F1,…
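
The idea of filling a natural-language event template with candidates from either modality can be illustrated with a minimal sketch. This is not the authors' implementation: the template string, the role names, the candidate labels, and the hand-written vectors below are all hypothetical stand-ins, and the cosine-similarity matching with a fixed threshold is an assumed scoring rule chosen for simplicity. A real system would obtain the role and candidate representations from learned text and image encoders.

```python
import re
from math import sqrt

# Hypothetical event template; slot names are illustrative, not taken from the paper.
TEMPLATE = "<attacker> attacked <target> using <instrument> at <place>"


def extract_roles(template):
    """Return the role names that appear as <slot> placeholders, in order."""
    return re.findall(r"<(\w+)>", template)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fill_template(role_queries, candidates, threshold=0.6):
    """Assign each candidate to its best-matching role, or leave it unassigned.

    role_queries: dict mapping role name -> query vector (assumed to come from
                  encoding the textual template).
    candidates:   dict mapping candidate label -> vector; a candidate may be a
                  text span or an image region, since both are scored against
                  the same textual role queries.
    """
    filled = {role: [] for role in role_queries}
    for label, vec in candidates.items():
        best_role, best_score = max(
            ((role, cosine(vec, query)) for role, query in role_queries.items()),
            key=lambda pair: pair[1],
        )
        if best_score >= threshold:
            filled[best_role].append(label)
    return filled


if __name__ == "__main__":
    roles = extract_roles(TEMPLATE)
    # Toy one-hot role queries standing in for encoder outputs.
    role_queries = {
        role: [1.0 if i == j else 0.0 for j in range(len(roles))]
        for i, role in enumerate(roles)
    }
    # Toy candidates: one text span, one image region, one non-argument span.
    candidates = {
        "text:soldiers": [0.9, 0.1, 0.0, 0.0],
        "image:box_1": [0.0, 0.1, 0.95, 0.0],
        "text:yesterday": [0.5, 0.5, 0.5, 0.5],
    }
    print(fill_template(role_queries, candidates))
```

Because text spans and image regions are compared against the same role queries, the template acts as the shared interface between modalities in this sketch; swapping in a different template changes the role set without changing the matching code.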

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Focus