Multimodal Cross-Document Event Coreference Resolution Using Linear Semantic Transfer and Mixed-Modality Ensembles

Abhijnan Nath; Huma Jamil; Shafiuddin Rehan Ahmed; George Baker; Rahul Ghosh; James H. Martin; Nathaniel Blanchard; Nikhil Krishnaswamy

arXiv:2404.08949·cs.CL·April 16, 2024·1 cites

TL;DR

This paper introduces a multimodal approach to cross-document event coreference resolution that combines visual and textual data, augmenting datasets with images and proposing new models to improve accuracy.

Contribution

The paper presents a novel linear mapping method that integrates visual and textual cues without finetuning, together with ensemble techniques for multimodal ECR.

Findings

1. Ensemble systems achieve 91.9 CoNLL F1 on the augmented ECB+ dataset.

2. Augmenting the datasets with images improves coreference resolution performance.

3. The multimodal approach highlights the importance of visual information in challenging coreference cases.
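
For readers unfamiliar with the metric in the first finding: CoNLL F1 is conventionally the unweighted mean of the MUC, B-cubed, and CEAF-e F1 scores. The sketch below only illustrates that average. The function name `conll_f1` is hypothetical, and the input values in the usage line are placeholders, not results from the paper.

```python
def conll_f1(muc_f1: float, b_cubed_f1: float, ceaf_e_f1: float) -> float:
    """Return the CoNLL F1: the unweighted mean of three coreference F1 scores."""
    return (muc_f1 + b_cubed_f1 + ceaf_e_f1) / 3.0


# Placeholder inputs chosen for illustration only; they are not reported numbers.
example_score = conll_f1(90.0, 80.0, 70.0)
```

The component scores themselves come from a coreference scorer run over predicted and gold event clusters; this snippet assumes they are already available.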

Abstract

Event coreference resolution (ECR) is the task of determining whether distinct mentions of events within a multi-document corpus are actually linked to the same underlying occurrence. Images of the events can help facilitate resolution when language is ambiguous. Here, we propose a multimodal cross-document event coreference resolution method that integrates visual and textual cues with a simple linear map between vision and language models. As existing ECR benchmark datasets rarely provide images for all event mentions, we augment the popular ECB+ dataset with event-centric images scraped from the internet and generated using image diffusion models. We establish three methods that incorporate images and text for coreference: 1) a standard fused model with finetuning, 2) a novel linear mapping method without finetuning and 3) an ensembling approach based on splitting mention pairs by…
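
The abstract does not say how the linear map is fit, so the following is only a minimal sketch of one common way to learn a linear map between two frozen embedding spaces: ridge-regularized least squares. Everything here is an assumption made for illustration: the direction of the map (image space to text space), the function names `fit_linear_map` and `apply_linear_map`, the `ridge` parameter, the dimensions, and the synthetic random data standing in for real encoder outputs. It is not the authors' implementation.

```python
import numpy as np


def fit_linear_map(src: np.ndarray, tgt: np.ndarray, ridge: float = 1e-3):
    """Fit W, b so that src @ W + b approximates tgt (ridge least squares).

    src: (n, d_src) embeddings from one frozen encoder (assumed: vision).
    tgt: (n, d_tgt) paired embeddings from the other encoder (assumed: text).
    """
    src_mean = src.mean(axis=0)
    tgt_mean = tgt.mean(axis=0)
    x = src - src_mean  # center both sides so the bias can be recovered below
    y = tgt - tgt_mean
    d_src = x.shape[1]
    # Closed-form ridge solution: (X^T X + ridge * I) W = X^T Y
    w = np.linalg.solve(x.T @ x + ridge * np.eye(d_src), x.T @ y)
    b = tgt_mean - src_mean @ w
    return w, b


def apply_linear_map(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Project embeddings into the target space; neither encoder is updated."""
    return x @ w + b


# Synthetic stand-ins for paired image/text embeddings (hypothetical sizes).
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(200, 16))
true_w = rng.normal(size=(16, 8))
text_emb = image_emb @ true_w + 0.01 * rng.normal(size=(200, 8))

w, b = fit_linear_map(image_emb, text_emb)
mapped = apply_linear_map(image_emb, w, b)
mse = float(np.mean((mapped - text_emb) ** 2))
```

The point of the sketch is that only `w` and `b` are learned, which is what makes a linear-map approach cheap compared with finetuning a fused model. How the mapped embeddings are then scored for coreference is not covered here.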

Code & Models

Repositories

csu-signal/multimodal-coreference
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques

Methods: Diffusion