Multimodal Cross-Document Event Coreference Resolution Using Linear Semantic Transfer and Mixed-Modality Ensembles
Abhijnan Nath, Huma Jamil, Shafiuddin Rehan Ahmed, George Baker, Rahul Ghosh, James H. Martin, Nathaniel Blanchard, Nikhil Krishnaswamy

TL;DR
This paper introduces a multimodal approach to cross-document event coreference resolution that combines visual and textual data, augmenting datasets with images and proposing new models to improve accuracy.
Contribution
It presents a novel linear mapping method for integrating visual and textual cues without finetuning, along with ensemble techniques, advancing multimodal ECR methods.
Findings
Ensemble systems achieve 91.9 CoNLL F1 on the augmented ECB+ dataset.
Augmenting datasets with images improves coreference resolution performance.
The multimodal approach highlights the importance of visual information in challenging coreference cases.
Abstract
Event coreference resolution (ECR) is the task of determining whether distinct mentions of events within a multi-document corpus are actually linked to the same underlying occurrence. Images of the events can help facilitate resolution when language is ambiguous. Here, we propose a multimodal cross-document event coreference resolution method that integrates visual and textual cues with a simple linear map between vision and language models. As existing ECR benchmark datasets rarely provide images for all event mentions, we augment the popular ECB+ dataset with event-centric images scraped from the internet and generated using image diffusion models. We establish three methods that incorporate images and text for coreference: 1) a standard fused model with finetuning, 2) a novel linear mapping method without finetuning and 3) an ensembling approach based on splitting mention pairs by…
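To make the "simple linear map between vision and language models" concrete, here is a minimal sketch of fitting such a map by ordinary least squares with both encoders frozen. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the map is fitted, the embeddings here are synthetic random stand-ins, and the names `fit_linear_map`, `cosine`, `img_emb`, and `txt_emb` are hypothetical.

```python
import numpy as np


def fit_linear_map(src, tgt):
    """Return W minimizing ||src @ W - tgt||^2 (ordinary least squares).

    Assumption: least squares is used here for illustration only; the
    source does not specify the fitting procedure.
    """
    W, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return W


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Synthetic stand-ins for frozen encoder outputs (hypothetical sizes).
rng = np.random.default_rng(0)
n_pairs, d_img, d_txt = 200, 16, 8
img_emb = rng.normal(size=(n_pairs, d_img))   # "vision model" embeddings
true_W = rng.normal(size=(d_img, d_txt))      # hidden ground-truth map
txt_emb = img_emb @ true_W                    # noise-free "language model" targets

# Fit the map on paired embeddings; neither encoder is updated.
W = fit_linear_map(img_emb, txt_emb)

# Project image embeddings into the text space and compare there.
mapped = img_emb @ W
similarity = cosine(mapped[0], txt_emb[0])
```

Once image embeddings live in the text space, a pairwise coreference scorer could compare mention pairs across modalities with a similarity such as `cosine`. How the paper actually scores pairs is not stated in this summary.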
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
Methods: Diffusion
