Unimodal and Multimodal Representation Training for Relation Extraction

Ciaran Cooney; Rachel Heyburn; Liam Madigan; Mairead O'Cuinn; Chloe Thompson; Joana Cavadas

arXiv:2211.06168·cs.CL·November 23, 2022

Open Access

TL;DR

This paper evaluates the predictive power of text, layout, and visual modalities in relation extraction tasks, highlighting the importance of shared representations and the potential of unimodal approaches.

Contribution

It systematically assesses the contribution of each modality in multimodal relation extraction, revealing the significance of text and layout, and exploring unimodal and multimodal training strategies.

Findings

01. Text is the most important predictor of entity relations.

02. Layout geometry is highly predictive and can be used unimodally.

03. Visual information can enhance performance in certain circumstances.

Abstract

Multimodal integration of text, layout and visual information has achieved state-of-the-art (SOTA) results in visually rich document understanding (VrDU) tasks, including relation extraction (RE). However, despite its importance, evaluation of the relative predictive capacity of these modalities is less prevalent. Here, we demonstrate the value of shared representations for RE tasks by conducting experiments in which each data type is iteratively excluded during training. In addition, text and layout data are evaluated in isolation. While a bimodal text and layout approach performs best (F1=0.684), we show that text is the most important single predictor of entity relations. Additionally, layout geometry is highly predictive and may even be a feasible unimodal approach. Although visual information is less effective, we highlight circumstances where it can bolster performance. In total, our results…
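
The experimental design described in the abstract (dropping one modality at a time, plus text-only and layout-only runs) can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the authors' code: the names `MODALITIES`, `ablation_configs`, and `relation_f1` are hypothetical, the inclusion of a full three-modality baseline is assumed, and F1 is computed over sets of (head, tail) entity pairs, which may differ from the paper's exact scoring.

```python
# Hypothetical sketch of a modality ablation grid and a pair-level F1 score.
# None of these names come from the paper; they are illustrative only.

MODALITIES = ("text", "layout", "visual")


def ablation_configs(modalities=MODALITIES):
    """List the modality subsets to train on.

    Assumed grid: the full set, every leave-one-out subset, and the
    two unimodal runs the abstract mentions (text only, layout only).
    """
    full = [tuple(modalities)]
    leave_one_out = [
        tuple(m for m in modalities if m != dropped) for dropped in modalities
    ]
    unimodal = [("text",), ("layout",)]
    return full + leave_one_out + unimodal


def relation_f1(predicted, gold):
    """F1 over relation pairs, treating each (head, tail) pair as one item."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0  # also covers empty predictions or empty gold
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    for config in ablation_configs():
        print("train with:", " + ".join(config))
```

Under these assumptions, the bimodal text and layout configuration that the abstract reports as best corresponds to the leave-one-out subset with `visual` dropped.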

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Biomedical Text Mining and Ontologies