I2SRM: Intra- and Inter-Sample Relationship Modeling for Multimodal Information Extraction
Yusheng Huang, Zhouhan Lin

TL;DR
The paper introduces I2SRM, a novel method for multimodal information extraction that models intra- and inter-sample relationships, improving performance on multiple datasets through effective representation learning and data augmentation.
Contribution
I2SRM is the first to jointly model intra- and inter-sample relationships with an AttnMixup strategy for multimodal extraction tasks.
Findings
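Achieves 77.12% F1 on Twitter-2015 (multimodal named entity recognition)
Achieves 88.40% F1 on Twitter-2017 (multimodal named entity recognition)
Achieves 84.12% F1 on MNRE (multimodal relation extraction)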
Abstract
Multimodal information extraction, which requires aggregating representations from different modalities, is attracting growing research attention. In this paper, we present the Intra- and Inter-Sample Relationship Modeling (I2SRM) method for this task, which contains two modules. First, the intra-sample relationship modeling module operates on a single sample and aims to learn effective representations. Embeddings from the textual and visual modalities are shifted to bridge the modality gap caused by distinct pre-trained language and image models. Second, the inter-sample relationship modeling module considers relationships among multiple samples and focuses on capturing their interactions. An AttnMixup strategy is proposed, which not only enables collaboration among samples but also augments data to improve generalization. We conduct extensive experiments on the multimodal named entity…
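The abstract does not spell out how AttnMixup is computed, so the following is only a minimal sketch of one plausible reading: samples in a batch first attend to one another (the "collaboration" part), and the attended features are then combined with standard mixup (the "augmentation" part). Every name here (`attend_across_samples`, `mixup`, `attn_mixup_sketch`, `alpha`, `seed`) is hypothetical, and the scaled dot-product attention and Beta-sampled mixing weight are assumptions borrowed from common practice, not details confirmed by the paper.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_across_samples(features):
    """Assumed inter-sample step: each sample attends to every sample in the batch."""
    dim = len(features[0])
    scale = math.sqrt(dim)
    attended = []
    for query in features:
        # Scaled dot-product score between this sample and every sample.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in features]
        weights = softmax(scores)
        # Weighted average of all sample features.
        attended.append([
            sum(w * key[d] for w, key in zip(weights, features))
            for d in range(dim)
        ])
    return attended


def mixup(features, labels, lam, perm):
    """Standard mixup: convex combination of each sample with a permuted partner."""
    mixed_x = [
        [lam * a + (1.0 - lam) * b for a, b in zip(features[i], features[j])]
        for i, j in enumerate(perm)
    ]
    mixed_y = [
        [lam * a + (1.0 - lam) * b for a, b in zip(labels[i], labels[j])]
        for i, j in enumerate(perm)
    ]
    return mixed_x, mixed_y


def attn_mixup_sketch(features, labels, alpha=0.4, seed=0):
    """Hypothetical pipeline: attention across samples, then mixup on the result."""
    rng = random.Random(seed)
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    perm = list(range(len(features)))
    rng.shuffle(perm)                    # random mixing partner per sample
    attended = attend_across_samples(features)
    mixed_x, mixed_y = mixup(attended, labels, lam, perm)
    return mixed_x, mixed_y, lam


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy fused embeddings
    labs = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]   # one-hot labels
    x, y, lam = attn_mixup_sketch(feats, labs)
    print(lam, x, y)
```

In this sketch the mixed labels stay valid probability distributions because both attention and mixup are convex combinations; a real implementation would operate on batched tensors and learned query/key projections rather than raw features.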
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
