Few-Shot Relation Extraction with Hybrid Visual Evidence
Jiaying Gong, Hoda Eldardiry

TL;DR
This paper introduces a multi-modal approach to few-shot relation extraction that combines textual and visual information, significantly enhancing performance over text-only methods.
Contribution
The paper presents MFS-HVE, a multi-modal few-shot relation extraction model that fuses textual features with global image features and local object features through attention-based fusion.
Findings
Visual features improve relation prediction accuracy
Multi-modal fusion outperforms uni-modal baselines
Extensive experiments validate the effectiveness of visual information
Abstract
The goal of few-shot relation extraction is to predict relations between named entities in a sentence when only a few labeled instances are available for training. Existing few-shot relation extraction methods focus on uni-modal information such as text only. This reduces performance when the text provides no clear context relating the named entities. We propose a multi-modal few-shot relation extraction model (MFS-HVE) that leverages both textual and visual semantic information to learn a multi-modal representation jointly. MFS-HVE includes semantic feature extractors and multi-modal fusion components. The semantic feature extractors extract both textual and visual features. The visual features include global image features and local object features within the image. The MFS-HVE multi-modal fusion unit integrates information from various modalities…
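The abstract describes two stages: feature extraction (text, global image, local objects) followed by a fusion unit. A minimal sketch of what such a fusion step might look like is given below. It is not the authors' implementation: the abstract is truncated before it specifies the fusion mechanism, so the text-guided scaled dot-product attention used here is an assumption, and the names `attend` and `fuse` as well as the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys):
    """Scaled dot-product attention of one query over several key vectors.

    Returns the attention weights and the weighted sum of the keys.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    pooled = [sum(w * k[i] for w, k in zip(weights, keys))
              for i in range(len(query))]
    return weights, pooled


def fuse(text_vec, image_vec, object_vecs):
    """Assumed fusion: the text vector attends over local object features,
    then text, global image, and pooled object features are concatenated."""
    weights, object_summary = attend(text_vec, object_vecs)
    fused = text_vec + image_vec + object_summary  # list concatenation
    return weights, fused


# Toy 2-dimensional features (illustrative values only).
text = [1.0, 0.0]
image = [0.5, 0.5]
objects = [[1.0, 0.0], [0.0, 1.0]]

weights, fused = fuse(text, image, objects)
```

In this toy run the first object is aligned with the text vector, so it receives the larger attention weight, and the fused representation has three times the dimensionality of each input feature.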
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced X-ray and CT Imaging · Medical Imaging Techniques and Applications
