Multimodal Graph-Based Variational Mixture of Experts Network for Zero-Shot Multimodal Information Extraction

Baohang Zhou; Ying Zhang; Yu Zhao; Xuhui Sui; Xiaojie Yuan

arXiv:2502.15290·cs.MM·February 24, 2025

TL;DR

This paper introduces MG-VMoE, a novel multimodal graph-based variational mixture of experts network, for zero-shot multimodal information extraction that effectively captures fine-grained semantic correlations between text and images.

Contribution

It proposes a new MG-VMoE model that aligns multimodal representations using a graph-based variational mixture of experts and incorporates virtual adversarial training for improved zero-shot extraction.
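This page does not describe the architecture in detail, so the following is only a rough illustration of the general idea behind a variational mixture of experts, not the authors' implementation. It assumes a softmax gate over a few experts, each of which maps the input to the mean and log-variance of a Gaussian and is sampled with the reparameterization trick. All names (`variational_moe`, `rand_matrix`, the toy dimensions) are hypothetical, and the graph construction and virtual adversarial training are omitted.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Matrix-vector product; `weights` is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def rand_matrix(rows, cols, rng):
    """Hypothetical helper: random weights standing in for trained ones."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def variational_moe(x, gate_w, experts, rng):
    """Gate-weighted sum of latent samples drawn from each expert.

    x       : input feature vector (e.g. a fused text-image feature)
    gate_w  : gating weights, one row per expert
    experts : list of (mu_weights, logvar_weights) pairs
    """
    gates = softmax(linear(gate_w, x))
    latent_dim = len(experts[0][0])
    fused = [0.0] * latent_dim
    for gate, (mu_w, logvar_w) in zip(gates, experts):
        mu = linear(mu_w, x)
        logvar = linear(logvar_w, x)
        # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1)
        z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
             for m, lv in zip(mu, logvar)]
        fused = [f + gate * zi for f, zi in zip(fused, z)]
    return gates, fused


# Toy usage: 3 experts, 3-dimensional input, 2-dimensional latent space.
rng = random.Random(0)
x = [0.5, -1.0, 2.0]
gate_w = rand_matrix(3, 3, rng)
experts = [(rand_matrix(2, 3, rng), rand_matrix(2, 3, rng)) for _ in range(3)]
gates, fused = variational_moe(x, gate_w, experts, rng)
```

The gate weights sum to one, so `fused` is a convex combination of the experts' sampled latents. A trained model would learn the gate and expert weights and would typically add a KL-divergence term to the loss.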

Findings

01 Outperforms baseline models on benchmark datasets

02 Effectively captures fine-grained semantic correlations

03 Demonstrates superior zero-shot multimodal extraction performance

Abstract

Multimodal information extraction on social media is a series of fundamental tasks for constructing multimodal knowledge graphs. The tasks aim to extract structured information from free text with the accompanying images, and include multimodal named entity typing and multimodal relation extraction. However, the growing amount of multimodal data implies a growing category set, and newly emerged entity types or relations should be recognized without additional training. To address these challenges, we focus on zero-shot multimodal information extraction tasks, which require using the textual and visual modalities to recognize unseen categories. Compared with text-based zero-shot information extraction models, existing multimodal models align the textual and visual modalities directly and exploit various fusion strategies to improve performance. But the…

Code & Models

Repositories

ZovanZhou/MG-VMoE (tf, Official)
