MMGA: Multimodal Learning with Graph Alignment

Xuan Yang; Quanjin Tao; Xiao Feng; Donghong Cai; Xiang Ren; Yang Yang

arXiv:2210.09946 · cs.MM · November 1, 2022

TL;DR

This paper introduces MMGA, a novel framework for multimodal pre-training that effectively integrates graph, image, and text data from social media to improve user representation learning.

Contribution

MMGA proposes a multi-step graph alignment mechanism that enables mutual enhancement of graph, image, and text modalities during pre-training.

Findings

01. Improves user prediction performance on social media data.
02. Introduces the first social media multimodal dataset with graph data.
03. Demonstrates effective integration of graph, image, and text modalities.

Abstract

Multimodal pre-training breaks down modality barriers and lets individual modalities augment one another with information, yielding significant advances in representation learning. However, the graph modality, a very general and important form of data, cannot easily interact with other modalities because of its irregular structure. In this paper, we propose MMGA (Multimodal learning with Graph Alignment), a novel multimodal pre-training framework that incorporates information from the graph (social network), image, and text modalities on social media to enhance user representation learning. In MMGA, a multi-step graph alignment mechanism adds self-supervision from the graph modality to optimize the image and text encoders, while using information from the image and text modalities to guide the learning of the graph encoder. We conduct experiments on the dataset…
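
The abstract does not spell out the alignment objective, so the following is only a minimal sketch, assuming a CLIP-style symmetric contrastive (InfoNCE) loss that pulls each user's graph embedding toward the same user's image or text embedding. The function names, the temperature value, and the toy vectors are hypothetical and are not taken from the paper; the actual multi-step mechanism in MMGA may differ.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def similarity_matrix(rows_a, rows_b, temperature):
    """Cosine similarity between every row of rows_a and rows_b, divided by temperature."""
    rows_a = [l2_normalize(row) for row in rows_a]
    rows_b = [l2_normalize(row) for row in rows_b]
    return [
        [sum(x * y for x, y in zip(u, v)) / temperature for v in rows_b]
        for u in rows_a
    ]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def alignment_loss(graph_emb, other_emb, temperature=0.1):
    """Symmetric InfoNCE loss between graph embeddings and another modality.

    Assumption: row i of both inputs describes the same user, so the
    matching pairs sit on the diagonal of the similarity matrix.
    """
    logits = similarity_matrix(graph_emb, other_emb, temperature)
    transposed = [list(column) for column in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))


# Toy data (hypothetical): three users, 3-dimensional embeddings.
graph_emb = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
text_aligned = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
text_shuffled = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]

loss_aligned = alignment_loss(graph_emb, text_aligned)
loss_shuffled = alignment_loss(graph_emb, text_shuffled)
print(loss_aligned, loss_shuffled)
```

In this sketch the loss is small when each user's graph and text embeddings agree and large when they are mismatched. In a training setup the gradient of such a loss would flow into both encoders, which is one way the "mutual enhancement" described above could be realized.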

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Text and Document Classification Technologies