MMGA: Multimodal Learning with Graph Alignment
Xuan Yang, Quanjin Tao, Xiao Feng, Donghong Cai, Xiang Ren, Yang Yang

TL;DR
This paper introduces MMGA, a novel framework for multimodal pre-training that effectively integrates graph, image, and text data from social media to improve user representation learning.
Contribution
MMGA proposes a multi-step graph alignment mechanism that enables mutual enhancement of graph, image, and text modalities during pre-training.
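The summary does not say which objective the alignment mechanism uses. As a rough illustration of what aligning a graph encoder with image and text encoders can look like, the sketch below assumes a symmetric InfoNCE-style contrastive loss over per-user embeddings. The function names (`info_nce`, `alignment_loss`), the temperature value, and the choice of loss are all assumptions made for illustration. This is not the paper's multi-step procedure.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(anchors: List[Vector], targets: List[Vector],
             temperature: float = 0.1) -> float:
    """Mean cross-entropy of matching anchors[i] to targets[i].

    Row i of both lists is assumed to describe the same user; every other
    row in `targets` acts as a negative for anchor i.
    """
    a = [l2_normalize(v) for v in anchors]
    t = [l2_normalize(v) for v in targets]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every target, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, tj)) / temperature for tj in t]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]
    return total / len(a)


def alignment_loss(graph_emb: List[Vector], text_emb: List[Vector],
                   image_emb: List[Vector], temperature: float = 0.1) -> float:
    """Hypothetical objective: pull each user's graph embedding toward that
    user's text and image embeddings, in both directions."""
    loss = 0.0
    for other in (text_emb, image_emb):
        loss += 0.5 * (info_nce(graph_emb, other, temperature)
                       + info_nce(other, graph_emb, temperature))
    return loss


# Toy embeddings for two users (made-up numbers, not data from the paper).
graph = [[1.0, 0.0], [0.0, 1.0]]
text = [[2.0, 0.0], [0.0, 3.0]]
image = [[1.0, 0.1], [0.1, 1.0]]

aligned = alignment_loss(graph, text, image)
shuffled = alignment_loss(graph, text[::-1], image[::-1])
```

In this toy setup `aligned` comes out smaller than `shuffled`, because each user's graph embedding points the same way as that user's text and image embeddings. In a real pipeline the gradient of such a loss would update all three encoders, which is one way the modalities could supervise each other.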
Findings
Improves user prediction performance on social media data.
Introduces the first social media multimodal dataset with graph data.
Demonstrates effective integration of graph, image, and text modalities.
Abstract
Multimodal pre-training breaks down modality barriers and allows individual modalities to augment one another with information, resulting in significant advances in representation learning. However, the graph modality, a very general and important form of data, cannot easily interact with other modalities because of its irregular structure. In this paper, we propose MMGA (Multimodal learning with Graph Alignment), a novel multimodal pre-training framework that incorporates information from the graph (social network), image, and text modalities on social media to enhance user representation learning. In MMGA, a multi-step graph alignment mechanism adds self-supervision from the graph modality to optimize the image and text encoders, while using information from the image and text modalities to guide the learning of the graph encoder. We conduct experiments on the dataset…
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Text and Document Classification Technologies
