When Graph meets Multimodal: Benchmarking and Meditating on Multimodal Attributed Graphs Learning
Hao Yan, Chaozhuo Li, Jun Yin, Zhigang Yu, Weihao Han, Mingzheng Li, Zhengxin Zeng, Hao Sun, Senzhang Wang

TL;DR
This paper introduces MAGB, a comprehensive benchmark dataset for Multimodal Attributed Graphs (MAGs), and systematically evaluates two MAG representation learning paradigms, yielding insights into modality importance, embedding effectiveness, and biases.
Contribution
It provides the first standardized MAG benchmark dataset and evaluation framework, enabling systematic comparison of MAG representation learning (MAGRL) methods and insights into modality and model interactions.
Findings
Modality importance varies across domains.
Multimodal embeddings enhance GNN performance.
VLMs effectively generate embeddings, reducing modality imbalance.
Abstract
Multimodal Attributed Graphs (MAGs) are ubiquitous in real-world applications, encompassing extensive knowledge through multimodal attributes attached to nodes (e.g., texts and images) and topological structure representing node interactions. Despite its potential to advance diverse research fields like social networks and e-commerce, MAG representation learning (MAGRL) remains underexplored due to the lack of standardized datasets and evaluation frameworks. In this paper, we first propose MAGB, a comprehensive MAG benchmark dataset, featuring curated graphs from various domains with both textual and visual attributes. Based on the MAGB dataset, we further systematically evaluate two mainstream MAGRL paradigms: one that integrates multimodal attributes via Graph Neural Networks (GNNs), and one that harnesses Vision Language Models (VLMs)…
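To make the GNN-based paradigm concrete, here is a minimal sketch of how per-node text and image embeddings might be fused and then propagated over the graph. Everything in it is an assumption for illustration: fusion by concatenation, a single round of mean aggregation with self-loops, the helper names fuse_modalities and mean_aggregate, and the toy embeddings. It is not the paper's implementation or the MAGB evaluation code.

```python
from typing import Dict, List

Vector = List[float]


def fuse_modalities(text_emb: Vector, image_emb: Vector) -> Vector:
    """Concatenate a node's text and image embeddings (assumed fusion rule)."""
    return text_emb + image_emb


def mean_aggregate(
    features: Dict[int, Vector], neighbors: Dict[int, List[int]]
) -> Dict[int, Vector]:
    """One round of mean aggregation over each node and its neighbors."""
    out: Dict[int, Vector] = {}
    for node, feat in features.items():
        # Include the node's own features (a self-loop) plus its neighbors'.
        group = [feat] + [features[n] for n in neighbors.get(node, [])]
        out[node] = [
            sum(vec[i] for vec in group) / len(group) for i in range(len(feat))
        ]
    return out


# Toy graph: a path 0 - 1 - 2 with made-up 2-d embeddings per modality.
text = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
image = {0: [0.0, 2.0], 1: [2.0, 0.0], 2: [2.0, 2.0]}
neighbors = {0: [1], 1: [0, 2], 2: [1]}

fused = {n: fuse_modalities(text[n], image[n]) for n in text}
hidden = mean_aggregate(fused, neighbors)
```

In practice the embeddings would come from pretrained text and image encoders, and the aggregation step would be a trained GNN layer; the sketch only shows where the two modalities enter the node features.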
Taxonomy
Topics: Semantic Web and Ontologies · Advanced Graph Neural Networks · Natural Language Processing Techniques
