Mosaic of Modalities: A Comprehensive Benchmark for Multimodal Graph Learning
Jing Zhu, Yuhang Zhou, Shengyi Qian, Zhongmou He, Tong Zhao, Neil Shah, Danai Koutra

TL;DR
This paper introduces MM-GRAPH, a comprehensive benchmark for evaluating multimodal graph learning algorithms that integrate visual and textual data across diverse real-world datasets.
Contribution
The paper presents MM-GRAPH, the first extensive benchmark for multimodal graph learning, including diverse datasets and an empirical study on the impact of visual data integration.
Findings
Visual data enhances graph learning performance.
Multimodal approaches outperform unimodal baselines.
Challenges include data heterogeneity and modality fusion.
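As a rough illustration of what modality fusion on a graph can look like, the sketch below concatenates per-node text and image embeddings (early fusion) and then averages each node's fused vector with those of its neighbors. It is not the MM-GRAPH method or any model evaluated in the paper; the function names, the toy embeddings, and the edge list are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def fuse_features(text_emb: Dict[int, Vector], image_emb: Dict[int, Vector]) -> Dict[int, Vector]:
    """Early fusion: concatenate each node's text and image embeddings."""
    return {node: text_emb[node] + image_emb[node] for node in text_emb}


def mean_aggregate(features: Dict[int, Vector], edges: List[Tuple[int, int]]) -> Dict[int, Vector]:
    """One round of mean aggregation over an undirected edge list.

    Each node averages its own vector with the vectors of its neighbors.
    """
    neighbors: Dict[int, List[int]] = {node: [node] for node in features}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    aggregated: Dict[int, Vector] = {}
    for node, nbrs in neighbors.items():
        dim = len(features[node])
        aggregated[node] = [
            sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(dim)
        ]
    return aggregated


# Toy data (hypothetical): three nodes, 2-d text embeddings, 1-d image embeddings.
text_emb = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
image_emb = {0: [0.5], 1: [0.25], 2: [0.75]}
edges = [(0, 1), (1, 2)]

fused = fuse_features(text_emb, image_emb)
smoothed = mean_aggregate(fused, edges)
```

Concatenation is only the simplest fusion choice; it assumes every node has both modalities and ignores differences in embedding scale, which is one face of the heterogeneity challenge listed above.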
Abstract
Graph machine learning has made significant strides in recent years, yet the integration of visual information with graph structure and its potential for improving performance in downstream tasks remains an underexplored area. To address this critical gap, we introduce the Multimodal Graph Benchmark (MM-GRAPH), a pioneering benchmark that incorporates both visual and textual information into graph learning tasks. MM-GRAPH extends beyond existing text-attributed graph benchmarks, offering a more comprehensive evaluation framework for multimodal graph learning. Our benchmark comprises seven diverse datasets of varying scales (ranging from thousands to millions of edges), designed to assess algorithms across different tasks in real-world scenarios. These datasets feature rich multimodal node attributes, including visual data, which enables a more holistic evaluation of various graph…
Taxonomy
Topics: Advanced Graph Neural Networks · Semantic Web and Ontologies
Methods: Sparse Evolutionary Training
