Unlocking Multi-Modal Potentials for Link Prediction on Dynamic Text-Attributed Graphs
Yuanyuan Xu, Wenjie Zhang, Ying Zhang, Xuemin Lin, Xiwei Xu

TL;DR
This paper introduces MoMent, a multi-modal model that explicitly integrates temporal, textual, and structural information in dynamic text-attributed graphs, significantly improving link prediction accuracy.
Contribution
MoMent is the first model to explicitly model and align all three modalities (temporal, textual, and structural) in dynamic text-attributed graphs (DyTAGs), addressing their disjoint distributions to learn better node representations.
Findings
Improves link prediction accuracy by up to 17.28% over baselines
Provides up to 31x speed-up over baselines
Effectively aligns the heterogeneous temporal, textual, and structural modalities
Abstract
Dynamic Text-Attributed Graphs (DyTAGs) are a novel graph paradigm that captures evolving temporal events (edges) alongside rich textual attributes. Existing studies can be broadly categorized into TGNN-driven and LLM-driven approaches, both of which encode textual attributes and temporal structures for DyTAG representation. We observe that DyTAGs inherently comprise three distinct modalities: temporal, textual, and structural, often exhibiting completely disjoint distributions. However, the first two modalities are largely overlooked by existing studies, leading to suboptimal performance. To address this, we propose MoMent, a multi-modal model that explicitly models, integrates, and aligns each modality to learn node representations for link prediction. Given the disjoint nature of the original modality distributions, we first construct modality-specific features and encode them using…
Taxonomy
Topics: Advanced Graph Neural Networks · Topic Modeling · Complex Network Analysis Techniques
Methods: Softmax · Attention Is All You Need
