EGRA: Toward Enhanced Behavior Graphs and Representation Alignment for Multimodal Recommendation

Xiaoxiong Zhang; Xin Zhou; Zhiwei Zeng; Yongjie Wang; Dusit Niyato; Zhiqi Shen

arXiv:2508.16170 · cs.IR · August 25, 2025

TL;DR

EGRA improves multimodal recommendation in two ways: it builds a more robust behavior graph from pretrained model representations rather than raw modality features, and it replaces a fixed, uniform alignment weight with one that adapts during training. The authors report that it outperforms recent methods on five datasets.

Contribution

EGRA introduces a novel graph construction method using pretrained model representations and a bi-level dynamic alignment weighting mechanism for better modality-behavior alignment.
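To make the contrast with a uniform, fixed alignment weight concrete, here is a minimal sketch of an alignment loss whose strength varies at two levels: per entity and over training. It is an illustration under stated assumptions, not the paper's formulation. The linear schedule, its direction, the squared-distance alignment term, and the names `alignment_strength`, `weighted_alignment_loss`, and `entity_w` are all hypothetical; the per-entity weights are taken as an input because this summary does not say how EGRA derives them.

```python
import numpy as np


def alignment_strength(epoch: int, total_epochs: int,
                       start: float = 1.0, end: float = 0.1) -> float:
    """Hypothetical training-level schedule.

    Linearly interpolates the global alignment strength from `start` to `end`.
    The shape and direction of the schedule are assumptions made for
    illustration only.
    """
    progress = epoch / max(total_epochs - 1, 1)
    progress = min(max(progress, 0.0), 1.0)
    return start + (end - start) * progress


def weighted_alignment_loss(behavior_emb: np.ndarray,
                            modality_emb: np.ndarray,
                            entity_w: np.ndarray,
                            epoch: int,
                            total_epochs: int) -> float:
    """Alignment loss with an entity-level and a training-level weight.

    behavior_emb, modality_emb: arrays of shape (n_entities, dim).
    entity_w: array of shape (n_entities,), one weight per entity
              (assumed to be supplied by some other component).
    """
    # Squared Euclidean distance between the two views of each entity.
    per_entity = np.sum((behavior_emb - modality_emb) ** 2, axis=1)
    # Entity-level weighting, then the epoch-dependent global strength.
    return alignment_strength(epoch, total_epochs) * float(np.mean(entity_w * per_entity))
```

Setting `entity_w` to all ones and `start == end` recovers the uniform, fixed-strength baseline that the paper identifies as a limitation of existing methods.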

Findings

01. EGRA outperforms recent methods on five datasets.

02. The dynamic alignment mechanism improves modality-behavior alignment.

03. Graph robustness is enhanced by using pretrained model representations.

Abstract

MultiModal Recommendation (MMR) systems have emerged as a promising solution for improving recommendation quality by leveraging rich item-side modality information, prompting a surge of diverse methods. Despite these advances, existing methods still face two critical limitations. First, they use raw modality features to construct item-item links for enriching the behavior graph, while giving limited attention to balancing collaborative and modality-aware semantics or mitigating modality noise in the process. Second, they use a uniform alignment weight across all entities and also maintain a fixed alignment strength throughout training, limiting the effectiveness of modality-behavior alignment. To address these challenges, we propose EGRA. First, instead of relying on raw modality features, it alleviates sparsity by incorporating into the behavior graph an item-item graph built from…
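The graph-enrichment idea in the abstract can be sketched as follows: derive item-item links from item representations and add them to the user-item behavior graph. This is a rough sketch under assumptions, not the authors' construction. Cosine top-k neighbor selection is a common choice in multimodal recommendation but is assumed here, and `item_repr` is a placeholder for whatever pretrained model representations EGRA actually uses; the function names are hypothetical.

```python
import numpy as np


def knn_item_graph(item_repr: np.ndarray, k: int) -> np.ndarray:
    """Symmetric 0/1 item-item adjacency from cosine top-k neighbors.

    item_repr: (n_items, dim) array of item representations (placeholder
    for pretrained model outputs). Requires 1 <= k < n_items.
    """
    # L2-normalize rows so that the dot product equals cosine similarity.
    norms = np.linalg.norm(item_repr, axis=1, keepdims=True)
    unit = item_repr / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-links

    # Column indices of the k most similar items for each row.
    topk = np.argpartition(-sim, k - 1, axis=1)[:, :k]

    n = item_repr.shape[0]
    adj = np.zeros((n, n), dtype=np.float32)
    rows = np.repeat(np.arange(n), k)
    adj[rows, topk.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # keep a link if either side selected it


def merge_into_behavior_graph(interactions: np.ndarray,
                              item_adj: np.ndarray) -> np.ndarray:
    """Build a (users + items) square adjacency.

    The user-item blocks hold observed interactions; the item-item block
    holds the extra links from `knn_item_graph`.
    """
    n_users = interactions.shape[0]
    top = np.hstack([np.zeros((n_users, n_users), dtype=np.float32), interactions])
    bottom = np.hstack([interactions.T, item_adj])
    return np.vstack([top, bottom])
```

A dense similarity matrix is only practical for small catalogs; a real system would likely use sparse matrices or approximate nearest-neighbor search.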
