Cross-Modal Attention Network with Dual Graph Learning in Multimodal Recommendation
Ji Dai, Quan Fang, Jun Hu, Desheng Cai, Yang Yang, Can Zhao

TL;DR
This paper introduces CRANE, a novel multimodal recommendation model that effectively captures complex intra- and inter-modal relationships through recursive attention and dual graph learning, leading to improved accuracy and efficiency.
Contribution
CRANE combines recursive cross-modal attention with dual graph embedding to deepen modality fusion and to learn user and item features symmetrically in multimodal recommendation systems.
Findings
Achieves an average 5% improvement over baselines on key metrics.
Faster convergence on small datasets and superior performance on large datasets.
Maintains high computational efficiency despite complex modeling.
Abstract
Multimedia recommendation systems leverage user-item interactions and multimodal information to capture user preferences, enabling more accurate and personalized recommendations. Despite notable advancements, existing approaches still face two critical limitations: first, shallow modality fusion often relies on simple concatenation, failing to exploit rich synergic intra- and inter-modal relationships; second, asymmetric feature treatment, where users are only characterized by interaction IDs while items benefit from rich multimodal content, hinders the learning of a shared semantic space. To address these issues, we propose a Cross-modal Recursive Attention Network with dual graph Embedding (CRANE). To tackle shallow fusion, we design a core Recursive Cross-Modal Attention (RCA) mechanism that iteratively refines modality features based on cross-correlations in a joint latent space,…
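The abstract describes RCA only at a high level, so the following is a minimal illustrative sketch of the general idea of iteratively refining modality features from their cross-correlations, not the authors' implementation. Everything in it is an assumption made for illustration: the function name `recursive_cross_modal_attention`, the `num_steps` parameter, the absence of learned projection weights, the residual update, and the L2 normalization after each step. It assumes each item's modality features have already been projected into a shared latent space of equal dimension.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def recursive_cross_modal_attention(feats, num_steps=2):
    """Hypothetical sketch of recursive attention across modalities.

    feats: array of shape (num_items, num_modalities, dim), assumed to be
           already projected into a shared latent space.
    Returns an array of the same shape with refined features.
    """
    x = np.asarray(feats, dtype=np.float64)
    dim = x.shape[-1]
    for _ in range(num_steps):
        # Per-item correlation between every pair of modalities:
        # diagonal entries are intra-modal, off-diagonal are inter-modal.
        scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(dim)
        weights = softmax(scores, axis=-1)
        # Residual update: each modality absorbs an attention-weighted
        # mixture of all modalities (assumption, not from the paper).
        x = x + weights @ x
        # L2-normalise so repeated steps do not blow up the scale.
        x = x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-12)
    return x


# Toy usage: 4 items, 2 modalities (e.g. visual and textual), 8-dim features.
rng = np.random.default_rng(0)
item_feats = rng.normal(size=(4, 2, 8))
refined = recursive_cross_modal_attention(item_feats, num_steps=3)
print(refined.shape)
```

Each pass recomputes the attention weights from the previously refined features, which is what makes the refinement recursive rather than a single fusion step. A trained model would normally add learned query, key, and value projections, which this sketch omits.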
Taxonomy
Topics: Recommender Systems and Techniques · Multimodal Machine Learning Applications · Advanced Graph Neural Networks
