Triple Modality Fusion: Aligning Visual, Textual, and Graph Data with Large Language Models for Multi-Behavior Recommendations
Luyi Ma, Xiaohan Li, Zezhong Fan, Kai Zhao, Jianpeng Xu, Jason Cho, Praveen Kanumala, Kaushiki Nag, Sushant Kumar, Kannan Achan

TL;DR
This paper presents a novel triple-modality fusion framework that combines visual, textual, and graph data with large language models to improve multi-behavior recommendation accuracy.
Contribution
The paper introduces TMF, a new model that aligns and integrates three data modalities with LLMs for enhanced personalized recommendations.
Findings
Improved recommendation accuracy demonstrated in experiments
Effective modality fusion via cross-attention and self-attention mechanisms
Ablation studies confirm the benefits of the TMF framework
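The cross-attention and self-attention fusion named in the findings can be illustrated with a small sketch. This is not the authors' implementation: the function names, tensor shapes, residual sum, and the omission of learned projection matrices are all assumptions made to keep the example short. It only shows the general pattern of one modality querying the others before a self-attention pass over the combined tokens.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(query, key, value):
    """Scaled dot-product attention for inputs shaped (n, d), (m, d), (m, d)."""
    scores = query @ key.T / np.sqrt(query.shape[-1])
    return softmax(scores, axis=-1) @ value

def fuse_modalities(text, image, graph):
    """Hypothetical fusion step (an assumption, not taken from the paper):
    text tokens attend to image and graph tokens via cross-attention,
    then the combined tokens attend to themselves via self-attention."""
    text_with_image = attention(text, image, image)  # cross-attention
    text_with_graph = attention(text, graph, graph)  # cross-attention
    combined = text + text_with_image + text_with_graph  # residual sum
    return attention(combined, combined, combined)  # self-attention

# Random stand-ins for per-modality embeddings of dimension d.
rng = np.random.default_rng(0)
d = 8
text = rng.normal(size=(4, d))   # 4 text tokens
image = rng.normal(size=(3, d))  # 3 image tokens
graph = rng.normal(size=(5, d))  # 5 graph tokens

fused = fuse_modalities(text, image, graph)
print(fused.shape)  # (4, 8)
```

The fused output keeps one row per text token, so it could be passed on wherever the text embeddings alone would have been used. A real model would add learned query, key, and value projections and multiple heads.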
Abstract
Integrating diverse data modalities is crucial for enhancing the performance of personalized recommendation systems. Traditional models, which often rely on a single data source, lack the depth needed to accurately capture the multifaceted nature of item features and user behaviors. This paper introduces a novel framework for multi-behavior recommendations that fuses three modalities (visual, textual, and graph data) through alignment with large language models (LLMs). Visual information captures contextual and aesthetic item characteristics; textual data provides detailed insights into user interests and item features; and graph data elucidates relationships within the item-behavior heterogeneous graphs. Our proposed model, Triple Modality Fusion (TMF), uses the power of LLMs to align and integrate these three modalities, achieving a…
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Computational and Text Analysis Methods
Methods: ALIGN
