Distribution-Guided Auto-Encoder for User Multimodal Interest Cross Fusion
Moyu Zhang, Yongxiang Tang, Yujun Jin, Jinxin Hu, Yu Zhang

TL;DR
This paper introduces DMAE, a novel auto-encoder model that dynamically fuses multimodal user interest information at the behavioral level, addressing data sparsity and improving recommendation accuracy.
Contribution
The paper proposes a distribution-guided auto-encoder that cross-fuses multimodal user interests at the level of behavior sequences, addressing a limitation of early fusion methods, which combine text and image features without accounting for behavioral context.
Findings
DMAE outperforms existing multimodal recommendation models.
The model effectively captures dynamic user interests.
Extensive experiments validate its superiority.
Abstract
Traditional recommendation methods rely on correlating the embedding vectors of item IDs to capture implicit collaborative filtering signals to model the user's interest in the target item. Consequently, traditional ID-based methods often encounter data sparsity problems stemming from the sparse nature of ID features. To alleviate the problem of item ID sparsity, recommendation models incorporate multimodal item information to enhance recommendation accuracy. However, existing multimodal recommendation methods typically employ early fusion approaches, which focus primarily on combining text and image features, while neglecting the contextual influence of user behavior sequences. This oversight prevents dynamic adaptation of multimodal interest representations based on behavioral patterns, consequently restricting the model's capacity to effectively capture user multimodal interests.…
