Distribution-Guided Auto-Encoder for User Multimodal Interest Cross Fusion

Moyu Zhang; Yongxiang Tang; Yujun Jin; Jinxin Hu; Yu Zhang

arXiv:2508.14485 · cs.IR · August 22, 2025

TL;DR

This paper introduces DMAE, a distribution-guided auto-encoder that fuses users' multimodal interest information at the behavioral level, aiming to alleviate item ID sparsity and improve recommendation accuracy.

Contribution

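The paper proposes a distribution-guided auto-encoder that cross-fuses multimodal user interests conditioned on behavior sequences, addressing a limitation of early fusion methods: they combine text and image features without accounting for the context of the user's behavior sequence.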

Findings

01 DMAE outperforms existing multimodal recommendation models.

02 The model effectively captures dynamic user interests.

03 Extensive experiments validate its superiority.

Abstract

Traditional recommendation methods rely on correlating the embedding vectors of item IDs to capture implicit collaborative filtering signals to model the user's interest in the target item. Consequently, traditional ID-based methods often encounter data sparsity problems stemming from the sparse nature of ID features. To alleviate the problem of item ID sparsity, recommendation models incorporate multimodal item information to enhance recommendation accuracy. However, existing multimodal recommendation methods typically employ early fusion approaches, which focus primarily on combining text and image features, while neglecting the contextual influence of user behavior sequences. This oversight prevents dynamic adaptation of multimodal interest representations based on behavioral patterns, consequently restricting the model's capacity to effectively capture user multimodal interests.…
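The distinction the abstract draws between early fusion and behavior-aware fusion can be illustrated with a small sketch. This is not the DMAE architecture, whose details are not given in the text above. It is a hypothetical toy, assuming text and image embeddings of equal dimension, with illustrative function names (`early_fusion`, `behavior_conditioned_fusion`) and a simple similarity-based weighting chosen only for demonstration.

```python
import math


def early_fusion(text_vec, image_vec):
    """Early fusion: combine modalities once per item, with no user context."""
    return text_vec + image_vec  # list concatenation


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_vec(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def behavior_conditioned_fusion(text_vec, image_vec, history):
    """Hypothetical behavior-aware fusion (an assumption, not the paper's method).

    history is a list of (text_vec, image_vec) pairs for items the user
    interacted with. Each modality of the target item is weighted by its
    similarity to the user's averaged history in that modality.
    """
    hist_text = mean_vec([t for t, _ in history])
    hist_image = mean_vec([i for _, i in history])
    w_text, w_image = softmax(
        [dot(text_vec, hist_text), dot(image_vec, hist_image)]
    )
    return [w_text * t + w_image * i for t, i in zip(text_vec, image_vec)]


# Toy target item and two users with different histories.
target_text, target_image = [1.0, 0.0], [0.0, 1.0]
history_a = [([1.0, 0.0], [0.0, 0.0])]  # history matches the target's text
history_b = [([0.0, 0.0], [0.0, 1.0])]  # history matches the target's image

static_repr = early_fusion(target_text, target_image)
fused_a = behavior_conditioned_fusion(target_text, target_image, history_a)
fused_b = behavior_conditioned_fusion(target_text, target_image, history_b)
```

In this toy, `static_repr` is identical for every user, while `fused_a` leans toward the text modality and `fused_b` toward the image modality. That is the kind of per-user adaptation the abstract says early fusion cannot provide.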
