TL;DR
DualFashion is a novel dual-diffusion transformer architecture that jointly models image and text modalities for personalized, interpretable fashion recommendation, improving behavior modeling and generation diversity.
Contribution
It introduces a dual-diffusion transformer with image and text branches, enabling joint modeling and interpretability in fashion recommendation systems.
Findings
Outperforms state-of-the-art methods on iFashion and Polyvore-U datasets.
Produces both fashion item images and textual descriptions for better interpretability.
Enhances generation diversity with a text-augmented fine-tuning strategy.
Abstract
Personalized generative recommender systems have emerged as a promising solution for fashion recommendation. However, existing methods primarily rely on implicit visual embeddings from historical interactions, which often contain preference-irrelevant information and result in insufficient user behavior modeling. Moreover, these models typically generate only item images, providing limited interpretability. To address these limitations, we propose DualFashion, a Dual-Diffusional Generative Fashion Recommendation Architecture that jointly models image and text modalities for personalized and explainable recommendation. DualFashion adopts a dual-diffusion Transformer with image and text branches, where structured attribute-level captions and visual outfit information are jointly used as conditioning signals to model user behavior. The proposed architecture produces both fashion item…
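As a rough illustration of what "image and text branches" sharing conditioning signals could look like, here is a toy sketch in pure Python. It is not the authors' implementation: the function names (`attend`, `dual_branch_step`), the token counts, the identity projections, and the choice of a single joint-attention step over a concatenated context are all assumptions made for illustration. The abstract does not specify how the two branches interact.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, context):
    """Single-head scaled dot-product attention with identity projections.

    Each query token is replaced by a weighted average of the context tokens.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in context]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, context)) for j in range(d)])
    return out


def dual_branch_step(image_tokens, text_tokens, cond_tokens):
    """Hypothetical joint step: both branches read one shared context.

    The context concatenates image tokens, text tokens, and conditioning
    tokens (standing in for caption and outfit information).
    """
    context = image_tokens + text_tokens + cond_tokens
    new_image = attend(image_tokens, context)
    new_text = attend(text_tokens, context)
    return new_image, new_text


def random_tokens(n, d, rng):
    """Toy stand-in for noisy latents or embeddings."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


rng = random.Random(0)
D = 8
image_tokens = random_tokens(4, D, rng)  # toy noisy image latents
text_tokens = random_tokens(3, D, rng)   # toy noisy text latents
cond_tokens = random_tokens(5, D, rng)   # toy caption + outfit conditioning

new_image, new_text = dual_branch_step(image_tokens, text_tokens, cond_tokens)
print(len(new_image), len(new_text))
```

In this sketch each branch keeps its own token sequence but attends over the same pooled context, so information from the other modality and from the conditioning tokens can influence both outputs. A real model would add learned projections, timestep embeddings, and separate denoising objectives per branch.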
