Discrete Preference Learning for Personalized Multimodal Generation
Yuting Zhang, Ying Sun, Dazhong Shen, Ziwei Xie, Feng Liu, Changwang Zhang, Xiang Liu, Jun Wang, Hui Xiong

TL;DR
This paper introduces DPPMG, a two-stage framework for personalized multimodal content generation that models discrete modal-specific preferences and ensures cross-modal consistency.
Contribution
It combines two components: a graph neural network that models user preferences with discrete tokens, and reward-based fine-tuning that keeps generated images and texts consistent.
Findings
Effective in generating personalized multimodal content
Improves cross-modal consistency in generated outputs
Outperforms existing models on real-world datasets
Abstract
The emergence of generative models enables the creation of texts and images tailored to users' preferences. Existing personalized generative models have two critical limitations: they lack a dedicated paradigm for accurate preference modeling, and they generate unimodal content even though real-world user interactions are driven by multiple modalities. Therefore, we propose personalized multimodal generation, which captures modal-specific preferences from multimodal interactions via a dedicated preference model, and then feeds them into downstream generators to produce personalized multimodal content. However, this task presents two challenges: (1) the gap between the continuous preferences produced by dedicated modeling and the discrete token inputs intrinsic to generator architectures; (2) potential inconsistency between generated images and texts. To tackle these, we present a two-stage framework called Discrete Preference learning…
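To make the first challenge concrete, here is a minimal sketch of one generic way to turn continuous preference vectors into discrete token ids: nearest-neighbor lookup in a codebook. This is an illustrative assumption, not the mechanism described in the paper, which this truncated abstract does not specify. The names `quantize`, `to_tokens`, and the toy `codebook` are hypothetical.

```python
from typing import List, Sequence


def quantize(preference: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry nearest to `preference`.

    Illustrative only: a generic nearest-neighbor lookup, not DPPMG's method.
    """
    if not codebook:
        raise ValueError("codebook must not be empty")
    best_index = 0
    best_dist = float("inf")
    for index, code in enumerate(codebook):
        if len(code) != len(preference):
            raise ValueError("dimension mismatch between preference and codebook entry")
        # Squared Euclidean distance is enough for picking the nearest entry.
        dist = sum((p - c) ** 2 for p, c in zip(preference, code))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def to_tokens(preferences: Sequence[Sequence[float]],
              codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each continuous preference vector to one discrete token id."""
    return [quantize(p, codebook) for p in preferences]


# Toy data (hypothetical): three 2-D codebook entries and three preference vectors.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
user_preferences = [[0.9, 0.1], [0.1, 0.8], [0.05, 0.02]]
tokens = to_tokens(user_preferences, codebook)
print(tokens)
```

The resulting integer ids are the kind of discrete input a token-based generator can consume, whereas the raw float vectors are not. How the codebook would be learned, and how the tokens would condition the image and text generators, is outside the scope of this sketch.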
