Discrete Preference Learning for Personalized Multimodal Generation

Yuting Zhang; Ying Sun; Dazhong Shen; Ziwei Xie; Feng Liu; Changwang Zhang; Xiang Liu; Jun Wang; Hui Xiong

arXiv:2604.20434·cs.IR·April 23, 2026

TL;DR

This paper introduces DPPMG, a two-stage framework for personalized multimodal content generation that models discrete modal-specific preferences and ensures cross-modal consistency.

Contribution

It proposes a novel approach that combines a graph neural network for preference modeling with discrete tokens and reward-based fine-tuning for cross-modal consistency.

Findings

01 Effective in generating personalized multimodal content

02 Improves cross-modal consistency in generated outputs

03 Outperforms existing models on real-world datasets

Abstract

The emergence of generative models enables the creation of texts and images tailored to users' preferences. Existing personalized generative models have two critical limitations: they lack a dedicated paradigm for accurate preference modeling, and they generate unimodal content even though real-world user interactions are driven by multiple modalities. Therefore, we propose personalized multimodal generation, which captures modal-specific preferences from multimodal interactions via a dedicated preference model and then feeds them into downstream generators to produce personalized multimodal content. However, this task presents two challenges: (1) the gap between the continuous preferences produced by dedicated modeling and the discrete token inputs intrinsic to generator architectures; (2) potential inconsistency between generated images and texts. To tackle these, we present a two-stage framework called Discrete Preference learning…
