Di3PO - Diptych Diffusion DPO for Targeted Improvements in Image Generation
Sanjana Reddy (1), Ishaan Malhi (2), Sally Ma (2), Praneet Dutta (2) ((1) Google, (2) Google DeepMind)

TL;DR
Di3PO is a new method for preference tuning of text-to-image diffusion models. It efficiently constructs positive and negative image pairs that differ only in targeted regions, which makes training more effective for those regions.
Contribution
We introduce Di3PO, a novel approach for constructing region-specific image pairs for preference tuning, enabling targeted improvements in image generation.
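The summary does not say how Di3PO actually produces its region-specific pairs. As a minimal illustration of the idea only, the sketch below assumes a pair can be formed by blending a "good" and a "bad" version of one region (for example, correctly and incorrectly rendered text) into a shared surrounding context using a mask. The function `make_region_pair` and its arguments are hypothetical and are not the authors' implementation.

```python
import numpy as np


def make_region_pair(context, good_patch, bad_patch, mask):
    """Hypothetical sketch: build a (positive, negative) pair sharing all pixels outside `mask`.

    context:    HxWxC image supplying the shared surrounding pixels.
    good_patch: HxWxC image whose pixels inside the mask are the preferred content.
    bad_patch:  HxWxC image whose pixels inside the mask are the dispreferred content.
    mask:       HxW (or HxWxC) array in [0, 1]; 1 marks the targeted region.
    """
    context = np.asarray(context, dtype=np.float32)
    good = np.asarray(good_patch, dtype=np.float32)
    bad = np.asarray(bad_patch, dtype=np.float32)
    m = np.asarray(mask, dtype=np.float32)
    if m.ndim == context.ndim - 1:
        m = m[..., None]  # broadcast an HxW mask across the channel axis

    # Linear blend: inside the mask take the patch, outside keep the shared context.
    positive = m * good + (1.0 - m) * context
    negative = m * bad + (1.0 - m) * context
    return positive, negative


# Usage: an 8x8 RGB image where only a 2x4 "text" box differs between the pair.
context = np.full((8, 8, 3), 0.5)
good = np.ones((8, 8, 3))
bad = np.zeros((8, 8, 3))
mask = np.zeros((8, 8))
mask[2:4, 2:6] = 1.0

pos, neg = make_region_pair(context, good, bad, mask)
changed = np.any(pos != neg, axis=-1)  # True only where the two images differ
```

Because the blend is linear, a soft (feathered) mask would also work, which may help avoid hard seams at the region boundary; whether Di3PO uses anything like this is not stated in the summary.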
Findings
Outperforms baseline methods like SFT and DPO in text rendering tasks
Efficiently isolates regions for targeted improvements during training
Reduces variance and improves training efficiency in preference tuning
Abstract
Existing methods for preference tuning of text-to-image (T2I) diffusion models often rely on computationally expensive generation steps to create positive and negative pairs of images. These approaches frequently yield training pairs that lack meaningful differences, are expensive to sample and filter, or exhibit significant variance in irrelevant pixel regions, thereby degrading training efficiency. To address these limitations, we introduce "Di3PO", a novel method for constructing positive and negative pairs that isolates specific regions targeted for improvement during preference tuning, while keeping the surrounding context in the image stable. We demonstrate the efficacy of our approach by applying it to the challenging task of text rendering in diffusion models, showing improvements over the SFT and DPO baselines.
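The abstract does not give Di3PO's training objective. Assuming it keeps the standard Diffusion-DPO loss (Wallace et al.) and changes only how the pairs are built, the per-pair loss can be sketched as below. Here `model_err_*` and `ref_err_*` are hypothetical inputs: the mean squared noise-prediction errors of the trained and frozen reference models on the preferred (`w`) and dispreferred (`l`) images, computed with the same noise sample and timestep. `beta` is the DPO strength hyperparameter, and the timestep weighting term of the original formulation is folded into it for simplicity.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def diffusion_dpo_loss(model_err_w, model_err_l, ref_err_w, ref_err_l, beta):
    """Sketch of the Diffusion-DPO loss for one (preferred, dispreferred) pair.

    The loss decreases when the trained model lowers its denoising error on the
    preferred image (relative to the reference model) more than on the
    dispreferred one.
    """
    model_diff = model_err_w - model_err_l  # trained model: winner error minus loser error
    ref_diff = ref_err_w - ref_err_l        # frozen reference: same quantity
    return -log_sigmoid(-beta * (model_diff - ref_diff))


# Usage: identical model and reference errors give log(2); improving on the winner lowers it.
loss_equal = diffusion_dpo_loss(0.1, 0.2, 0.1, 0.2, beta=10.0)
loss_better = diffusion_dpo_loss(0.05, 0.2, 0.1, 0.2, beta=10.0)
```

With region-isolated pairs such as those Di3PO constructs, the preferred and dispreferred images share their surrounding context, so the intent is that the difference between the two error terms is driven mainly by the targeted region rather than by unrelated pixels.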
