EruDiff: Refactoring Knowledge in Diffusion Models for Advanced Text-to-Image Synthesis
Xiefan Guo, Xinzhu Ma, Haoxiang Ma, Zihao Zhou, Di Huang

TL;DR
EruDiff refactors the knowledge structures of text-to-image diffusion models so that they better handle implicit prompts requiring deep world knowledge, with significant performance gains reported on the Science-T2I and WISE benchmarks.
Contribution
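The paper proposes two strategies, Diffusion Knowledge Distribution Matching (DK-DM) and NO-RL, to align the knowledge distribution of implicit prompts with that of explicit anchors and to correct biases in explicit prompt rendering, improving the model's handling of implicit prompts.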
Findings
Significant performance improvements on Science-T2I and WISE benchmarks.
Enhanced handling of implicit prompts requiring deep world knowledge.
Effective knowledge refactoring in diffusion models demonstrated.
Abstract
Text-to-image diffusion models have achieved remarkable fidelity in synthesizing images from explicit text prompts, yet exhibit a critical deficiency in processing implicit prompts that require deep-level world knowledge, ranging from natural sciences to cultural commonsense, resulting in counter-factual synthesis. This paper traces the root of this limitation to a fundamental dislocation of the underlying knowledge structures, manifesting as a chaotic organization of implicit prompts compared to their explicit counterparts. In this paper, we propose EruDiff, which aims to refactor the knowledge within diffusion models. Specifically, we develop the Diffusion Knowledge Distribution Matching (DK-DM) to register the knowledge distribution of intractable implicit prompts with that of well-defined explicit anchors. Furthermore, to rectify the inherent biases in explicit prompt rendering, we…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Model Reduction and Neural Networks
