Weak Supervision Dynamic KL-Weighted Diffusion Models Guided by Large Language Models

Julian Perry; Frank Sanders; Carter Scott

arXiv:2502.00826·cs.CL·February 4, 2025

Open Access

TL;DR

This paper introduces a hybrid text-to-image generation method combining large language models with diffusion models, utilizing a dynamic KL-weighting strategy to enhance image quality, relevance, and training stability.

Contribution

It presents a novel dynamic KL-weighting technique and integrates semantic guidance from LLMs to improve diffusion-based image synthesis from text.

Findings

01

Outperforms traditional GANs in image quality and relevance

02

Enhances training stability and robustness to textual variability

03

Demonstrates scalability to other multimodal tasks

Abstract

In this paper, we present a novel method for improving text-to-image generation by combining Large Language Models (LLMs) with diffusion models, a hybrid approach aimed at achieving both higher quality and greater efficiency in image synthesis from text descriptions. Our approach introduces a new dynamic KL-weighting strategy to optimize the diffusion process and incorporates semantic understanding from pre-trained LLMs to guide generation. The proposed method significantly improves both the visual quality of generated images and their alignment with text descriptions, addressing challenges such as computational inefficiency, training instability, and sensitivity to textual variability. We evaluate our method on the COCO dataset and demonstrate its superior performance over traditional GAN-based models, both quantitatively and qualitatively. Extensive experiments, including…
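
The abstract does not say how the dynamic KL weight is computed, so the following is only an illustrative sketch of the general idea: a KL term whose coefficient changes over the course of training. The linear warm-up schedule, the function names (gaussian_kl, kl_weight, weighted_loss), and the parameters (warmup_steps, max_weight) are all assumptions made for this example and are not taken from the paper.

```python
import math


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between two 1-D Gaussians given means and log-variances."""
    return 0.5 * (
        logvar_p - logvar_q
        + (math.exp(logvar_q) + (mu_q - mu_p) ** 2) / math.exp(logvar_p)
        - 1.0
    )


def kl_weight(step, warmup_steps, max_weight=1.0):
    """Hypothetical schedule (an assumption, not the paper's):
    ramp the weight linearly from 0 to max_weight over warmup_steps."""
    if warmup_steps <= 0:
        return max_weight
    return max_weight * min(1.0, step / warmup_steps)


def weighted_loss(recon_loss, kl_terms, step, warmup_steps):
    """Reconstruction loss plus the dynamically weighted sum of KL terms."""
    return recon_loss + kl_weight(step, warmup_steps) * sum(kl_terms)
```

Under this assumed schedule, the KL terms contribute nothing at step 0 and reach their full weight once step equals warmup_steps. A schedule of this kind is one common way to keep a KL penalty from dominating early training; whether the paper uses anything similar cannot be determined from the abstract.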

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling