PEA-Diffusion: Parameter-Efficient Adapter with Knowledge Distillation in non-English Text-to-Image Generation
Jian Ma, Chen Chen, Qingsong Xie, Haonan Lu

TL;DR
This paper introduces PEA-Diffusion, a lightweight adapter (about 6M parameters) trained via knowledge distillation that transfers an English text-to-image diffusion model to non-English prompts, including culture-specific ones, without training from scratch.
Contribution
It proposes a plug-and-play language transfer method: only an MLP-like parameter-efficient adapter of about 6M parameters is trained, under teacher knowledge distillation on a small parallel corpus, to improve non-English text-to-image generation.
Findings
Freezing the UNet parameters still yields strong performance on the language-specific prompt evaluation set.
PEA approaches English model performance on general prompts.
The adapter enhances cross-lingual text-to-image tasks.
Abstract
Text-to-image diffusion models are well-known for their ability to generate realistic images based on textual prompts. However, the existing works have predominantly focused on English, lacking support for non-English text-to-image models. The most commonly used translation methods cannot solve the generation problem related to language culture, while training from scratch on a specific language dataset is prohibitively expensive. In this paper, we are inspired to propose a simple plug-and-play language transfer method based on knowledge distillation. All we need to do is train a lightweight MLP-like parameter-efficient adapter (PEA) with only 6M parameters under teacher knowledge distillation along with a small parallel data corpus. We are surprised to find that freezing the parameters of UNet can still achieve remarkable performance on the language-specific prompt evaluation set,…
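To make the training setup in the abstract concrete, here is a toy sketch in pure Python of the general idea: the teacher and the student encoder stay frozen, and only a small MLP adapter is updated so that its output matches the teacher's features on paired examples. Everything in it is an assumption for illustration: the layer widths, the `ToyAdapter` class, the random stand-in features, and the choice to match text features directly with a mean squared error. The abstract excerpt does not specify the paper's actual distillation targets or adapter layout, so this should not be read as the authors' implementation.

```python
import random

random.seed(0)
D_STUDENT, D_HIDDEN, D_TEACHER = 4, 8, 3  # toy widths (hypothetical, not the paper's)

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

class ToyAdapter:
    """Two-layer MLP with ReLU; the only trainable component in this sketch."""

    def __init__(self, d_in, d_hidden, d_out):
        self.w1, self.b1 = rand_matrix(d_hidden, d_in), [0.0] * d_hidden
        self.w2, self.b2 = rand_matrix(d_out, d_hidden), [0.0] * d_out

    def num_params(self):
        return sum(len(r) for r in self.w1 + self.w2) + len(self.b1) + len(self.b2)

    def forward(self, x):
        # Cache activations so sgd_step can backpropagate through them.
        self.x = x
        self.pre = [z + b for z, b in zip(matvec(self.w1, x), self.b1)]
        self.h = [max(0.0, z) for z in self.pre]
        return [z + b for z, b in zip(matvec(self.w2, self.h), self.b2)]

    def sgd_step(self, grad_out, lr):
        # Gradient w.r.t. the hidden layer, computed before layer 2 is updated.
        grad_h = [sum(self.w2[k][j] * g for k, g in enumerate(grad_out))
                  for j in range(len(self.h))]
        for k, g in enumerate(grad_out):
            for j, hj in enumerate(self.h):
                self.w2[k][j] -= lr * g * hj
            self.b2[k] -= lr * g
        for j, gh in enumerate(grad_h):
            if self.pre[j] > 0:  # ReLU passes gradient only where it was active
                for i, xi in enumerate(self.x):
                    self.w1[j][i] -= lr * gh * xi
                self.b1[j] -= lr * gh

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

# Stand-ins for a parallel corpus: frozen student-encoder features for
# non-English prompts, paired with frozen teacher features for the
# corresponding English prompts. Both are synthetic here.
hidden_map = rand_matrix(D_TEACHER, D_STUDENT)
student_feats = [[random.uniform(-1, 1) for _ in range(D_STUDENT)] for _ in range(16)]
teacher_feats = [matvec(hidden_map, s) for s in student_feats]

def mean_loss(adapter):
    pairs = zip(student_feats, teacher_feats)
    return sum(mse(adapter.forward(s), t) for s, t in pairs) / len(student_feats)

adapter = ToyAdapter(D_STUDENT, D_HIDDEN, D_TEACHER)
initial_loss = mean_loss(adapter)

for _ in range(300):  # plain SGD over the paired examples
    for s, t in zip(student_feats, teacher_feats):
        pred = adapter.forward(s)
        grad = [2 * (p - tk) / len(pred) for p, tk in zip(pred, t)]  # d(MSE)/d(pred)
        adapter.sgd_step(grad, lr=0.05)

final_loss = mean_loss(adapter)
print(adapter.num_params(), initial_loss, final_loss)
```

Only the adapter's weights change during the loop; the paired feature lists are never modified, which mirrors the frozen-backbone setting the abstract describes. In a real system the adapter would sit between a non-English text encoder and the diffusion model's UNet, and the loss would be computed on whatever teacher signals the method distills.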
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications
Methods: Diffusion · Adapter · Knowledge Distillation
