EasyGen: Easing Multimodal Generation with BiDiffuser and LLMs
Xiangyu Zhao, Bo Liu, Qijiong Liu, Guangyuan Shi, Xiao-Ming Wu

TL;DR
EasyGen is a multimodal generation framework that combines BiDiffuser, a bidirectional conditional diffusion model, with LLMs. It generates high-quality images and text while needing less training data and providing more efficient interaction between modalities.
Contribution
The paper proposes EasyGen, a model that uses BiDiffuser and LLMs for more efficient multimodal understanding and generation, reducing training-data requirements and improving generation quality.
Findings
Outperforms existing models in data efficiency and generation quality
Demonstrates strong extendibility across modalities
Achieves high-quality image and text generation
Abstract
We present EasyGen, an efficient model designed to enhance multimodal understanding and generation by harnessing the capabilities of diffusion models and large language models (LLMs). Unlike existing multimodal models that predominantly depend on encoders like CLIP or ImageBind and need ample amounts of training data to bridge modalities, EasyGen leverages BiDiffuser, a bidirectional conditional diffusion model, to foster more efficient modality interactions. EasyGen achieves text generation by training a projection layer linking BiDiffuser and an LLM, and facilitates image generation by training an adapter to align the LLM's text space with the BiDiffuser's image space. Comprehensive quantitative and qualitative experiments show that EasyGen excels in data-efficient training, high-quality image generation, and extendibility, effectively addressing the challenges in multimodal generation.…
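The two trainable bridges described in the abstract can be pictured with a minimal sketch. It assumes both the projection layer and the adapter are plain linear maps, which the abstract does not confirm. The dimensions, the function names (`make_linear`, `to_llm_space`, `to_diffuser_space`) and the random weights are hypothetical and stand in for trained parameters. BiDiffuser and the LLM themselves are not modeled.

```python
import random

# Hypothetical feature sizes; the real dimensions are not given in the abstract.
DIFFUSER_DIM = 8   # width of a BiDiffuser feature vector (assumed)
LLM_DIM = 16       # width of an LLM embedding / hidden state (assumed)


def make_linear(in_dim, out_dim, seed):
    """Build an untrained linear map: (out_dim x in_dim) weights plus a zero bias."""
    rng = random.Random(seed)
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
              for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def linear(params, vec):
    """Apply y = W x + b to a single vector."""
    weight, bias = params
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weight, bias)]


# Bridge 1 (text generation path): BiDiffuser features -> LLM input space.
projection = make_linear(DIFFUSER_DIM, LLM_DIM, seed=1)

# Bridge 2 (image generation path): LLM text space -> BiDiffuser image space.
adapter = make_linear(LLM_DIM, DIFFUSER_DIM, seed=2)


def to_llm_space(diffuser_features):
    """Project a sequence of BiDiffuser feature vectors into LLM space."""
    return [linear(projection, vec) for vec in diffuser_features]


def to_diffuser_space(llm_states):
    """Map a sequence of LLM hidden states into BiDiffuser's conditioning space."""
    return [linear(adapter, vec) for vec in llm_states]


# Toy inputs: 4 BiDiffuser feature vectors and 3 LLM hidden states.
diffuser_features = [[0.5] * DIFFUSER_DIM for _ in range(4)]
llm_states = [[0.25] * LLM_DIM for _ in range(3)]

llm_inputs = to_llm_space(diffuser_features)
diffuser_conditions = to_diffuser_space(llm_states)
```

In this sketch only the two small maps would be trained, which is one way to read the abstract's claim of data-efficient training. Whether the underlying BiDiffuser and LLM weights stay frozen is not stated in the abstract.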
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Adapter · ALIGN · Diffusion · Contrastive Language-Image Pre-training
