MCGM: Mask Conditional Text-to-Image Generative Model
Rami Skaik, Leonardo Rossi, Tomaso Fontanini, and Andrea Prati

TL;DR
This paper introduces MCGM, a novel conditional diffusion model for precise pose control in text-to-image generation. It extends the Break-a-scene model with a mask embedding that conditions the generation process, improving image quality and specificity.
Contribution
We propose MCGM, a new mask-conditional diffusion model that allows detailed pose control in image generation, extending the capabilities of earlier models such as Break-a-scene.
Findings
MCGM generates high-quality images with specified poses.
The model effectively incorporates mask conditioning for better control.
Experimental results show improvements over baseline models.
Abstract
Recent advancements in generative models have revolutionized the field of artificial intelligence, enabling the creation of highly realistic and detailed images. In this study, we propose a novel Mask Conditional Text-to-Image Generative Model (MCGM) that leverages the power of conditional diffusion models to generate pictures with specific poses. Our model builds upon the success of the Break-a-scene [1] model, which generates new scenes from a single image containing multiple subjects, and incorporates a mask embedding injection that conditions the generation process. By introducing this additional level of control, MCGM offers a flexible and intuitive approach for generating specific poses for one or more subjects learned from a single image, empowering users to influence the output based on their requirements. Through extensive experimentation and evaluation, we demonstrate…
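The abstract describes the mask embedding injection only at a high level. As a rough illustration of one generic way a subject mask can condition a denoising network, the sketch below maps each pixel of an integer subject mask to a vector and adds it to a feature map. Everything in it is a hypothetical assumption and not MCGM's actual architecture, which the abstract does not specify: the function names, the nearest-neighbour downsampling, the additive injection and the fixed embedding table. In a real model the table would be trainable parameters and the features would be intermediate U-Net activations.

```python
from typing import List

# Hypothetical types for illustration only.
LabelMask = List[List[int]]     # H x W, 0 = background, k = subject k
Matrix = List[List[float]]      # H x W
FeatureMap = List[Matrix]       # C x H x W


def downsample_labels(mask: LabelMask, factor: int) -> LabelMask:
    """Nearest-neighbour downsampling; keeps integer subject ids intact."""
    return [row[::factor] for row in mask[::factor]]


def embed_mask(mask: LabelMask, table: List[List[float]]) -> FeatureMap:
    """Replace each pixel's subject id with its C-dim embedding, laid out C x H x W."""
    channels = len(table[0])
    return [[[table[label][c] for label in row] for row in mask]
            for c in range(channels)]


def inject_mask_embedding(features: FeatureMap, mask: LabelMask,
                          table: List[List[float]], factor: int) -> FeatureMap:
    """Add the per-pixel mask embedding to a C x H x W feature map (assumed additive injection)."""
    emb = embed_mask(downsample_labels(mask, factor), table)
    same_shape = (len(emb) == len(features)
                  and len(emb[0]) == len(features[0])
                  and len(emb[0][0]) == len(features[0][0]))
    if not same_shape:
        raise ValueError("mask embedding and features must have the same C x H x W shape")
    return [[[f + e for f, e in zip(f_row, e_row)]
             for f_row, e_row in zip(f_ch, e_ch)]
            for f_ch, e_ch in zip(features, emb)]


if __name__ == "__main__":
    # Subject 1 occupies the top-left quarter of a 4 x 4 mask.
    mask = [[1, 1, 0, 0],
            [1, 1, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]
    table = [[0.0, 0.0],    # id 0: background
             [1.0, -1.0]]   # id 1: subject
    features = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]  # C=2, 2 x 2 latent
    print(inject_mask_embedding(features, mask, table, factor=2))
    # prints [[[1.0, 0.0], [0.0, 0.0]], [[-1.0, 0.0], [0.0, 0.0]]]
```

Additive injection is chosen here only because it keeps the feature shape unchanged; concatenating the mask as extra input channels is another common option for mask conditioning.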
Taxonomy
Topics: Handwritten Text Recognition Techniques
Methods: Diffusion
