MALeR: Improving Compositional Fidelity in Layout-Guided Generation
Shivank Saxena, Dhruv Srivastava, Makarand Tapaswi

TL;DR
MALeR enhances text-to-image generation by improving compositional fidelity, ensuring subjects stay within layouts, attributes are correctly bound, and complex scenes are accurately rendered.
Contribution
The paper introduces MALeR, a novel method that prevents subjects from outside-layout placement and reduces attribute leakage in compositional scene generation.
Findings
Superior compositional accuracy demonstrated
Improved attribute binding in complex scenes
Enhanced generation consistency
Abstract
Recent advances in text-to-image models have enabled a new era of creative and controllable image generation. However, generating compositional scenes with multiple subjects and attributes remains a significant challenge. To enhance user control over subject placement, several layout-guided methods have been proposed. However, these methods face numerous challenges, particularly in compositional scenes. Unintended subjects often appear outside the layouts, generated images can be out-of-distribution and contain unnatural artifacts, or attributes bleed across subjects, leading to incorrect visual outputs. In this work, we propose MALeR, a method that addresses each of these challenges. Given a text prompt and corresponding layouts, our method prevents subjects from appearing outside the given layouts while being in-distribution. Additionally, we propose a masked, attribute-aware binding…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenerative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Multimodal Machine Learning Applications
