MALeR: Improving Compositional Fidelity in Layout-Guided Generation

Shivank Saxena; Dhruv Srivastava; Makarand Tapaswi

arXiv:2511.06002 · cs.CV · November 11, 2025


Open Access

TL;DR

MALeR improves the compositional fidelity of layout-guided text-to-image generation: subjects stay within their layouts, attributes bind to the correct subjects, and complex scenes are rendered accurately.

Contribution

The paper introduces MALeR, a novel method that prevents subjects from being placed outside their layouts and reduces attribute leakage in compositional scene generation.

Findings

01 Superior compositional accuracy demonstrated

02 Improved attribute binding in complex scenes

03 Enhanced generation consistency

Abstract

Recent advances in text-to-image models have enabled a new era of creative and controllable image generation. However, generating compositional scenes with multiple subjects and attributes remains a significant challenge. To give users more control over subject placement, several layout-guided methods have been proposed. These methods still face numerous challenges, particularly in compositional scenes: unintended subjects often appear outside the layouts, generated images can be out-of-distribution and contain unnatural artifacts, or attributes bleed across subjects, leading to incorrect visual outputs. In this work, we propose MALeR, a method that addresses each of these challenges. Given a text prompt and corresponding layouts, our method prevents subjects from appearing outside the given layouts while keeping the generated images in-distribution. Additionally, we propose a masked, attribute-aware binding…
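A common way layout-guided methods discourage subjects from appearing outside their boxes is to penalize the share of a subject token's cross-attention that falls outside its box. The sketch below is a generic illustration of that idea only, not MALeR's formulation, which the abstract above does not spell out. The function names, the half-open (x0, y0, x1, y1) box convention, and the use of plain nested lists as attention maps are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical box convention: (x0, y0, x1, y1) with half-open pixel bounds.
Box = Tuple[int, int, int, int]
AttnMap = List[List[float]]  # one subject token's cross-attention map, shape (H, W)


def box_mask(height: int, width: int, box: Box) -> List[List[float]]:
    """Binary mask: 1.0 for cells inside the box, 0.0 elsewhere."""
    x0, y0, x1, y1 = box
    return [[1.0 if (x0 <= x < x1 and y0 <= y < y1) else 0.0
             for x in range(width)]
            for y in range(height)]


def outside_layout_energy(attn: AttnMap, box: Box) -> float:
    """Fraction of a token's attention mass that lies outside its box.

    Returns 1 - inside / total: 0.0 means all attention is inside the
    layout, 1.0 means none of it is.
    """
    h, w = len(attn), len(attn[0])
    mask = box_mask(h, w, box)
    total = sum(sum(row) for row in attn)
    if total == 0:
        raise ValueError("attention map has zero total mass")
    inside = sum(attn[y][x] * mask[y][x] for y in range(h) for x in range(w))
    return 1.0 - inside / total


def layout_energy(attn_maps: Sequence[AttnMap], boxes: Sequence[Box]) -> float:
    """Sum of per-subject outside-layout energies (one box per subject token)."""
    if len(attn_maps) != len(boxes):
        raise ValueError("need exactly one box per attention map")
    return sum(outside_layout_energy(a, b) for a, b in zip(attn_maps, boxes))


if __name__ == "__main__":
    uniform = [[1.0] * 4 for _ in range(4)]
    # A uniform 4x4 map with a 2x2 box keeps 4/16 of its mass inside.
    print(outside_layout_energy(uniform, (0, 0, 2, 2)))
```

In guidance-style methods, a quantity like this is typically differentiated with respect to the noisy latent and used to nudge the latent during early denoising steps; this pure-Python version only computes the value and does not perform that update.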


Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Multimodal Machine Learning Applications