ReLayout: Integrating Relation Reasoning for Content-aware Layout Generation with Multi-modal Large Language Models
Jiaxu Tian, Xuehui Yu, Yaoxing Wang, Pan Wang, Guangqian Guo, Shan Gao

TL;DR
ReLayout is a novel method that improves content-aware layout generation by integrating relation reasoning and structured design concepts, resulting in more coherent, diverse, and aesthetically aligned layouts.
Contribution
It introduces explicit relation definitions and a prototype rebalance sampler to enhance the structural reasoning and style diversity in layout generation using large language models.
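The summary does not describe how the prototype rebalance sampler works, so the following is only a minimal sketch of one plausible reading: layouts are grouped by a style prototype, and each layout is sampled with weight inversely proportional to its group size so that rare styles are not drowned out. The function names, the `prototype_ids` input, and the inverse-frequency weighting are all assumptions made for illustration, not the authors' implementation.

```python
import random
from collections import Counter


def rebalance_weights(prototype_ids):
    """Return one sampling weight per layout: 1 / (size of its prototype group).

    Assumption: each layout already carries a prototype (style cluster) id.
    """
    counts = Counter(prototype_ids)
    return [1.0 / counts[p] for p in prototype_ids]


def sample_rebalanced(layouts, prototype_ids, k, seed=0):
    """Draw k layouts with replacement so every prototype group has equal total mass."""
    rng = random.Random(seed)
    weights = rebalance_weights(prototype_ids)
    return rng.choices(layouts, weights=weights, k=k)


if __name__ == "__main__":
    layouts = ["poster_a1", "poster_a2", "poster_a3", "poster_b1"]
    prototype_ids = ["a", "a", "a", "b"]
    print(rebalance_weights(prototype_ids))
    print(sample_rebalanced(layouts, prototype_ids, k=5))
```

With these weights, the three layouts of prototype "a" together receive the same total sampling mass as the single layout of prototype "b".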
Findings
ReLayout outperforms existing methods in generating structured layouts.
The approach achieves layouts more aligned with human aesthetics.
It enhances explainability and diversity in layout generation.
Abstract
Content-aware layout aims to arrange design elements appropriately on a given canvas to convey information effectively. Recently, the trend for this task has been to leverage large language models (LLMs) to generate layouts automatically, achieving remarkable performance. However, existing LLM-based methods fail to adequately interpret spatial relationships among visual themes and design elements, leading to problems with structure and diversity in layout generation. To address this issue, we introduce ReLayout, a novel method that leverages relation-CoT to generate more reasonable and aesthetically coherent layouts by grounding generation in design concepts. Specifically, we enhance layout annotations by introducing explicit relation definitions, such as region, salient, and margin between elements, with the goal of decomposing the layout into smaller, structured, and recursive…
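To make the idea of an explicit margin relation concrete, here is a hedged sketch that derives a vertical margin for each element from plain bounding boxes and writes it into an HTML-like style string next to the box coordinates. The `Element` class, the `margin-top` attribute, and the top-to-bottom ordering are hypothetical choices for illustration; the abstract does not specify the actual annotation format.

```python
from dataclasses import dataclass


@dataclass
class Element:
    kind: str    # e.g. "title", "text", "logo" (illustrative labels)
    left: int
    top: int
    width: int
    height: int


def vertical_margins(elements):
    """Pair each element with its gap to the lowest edge seen so far.

    Elements are visited top to bottom. A negative margin means the
    element overlaps something above it.
    """
    ordered = sorted(elements, key=lambda e: e.top)
    result = []
    prev_bottom = 0  # the canvas top edge is taken as the first reference
    for e in ordered:
        result.append((e, e.top - prev_bottom))
        prev_bottom = max(prev_bottom, e.top + e.height)
    return result


def to_html(elements):
    """Serialize elements with five style attributes: margin plus bounding box."""
    lines = []
    for e, margin in vertical_margins(elements):
        style = (
            f"margin-top:{margin}px; left:{e.left}px; top:{e.top}px; "
            f"width:{e.width}px; height:{e.height}px"
        )
        lines.append(f'<div class="{e.kind}" style="{style}"></div>')
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [Element("text", 0, 40, 100, 30), Element("title", 0, 10, 100, 20)]
    print(to_html(demo))
```

Writing the margin explicitly makes the spacing between neighbours readable directly from the annotation, and an overlap shows up as a negative value rather than having to be inferred from two sets of coordinates.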
Peer Reviews
Decision: Submitted to ICLR 2026
The paper clearly demonstrates the problems (overlap, alignment errors, lack of diversity) in existing approaches and proposes a well-designed method to address the issues. The relation-CoT methodology is simple and effective, introducing minimal modification to the layout format while achieving noticeable performance improvements. In addition, the authors conduct extensive experiments, including quantitative, qualitative comparison, and user studies to show ReLayout's superior performance in al
- It is unclear how ReLayout preserves the aspect ratio of elements with the proposed methodology. For example, as shown in Figure 1, PosterLlama produces distorted elements due to alignment errors in the text boxes. It is not clear how introducing layout relation-CoT addresses this issue.
- When predicting elements, the style contains both the margin attributes and the bounding box information. In this formulation, each element has 5 attributes, which inevitably caus
1. Content-aware layout generation is an important research problem to study.
2. The idea of explicitly modeling element relationships in content-aware layout generation is interesting and novel, which could inspire the layout generation community.
3. The augmented HTML-based layout representation with element relationship information is well designed, and the proposed sampling method is shown to be effective.
1. The quantitative results of the proposed method on PKU are not satisfactory. As shown in Table 1, on PKU, the proposed method is not very helpful for improving the performance of MLLMs on the content metrics, i.e., readability and occlusion.
2. The evaluation is not complete. First, this work represents element relationships in terms of hierarchical regions and element margins. Comparison with some alternative representations is needed, but is missing in the current paper. For example, one
1. The proposed method is a reasonable solution to improve the performance of existing MLLMs on the layout generation task. The experiments on two public datasets should show the improvements in terms of layout structure and diversity.
2. The proposed data construction and resampling mechanism produces additional annotation details on existing layout datasets, which could be useful for future research in the community.
1. There is a gap between the motivation and the proposed method. The motivation of this paper is inspired by chain-of-thought (CoT) reasoning, which progressively obtains element relations step by step. However, CoT is an inference-time technique, while the proposed method is a data transformation for training. I am not sure how the stated relation-CoT is used in MLLM training and inference. The structure-level understanding and the high-level layout design concepts (L82-L84) should be il
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Data Visualization and Analytics
