Aggregated Structural Representation with Large Language Models for Human-Centric Layout Generation

Jiongchao Jin; Shengchu Zhao; Dajun Chen; Wei Jiang; Yong Li

arXiv:2505.19554 · cs.CV · May 27, 2025

TL;DR

This paper combines graph networks with large language models to generate human-centric layouts efficiently, preserving structural information while enabling editable, diverse designs for mobile applications.

Contribution

The paper presents the Aggregation Structural Representation (ASR) module, which integrates graph features with LLMs and uses them in place of the Vision Transformer (ViT) module of multimodal LLMs, so that structural information is preserved during layout generation.
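To make "graph features" concrete, the sketch below turns a screen's components into a fully connected, directed graph: nodes hold normalized bounding boxes and edges hold relational features (center offsets and size ratios). This page does not describe how the paper actually builds its graph, so the `Component` class, `build_layout_graph`, the category labels, and the choice of edge features are all hypothetical. It only illustrates the kind of structural input a graph encoder could consume.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """A UI component with a hypothetical category label and a normalized box."""
    kind: str
    x: float  # left edge in [0, 1]
    y: float  # top edge in [0, 1]
    w: float  # width in [0, 1]
    h: float  # height in [0, 1]


def build_layout_graph(components, eps=1e-6):
    """Return (nodes, edges) for a fully connected, directed layout graph.

    nodes[i] is the box (x, y, w, h) of component i; edges[(i, j)] is a
    hypothetical relational feature: the center offset (dx, dy) from i to j
    and the width/height ratio of j relative to i.
    """
    nodes = [(c.x, c.y, c.w, c.h) for c in components]
    edges = {}
    for i, a in enumerate(components):
        for j, b in enumerate(components):
            if i == j:
                continue
            dx = (b.x + b.w / 2) - (a.x + a.w / 2)
            dy = (b.y + b.h / 2) - (a.y + a.h / 2)
            # eps guards against division by zero for degenerate boxes
            edges[(i, j)] = (dx, dy, b.w / max(a.w, eps), b.h / max(a.h, eps))
    return nodes, edges


# Example: a toolbar, a hero image, and a centered button on one screen.
layout = [
    Component("Toolbar", 0.0, 0.0, 1.0, 0.1),
    Component("Image", 0.0, 0.1, 1.0, 0.4),
    Component("Button", 0.25, 0.8, 0.5, 0.1),
]
nodes, edges = build_layout_graph(layout)
print(len(nodes), len(edges))  # 3 6
```

A fully connected graph is the simplest choice; a real system might instead keep only parent/child edges from the view hierarchy or only nearby components.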

Findings

01. ASR outperforms existing methods on the RICO dataset in terms of mean Intersection over Union (mIoU).
02. The approach supports human-editable, progressive layout design.
03. Sampling relational features yields diverse and creative layouts.
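The mIoU result on RICO depends on how layouts are compared, and this page does not state the paper's evaluation protocol. One plausible formulation, sketched below as an assumption, rasterizes each layout onto a grid per component category, computes the IoU of the generated and reference masks for each category, and averages over categories. The grid size of 32 and the helper names `rasterize` and `mean_iou` are hypothetical.

```python
def rasterize(boxes, grid=32):
    """Map (category, x, y, w, h) boxes in normalized coordinates to sets of
    occupied grid cells, one set per category."""
    masks = {}
    for cat, x, y, w, h in boxes:
        cells = masks.setdefault(cat, set())
        x0, y0 = int(x * grid), int(y * grid)
        x1 = int(min(x + w, 1.0) * grid)
        y1 = int(min(y + h, 1.0) * grid)
        for r in range(y0, y1):
            for c in range(x0, x1):
                cells.add((r, c))
    return masks


def mean_iou(generated, reference, grid=32):
    """Average per-category IoU between two rasterized layouts."""
    gen, ref = rasterize(generated, grid), rasterize(reference, grid)
    ious = []
    for cat in set(gen) | set(ref):
        a, b = gen.get(cat, set()), ref.get(cat, set())
        union = a | b
        if union:  # skip categories whose boxes cover no cells
            ious.append(len(a & b) / len(union))
    return sum(ious) / len(ious) if ious else 0.0


reference = [("Button", 0.0, 0.0, 0.5, 0.5)]
generated = [("Button", 0.0, 0.0, 0.5, 0.25)]  # same button, half the height
print(mean_iou(generated, reference))  # 0.5
```

Rasterizing per category means overlapping components of the same type are counted once, which is a simplification; a box-matching formulation would treat each component separately.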

Abstract

The time cost and complexity of manual layout design make automated layout generation a critical task, especially when layouts must serve multiple applications across different mobile devices. Existing graph-based layout generation approaches suffer from limited generative capability, often producing unreasonable and incompatible outputs. Meanwhile, vision-based generative models tend to overlook the original structural information, leading to component intersections and overlaps. To address these challenges, we propose an Aggregation Structural Representation (ASR) module that integrates graph networks with large language models (LLMs) to preserve structural information while enhancing generative capability. This novel pipeline uses graph features as hierarchical prior knowledge, replacing the traditional Vision Transformer (ViT) module in multimodal large language models (MLLMs) to predict…
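The abstract describes graph features standing in for the ViT tokens of an MLLM. A minimal sketch of that idea, assuming one round of mean neighbor aggregation followed by a single linear projection into the LLM's hidden size, is shown below. The function names, the random projection weights (which a real model would learn), and the parent/child adjacency used as a stand-in for "hierarchical prior knowledge" are all hypothetical; the paper's actual ASR architecture may differ substantially.

```python
import random


def matvec(weight, vec):
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def aggregate(node_feats, adjacency):
    """One round of mean aggregation: each node's feature list is concatenated
    with the mean of its neighbors' features (zeros if it has none)."""
    out = []
    for i, feat in enumerate(node_feats):
        nbrs = adjacency.get(i, [])
        if nbrs:
            mean = [sum(node_feats[j][k] for j in nbrs) / len(nbrs)
                    for k in range(len(feat))]
        else:
            mean = [0.0] * len(feat)
        out.append(list(feat) + mean)
    return out


def build_llm_input(node_feats, adjacency, proj, text_embeddings):
    """Project aggregated node features to the LLM hidden size (one token per
    node) and prepend them to the text embeddings, the position where an MLLM
    would otherwise place ViT image tokens."""
    graph_tokens = [matvec(proj, h) for h in aggregate(node_feats, adjacency)]
    return graph_tokens + text_embeddings


random.seed(0)
feat_dim, hidden = 4, 8
# Hypothetical projection; a real model would learn these weights.
proj = [[random.gauss(0.0, 0.02) for _ in range(2 * feat_dim)]
        for _ in range(hidden)]
nodes = [[0.0, 0.0, 1.0, 1.0],   # root container
         [0.0, 0.0, 1.0, 0.1],   # toolbar (child of root)
         [0.25, 0.8, 0.5, 0.1]]  # button (child of root)
adjacency = {0: [1, 2], 1: [0], 2: [0]}  # parent/child edges from a view hierarchy
text = [[0.0] * hidden for _ in range(5)]  # placeholder text-token embeddings
sequence = build_llm_input(nodes, adjacency, proj, text)
print(len(sequence), len(sequence[0]))  # 8 8
```

Emitting one token per component keeps the sequence length proportional to layout size rather than to image resolution, which is one reason a graph prefix can be cheaper than ViT patch tokens.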

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Byte Pair Encoding · Residual Connection · Dense Connections · Vision Transformer · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings