VASCAR: Content-Aware Layout Generation via Visual-Aware Self-Correction

Jiahao Zhang; Ryota Yoshihashi; Shunsuke Kitada; Atsuki Osanai; Yuta Nakashima

arXiv:2412.04237·cs.CV·March 12, 2025

TL;DR

VASCAR introduces a training-free, visual-aware self-correction method for layout generation that leverages large vision-language models to iteratively refine outputs based on visual feedback, achieving state-of-the-art results.

Contribution

The paper presents VASCAR, a novel approach enabling LVLMs to improve layout generation through visual-aware self-correction without additional training.

Findings

01. VASCAR achieves state-of-the-art layout quality.

02. It effectively refines outputs using visual feedback.

03. It demonstrates versatility across different LVLMs.

Abstract

Large language models (LLMs) have proven effective for layout generation due to their ability to produce structure-description languages, such as HTML or JSON. In this paper, we argue that while LLMs can perform reasonably well in certain cases, their intrinsic limitation of not being able to perceive images restricts their effectiveness in tasks requiring visual content, e.g., content-aware layout generation. Therefore, we explore whether large vision-language models (LVLMs) can be applied to content-aware layout generation. To this end, inspired by the iterative revision and heuristic evaluation workflow of designers, we propose the training-free Visual-Aware Self-Correction LAyout GeneRation (VASCAR). VASCAR enables LVLMs (e.g., GPT-4o and Gemini) to iteratively refine their outputs with reference to rendered layout images, which are visualized as colored bounding boxes on poster…
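The render-evaluate-revise loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: every name here (`Box`, `render`, `overlap_score`, `self_correct`, `nudge_down`) is hypothetical, a character grid stands in for the rendered poster image, an overlap count stands in for heuristic evaluation, and a fixed nudge rule stands in for the LVLM that would propose a revised layout.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Box:
    """One layout element (hypothetical structure, for illustration only)."""
    label: str
    x: int
    y: int
    w: int
    h: int


Layout = List[Box]
Canvas = List[List[str]]


def render(layout: Layout, width: int, height: int) -> Canvas:
    """Rasterise boxes onto a character grid (stand-in for a rendered image).

    The first box covering a cell writes its label's first character;
    any further box covering the same cell marks it '#'.
    """
    canvas = [["." for _ in range(width)] for _ in range(height)]
    for box in layout:
        for row in range(max(0, box.y), min(height, box.y + box.h)):
            for col in range(max(0, box.x), min(width, box.x + box.w)):
                canvas[row][col] = box.label[0] if canvas[row][col] == "." else "#"
    return canvas


def overlap_score(canvas: Canvas) -> int:
    """Heuristic stand-in: number of cells covered by more than one box."""
    return sum(row.count("#") for row in canvas)


def self_correct(
    layout: Layout,
    propose: Callable[[Layout, Canvas], Layout],
    width: int,
    height: int,
    max_iters: int = 5,
) -> Tuple[Layout, int]:
    """Keep a proposed revision only when it lowers the heuristic score."""
    best = layout
    best_score = overlap_score(render(best, width, height))
    for _ in range(max_iters):
        if best_score == 0:
            break
        feedback = render(best, width, height)  # visual feedback for the proposer
        candidate = propose(best, feedback)
        candidate_score = overlap_score(render(candidate, width, height))
        if candidate_score < best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


def nudge_down(layout: Layout, feedback: Canvas) -> Layout:
    """Toy proposer (assumption): move the last box one row down."""
    *rest, last = layout
    return rest + [Box(last.label, last.x, last.y + 1, last.w, last.h)]


if __name__ == "__main__":
    start = [Box("T", 0, 0, 4, 2), Box("L", 0, 1, 4, 2)]
    final, score = self_correct(start, nudge_down, width=6, height=6)
    print(final, score)
```

In a real pipeline the `propose` callable would wrap a call to a vision-language model that receives the rendered image and returns a revised layout; the loop structure stays the same, since no model weights are updated at any point.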

Taxonomy

Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization