Geometry Aligned Variational Transformer for Image-conditioned Layout Generation
Yunning Cao, Ye Ma, Min Zhou, Chuanbin Liu, Hongtao Xie, Tiezheng Ge, Yuning Jiang

TL;DR
This paper introduces the Image-Conditioned Variational Transformer (ICVT), a transformer-based model that generates image-conditioned layouts by integrating visual and geometric information, improving the aesthetic and semantic coherence of the resulting designs.
Contribution
The paper proposes a new image-conditioned layout generation paradigm with a geometry alignment module and a variational transformer, enabling diverse and harmonious layout creation.
Findings
ICVT effectively models intra-layout relationships and fuses visual information.
The geometry alignment module improves the coherence between layout and image geometry.
Experimental results demonstrate superior layout quality and diversity.
Abstract
Layout generation is a novel task in computer vision, which combines the challenges in both object localization and aesthetic appraisal, widely used in advertisements, posters, and slides design. An accurate and pleasant layout should consider both the intra-domain relationship within layout elements and the inter-domain relationship between layout elements and the image. However, most previous methods simply focus on image-content-agnostic layout generation, without leveraging the complex visual information from the image. To this end, we explore a novel paradigm entitled image-conditioned layout generation, which aims to add text overlays to an image in a semantically coherent manner. Specifically, we propose an Image-Conditioned Variational Transformer (ICVT) that autoregressively generates various layouts in an image. First, self-attention mechanism is adopted to model the…
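The abstract mentions that a self-attention mechanism is used and that layouts are generated autoregressively. The sketch below is not the authors' implementation; it is a minimal illustration of causal scaled dot-product self-attention over layout elements, under several assumptions that do not come from the paper: each element is a hypothetical normalized box (x_center, y_center, width, height), the query/key/value projections are the identity, and there is a single head.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(elements, causal=True):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    Assumption: real models learn separate linear projections and use
    several heads; both are omitted here to keep the sketch short.
    """
    d = len(elements[0])
    outputs, all_weights = [], []
    for i, query in enumerate(elements):
        # With causal=True, element i only sees elements 0..i, which is
        # what an autoregressive generator requires.
        visible = elements[: i + 1] if causal else elements
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in visible
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, visible)) for j in range(d)]
        )
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical layout: three text boxes as (x_center, y_center, width, height).
boxes = [
    [0.5, 0.2, 0.8, 0.10],
    [0.5, 0.4, 0.6, 0.08],
    [0.5, 0.9, 0.3, 0.05],
]
attended, weights = self_attention(boxes)
```

Each row of `weights` shows how strongly one element attends to the elements generated before it; with the causal mask, the first element can only attend to itself.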
Taxonomy
Topics: Handwritten Text Recognition Techniques · Advanced Image and Video Retrieval Techniques · Computer Graphics and Visualization Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Softmax · Absolute Position Encodings · Adam · Position-Wise Feed-Forward Layer · Dense Connections · Label Smoothing
