GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts
Junwen He, Yifan Wang, Lijun Wang, Huchuan Lu, Jun-Yan He, Chenyang Li, Hanyuan Chen, Jin-Peng Lan, Bin Luo, Yifeng Geng

TL;DR
This paper introduces a vision-language model (VLM)-based framework for designing aesthetically pleasing text logo layouts. It draws on two newly constructed large datasets and efficient multi-glyph processing techniques, and is reported to outperform existing methods in layout quality and user preference.
Contribution
The paper presents a novel VLM-based framework for content-aware logo layout generation, introduces efficient glyph processing techniques, and constructs large datasets with detailed annotations for instruction tuning.
Findings
Outperforms existing methods on aesthetic and preference benchmarks
Reduces computational cost for multi-glyph processing
Enables complex design reasoning with natural language annotations
Abstract
Text logo design heavily relies on the creativity and expertise of professional designers, in which arranging element layouts is one of the most important procedures. However, this specific task has received limited attention, often overshadowed by broader layout generation tasks such as document or poster design. In this paper, we propose a Vision-Language Model (VLM)-based framework that generates content-aware text logo layouts by integrating multi-modal inputs with user-defined constraints, enabling more flexible and robust layout generation for real-world applications. We introduce two model techniques that reduce the computational cost for processing multiple glyph images simultaneously, without compromising performance. To support instruction tuning of our model, we construct two extensive text logo datasets that are five times larger than existing public datasets. In addition to…
Taxonomy
Topics: Image Processing and 3D Reconstruction · Handwritten Text Recognition Techniques · Digital Humanities and Scholarship
Methods: Softmax · Attention Is All You Need
