GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts

Junwen He; Yifan Wang; Lijun Wang; Huchuan Lu; Jun-Yan He; Chenyang Li; Hanyuan Chen; Jin-Peng Lan; Bin Luo; Yifeng Geng

arXiv:2411.11435 · cs.CV · August 5, 2025

TL;DR

This paper introduces a multi-modal vision-language model for designing aesthetically pleasing text logo layouts, leveraging large datasets and efficient processing techniques to outperform existing methods in layout quality and user preference.

Contribution

The paper presents a novel VLM-based framework for content-aware logo layout generation, introduces efficient glyph processing techniques, and constructs large datasets with detailed annotations for instruction tuning.

Findings

01. Outperforms existing methods on aesthetic and preference benchmarks
02. Reduces computational cost for multi-glyph processing
03. Enables complex design reasoning with natural language annotations

Abstract

Text logo design heavily relies on the creativity and expertise of professional designers, in which arranging element layouts is one of the most important procedures. However, this specific task has received limited attention, often overshadowed by broader layout generation tasks such as document or poster design. In this paper, we propose a Vision-Language Model (VLM)-based framework that generates content-aware text logo layouts by integrating multi-modal inputs with user-defined constraints, enabling more flexible and robust layout generation for real-world applications. We introduce two model techniques that reduce the computational cost for processing multiple glyph images simultaneously, without compromising performance. To support instruction tuning of our model, we construct two extensive text logo datasets that are five times larger than existing public datasets. In addition to…

Taxonomy

Topics: Image Processing and 3D Reconstruction · Handwritten Text Recognition Techniques · Digital Humanities and Scholarship

Methods: Softmax · Attention Is All You Need