PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic Design
Jiazhe Wei, Ken Li, Tianyu Lao, Haofan Wang, Liang Wang, Caifeng Shan, Chenyang Si

TL;DR
PosterCopilot is a framework that improves layout reasoning and supports controllable, iterative editing for professional graphic design, built on a three-stage training strategy for Large Multimodal Models (LMMs).
Contribution
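The paper introduces a progressive three-stage training strategy (Perturbed Supervised Fine-Tuning, Reinforcement Learning for Visual-Reality Alignment, and Reinforcement Learning from Aesthetic Feedback) that gives LMMs geometric understanding and aesthetic reasoning for layout design, together with a complete workflow for layer-specific editing.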
Findings
Achieves geometrically accurate layouts
Provides aesthetically superior designs
Enables layer-controllable iterative editing
Abstract
Graphic design forms the cornerstone of modern visual communication, serving as a vital medium for promoting cultural and commercial events. Recent advances have explored automating this process using Large Multimodal Models (LMMs), yet existing methods often produce geometrically inaccurate layouts and lack the iterative, layer-specific editing required in professional workflows. To address these limitations, we present PosterCopilot, a framework that advances layout reasoning and controllable editing for professional graphic design. Specifically, we introduce a progressive three-stage training strategy that equips LMMs with geometric understanding and aesthetic reasoning for layout design, consisting of Perturbed Supervised Fine-Tuning, Reinforcement Learning for Visual-Reality Alignment, and Reinforcement Learning from Aesthetic Feedback. Furthermore, we develop a complete workflow…
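To make "geometrically accurate layouts" concrete, the sketch below scores a predicted layout against a reference by averaging intersection-over-union (IoU) across layer bounding boxes. This is an illustration only: the abstract does not say which metric or reward PosterCopilot uses, and the `Box` type, the `iou` and `mean_layout_iou` helpers, and the (x, y, width, height) box convention are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Box:
    """Hypothetical layer bounding box: top-left corner plus size."""
    x: float
    y: float
    w: float
    h: float


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes, in [0, 1]."""
    # Overlap along each axis, clamped at zero when the boxes are disjoint.
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def mean_layout_iou(pred: Sequence[Box], ref: Sequence[Box]) -> float:
    """Average per-layer IoU; layers are matched by position in the list."""
    if len(pred) != len(ref):
        raise ValueError("pred and ref must contain the same number of layers")
    if not pred:
        return 0.0
    return sum(iou(p, r) for p, r in zip(pred, ref)) / len(pred)


if __name__ == "__main__":
    reference = [Box(0, 0, 100, 40), Box(10, 50, 80, 20)]
    predicted = [Box(0, 0, 100, 40), Box(20, 50, 80, 20)]
    print(mean_layout_iou(predicted, reference))
```

A score of 1.0 means every predicted layer coincides with its reference box, and lower values indicate misplaced or mis-sized layers. A score of this kind says nothing about aesthetics, which the abstract treats as a separate training stage.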
Taxonomy
Topics: Interactive and Immersive Displays · 3D Shape Modeling and Analysis · Data Visualization and Analytics
