Multimodal Markup Document Models for Graphic Design Completion

Kotaro Kikuchi; Ukyo Honda; Naoto Inoue; Mayu Otani; Edgar Simo-Serra; Kota Yamaguchi

arXiv:2409.19051 · cs.CV · December 5, 2025

TL;DR

This paper presents MarkupDM, a multimodal document model for graphic design that can complete, generate, and edit design elements by understanding both markup and images, advancing design automation.

Contribution

The paper introduces MarkupDM, a multimodal markup document model that unifies various design tasks through fill-in-the-middle training and supports both image and text completion.

Findings

01. MarkupDM produces plausible design completions.

02. It outperforms state-of-the-art image editing models in instruction-guided tasks.

03. It demonstrates versatility across multiple design automation tasks.

Abstract

We introduce MarkupDM, a multimodal markup document model that represents graphic design as an interleaved multimodal document consisting of both markup language and images. Unlike existing holistic approaches that rely on an element-by-attribute grid representation, our representation accommodates variable-length elements, type-dependent attributes, and text content. Inspired by fill-in-the-middle training in code generation, we train the model to complete the missing part of a design document from its surrounding context, allowing it to treat various design tasks in a unified manner. Our model also supports image generation by predicting discrete image tokens through a specialized tokenizer with support for image transparency. We evaluate MarkupDM on three tasks, attribute value, image, and text completion, and demonstrate that it can produce plausible designs consistent with the…
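The fill-in-the-middle idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the sentinel strings `<|prefix|>`, `<|suffix|>`, and `<|middle|>`, the helper names, and the toy SVG-like markup are all assumptions made for illustration. The sketch cuts a span out of a markup document and moves it to the end, so that a left-to-right model would learn to predict the missing part from the context on both sides.

```python
import random

# Hypothetical sentinel strings; the paper's actual special tokens are not stated here.
PREFIX, SUFFIX, MIDDLE = "<|prefix|>", "<|suffix|>", "<|middle|>"


def to_fim_example(document: str, start: int, end: int) -> str:
    """Rearrange a document into prefix-suffix-middle order for the span [start, end)."""
    if not 0 <= start <= end <= len(document):
        raise ValueError("span out of range")
    prefix, middle, suffix = document[:start], document[start:end], document[end:]
    # The model sees both sides of the gap first and then generates the middle.
    return f"{PREFIX}{prefix}{SUFFIX}{suffix}{MIDDLE}{middle}"


def random_fim_example(document: str, rng: random.Random) -> str:
    """Pick a random span, as one might do when building training samples."""
    start, end = sorted(rng.randrange(len(document) + 1) for _ in range(2))
    return to_fim_example(document, start, end)


# Toy markup (an assumption): mask the font-size attribute value "24".
doc = (
    '<svg width="400" height="300">'
    '<text x="20" y="40" font-size="24">Summer Sale</text>'
    "</svg>"
)
start = doc.index("24")
example = to_fim_example(doc, start, start + 2)
print(example)
```

Under this framing, attribute value completion and text completion differ only in which span is masked, which is one way to read the abstract's claim that the tasks are treated in a unified manner. Image completion would additionally require the discrete image tokens described in the abstract, which this sketch does not cover.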

Taxonomy

Topics: Web Applications and Data Management